Test suite errors on arm64 architecture

tillea commented 1 year ago

Hi,

The Debian-packaged version of canu seems to work nicely on several 64-bit architectures. Unfortunately, it fails the CI test we wrote for canu, which calls the command

canu -p ecoli -d ecoli-pacbio genomeSize=4.8m maxThreads=4 -pacbio /tmp/autopkgtest-lxc.6mhavcm6/downtmp/autopkgtest_tmp/pacbio.fastq

on the arm64 architecture. The Debian infrastructure provides full logs, which include the installation of all prerequisites for the software under test, so please inspect the full log of the arm64 test (scroll down to the end) to see the whole test result. To compare with other architectures, you can check our tracker, which provides green links labeled "PASS". As you can see on that page, the canu version is 2.2.

Kind regards, Andreas

skoren commented 1 year ago

Canu doesn't support ARM architectures; I'm surprised it compiled there at all. There have been some recent changes (#2260) to support ARM, but I wouldn't expect v2.2 to work on it.

mr-c commented 1 year ago

@skoren We've been compiling for arm64 since version 2.0; only recently did we begin running some extra tests, and we ran into the failure linked above.

Interestingly enough, the assembly succeeds on ppc64el, riscv64, and s390x! https://ci.debian.net/packages/c/canu/

skoren commented 1 year ago

A more detailed error should be in correction/0-mercounts/meryl-count.000001.out. Are you able to capture the nested .out files in your logs on failure?

mr-c commented 1 year ago

@skoren I've queued up a new build that grabs all *.out files, and I'm trying another build on a Debian arm64 porterbox.

mr-c commented 1 year ago

I wasn't able to reproduce the error on a dedicated machine, but a new CI run produced the following: artifacts(2).tar.gz

https://ci.debian.net/data/autopkgtest/testing/arm64/c/canu/39150058/log.gz

skoren commented 1 year ago

Nothing very enlightening in the logs:

FINAL CONFIGURATION
-------------------

Estimated to require 311 MB memory out of 2048 MB allowed.
Estimated to require 2 batches.

Configured complex mode for 0.304 GB memory per batch, and up to 2 batches.

Start counting with THREADED method.
Used 0.268 GB / 1.969 GB to store      2093487 kmers; need 0.002 GB to sort        36114 kmers

Failed with 'Segmentation fault'; backtrace (libbacktrace):
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault

But if it works on a dedicated machine, is it possible that the CI is restricting memory, so that an allocation fails? Not sure if @brianwalenz has any other suggestions.

brianwalenz commented 1 year ago

Sadly, no suggestions.

Both HEAD and v2.2 run fine with the parameters expected to be used in the CI (meryl k=16 threads=4 memory=2 count segment=1/01 ../../ecoli.seqStore output out.meryl) on both amd64 and arm64 (via an M2 Pro), and amd64 ran cleanly under valgrind.

tillea commented 11 months ago


I asked the admins of the CI infrastructure, and the answer is: the arm64 workers don't have a lot of memory (8 GB), but all i386 and some amd64 workers have the same amount. Kind regards, Andreas.

paulgevers commented 11 months ago

Hmm, I wonder about the "10 GB memory needed" in the output below. That's more than is available.

root@elbrus:/tmp/autopkgtest-lxc.3wrkiidv/downtmp/build.ej9/src# /usr/lib/canu/bin/meryl k=16 threads=4 memory=2 count segment=1/01 ../../autopkgtest_tmp/ecoli-pacbio/ecoli.seqStore output out.meryl

Found 1 command tree.

Counting 110 (estimated) million canonical 16-mers from 1 input file:
    canu-seqStore: ../../autopkgtest_tmp/ecoli-pacbio/ecoli.seqStore

SIMPLE MODE
-----------

  16-mers
    -> 4294967296 entries for counts up to 65535.
    -> 64 Gbits memory used

  115899341 input bases
    -> expected max count of 463597, needing 4 extra bits.
    -> 16 Gbits memory used

  10 GB memory needed

COMPLEX MODE
------------

prefix     # of   struct   kmers/    segs/      min     data    total
  bits   prefix   memory   prefix   prefix   memory   memory   memory
------  -------  -------  -------  -------  -------  -------  -------
     1     2  P   434 kB    27 MM    26 kS  8192  B   214 MB   214 MB
     2     4  P   427 kB    13 MM    12 kS    16 kB   207 MB   207 MB
     3     8  P   426 kB  7073 kM  6417  S    32 kB   200 MB   200 MB
     4    16  P   437 kB  3536 kM  3096  S    64 kB   193 MB   193 MB
     5    32  P   474 kB  1768 kM  1493  S   128 kB   186 MB   187 MB
     6    64  P   561 kB   884 kM   719  S   256 kB   179 MB   180 MB
     7   128  P   750 kB   442 kM   346  S   512 kB   173 MB   173 MB
     8   256  P  1140 kB   221 kM   166  S  1024 kB   166 MB   167 MB
     9   512  P  1936 kB   110 kM    80  S  2048 kB   160 MB   161 MB
    10  1024  P  3544 kB    55 kM    39  S  4096 kB   156 MB   159 MB  Best Value!
    11  2048  P  6768 kB    27 kM    19  S  8192 kB   152 MB   158 MB
    12  4096  P    12 MB    13 kM     9  S    16 MB   144 MB   156 MB
    13  8192  P    25 MB  7074  M     5  S    32 MB   160 MB   185 MB
    14    16 kP    50 MB  3537  M     2  S    64 MB   128 MB   178 MB
    15    32 kP   101 MB  1769  M     1  S   128 MB   128 MB   229 MB
    16    64 kP   202 MB   885  M     1  S   256 MB   256 MB   458 MB
    17   128 kP   405 MB   443  M     1  S   512 MB   512 MB   917 MB
    18   256 kP   810 MB   222  M     1  S  1024 MB  1024 MB  1834 MB
    19   512 kP  1620 MB   111  M     1  S  2048 MB  2048 MB  3668 MB
    20  1024 kP  3240 MB    56  M     1  S  4096 MB  4096 MB  7336 MB

FINAL CONFIGURATION
-------------------

Estimated to require 311 MB memory out of 2048 MB allowed.
Estimated to require 2 batches.

Configured complex mode for 0.304 GB memory per batch, and up to 2 batches.

Start counting with THREADED method.
Used 0.269 GB / 1.969 GB to store      2093487 kmers; need 0.002 GB to sort        39856 kmers

Failed with 'Segmentation fault'; backtrace (libbacktrace):
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault

If you tell me what command to run, I guess I can run it in the container.

paulgevers commented 11 months ago

(gdb) bt
#0  merylCountArray::add(unsigned __int128) (this=0xffffe76e2d98, suffix=<optimized out>) at meryl/src/meryl/merylCountArray.C:579
#1  0x0000aaaaaaaadbd8 in insertKmers (G=0xaaaaaab20690, T=<optimized out>, S=0xffffe0003ba0) at meryl/src/meryl/merylOp-countThreads.C:274
#2  0x0000aaaaaaac3c3c in sweatShop::worker (this=0xaaaaaab28430, workerData=0xaaaaaab28520) at utility/src/utility/sweatShop.C:305
#3  0x0000fffff7b91318 in start_thread (arg=0x0) at ./nptl/pthread_create.c:444
#4  0x0000fffff7bfb01c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

mr-c commented 11 months ago

@paulgevers https://salsa.debian.org/med-team/canu/-/blob/master/debian/tests/run-unit-test

paulgevers commented 11 months ago

@mr-c as the maintainer of autopkgtest you can be assured I already read that file. Unfortunately that doesn't tell me what's of interest to run manually now.

brianwalenz commented 11 months ago

Oh, awesome, thanks for the stack trace! Do you happen to know the memory page size being used here?

Is there a downloadable VM I can use to reproduce this? Possibly just one with an installed OS would be sufficient... and (brief) instructions on how to run it would be invaluable, as I'm not terribly VM-literate.

About the "10 GB needed": meryl reports estimated resources for two algorithms, 'simple' and 'complex'. Simple mode claims to need 10 GB, so it isn't used; complex mode is estimated at 0.3 GB.

paulgevers commented 11 months ago

> Oh, awesome, thanks for the stack trace! Do you happen to know the memory page size being used here?

No, but if you teach me how to look it up, I can do that.

> Is there a downloadable VM I can use to reproduce this? Possibly just one with an installed OS would be sufficient... and (brief) instructions on how to run it would be invaluable, as I'm not terribly VM-literate.

Neither am I. The Debian CI infrastructure doesn't use VMs; it uses lxc containers generated with autopkgtest.

brianwalenz commented 11 months ago

Run pagesize, getconf PAGESIZE, or getconf PAGE_SIZE.

I'll try to reproduce the failure here this week; failing that, I'll add a bunch of debugging output to a branch and let you run another test.

paulgevers commented 11 months ago

root@ci-worker-arm64-02:~# getconf PAGESIZE
4096

brianwalenz commented 11 months ago

Shucks, that theory went out the window.

tillea commented 11 months ago

Do we have any other ideas for tracking down the problem?

brianwalenz commented 10 months ago

Thanks for the ping. After a ferocious battle with QEMU I have reproduced the crash, literally just now.

Annoyingly, it does NOT crash when I build Canu with debugging support, nor does it fail when run under valgrind.

tillea commented 10 months ago

On Wed, Nov 29, 2023 at 04:49:15AM -0800, Brian Walenz wrote:

> Thanks for the ping. After a ferocious battle with QEMU I have reproduced the crash, literally just now.

Thanks a lot for that.

> Annoyingly, it does NOT crash when I build Canu with debugging support, nor does it fail when run under valgrind.

Argh. Fingers crossed that it's not too cumbersome. Andreas

brianwalenz commented 10 months ago

Hopefully fixed. 'twas a good bug.

I had used too weak a memory-ordering requirement (https://en.cppreference.com/w/cpp/atomic/memory_order), which allowed aarch64 to reorder instructions such that a shared memory allocation was able to escape a critical section. The key quote from the linked page is:

On strongly-ordered systems — x86, SPARC TSO, IBM mainframe, etc. — release-acquire ordering is automatic for the majority of operations. [...] On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions are used.

What I thought was a critical section worked on amd64 more or less by default; on ARM, the default ordering wasn't strong enough.

tillea commented 10 months ago

On Wed, Nov 29, 2023 at 12:56:10PM -0800, Brian Walenz wrote:

> Hopefully fixed. 'twas a good bug. ...

Thanks a lot for this tough work. Do you intend to issue a micro release with this fix? Otherwise we could use the related commit as a patch, but a release would be more convenient and also helpful for other downstream users. Kind regards, Andreas.

tillea commented 9 months ago

Ping about a new release or a commit we can cherry-pick for the Debian package.

brianwalenz commented 8 months ago

Hi Andreas, I'm (finally) getting around to making a release. There have been a few build changes (the handling of externally defined CXXFLAGS) and packaging changes (installing perl modules into lib/perl5/site_perl/canu instead of lib/site_perl/canu). Is there anything on your side you'd like to see? I didn't find any Debian-specific patches in https://salsa.debian.org/med-team/canu, but it is entirely likely I looked in the wrong place.

Apologies for the double-work of cherry-picking and then updating for a new release. The release kept getting preempted by other projects.

tillea commented 8 months ago

Hi Brian, it would be great if you could do a release, and we'll see which patches might need to be adapted. For the moment I have no comments, except to say thank you for your work. Andreas

mr-c commented 8 months ago

> I didn't find any Debian-specific patches in https://salsa.debian.org/med-team/canu but it is entirely likely I looked in the wrong place.

Check out https://salsa.debian.org/med-team/canu/-/tree/master/debian/patches?ref_type=heads, but I think all of them are already merged upstream.

Did you update your copy of parasail to grab https://github.com/jeffdaily/parasail/pull/102 ?

mr-c commented 7 months ago

> Did you update your copy of parasail to grab jeffdaily/parasail#102 ?

Answering my own question: Yes, as of https://github.com/marbl/canu/commit/a55ecfaa3fa1d39ba7d0a577dcc5e4cd6c413ca0

@brianwalenz Looks like all is ready for a new release!

marbl / canu

Test suite errors on arm64 architecture #2271