Open tillea opened 1 year ago
Canu doesn't support arm architectures, I'm surprised it compiled there at all. You can see that there have been some recent changes (#2260) to support arm but I wouldn't expect v2.2 to work on it.
@skoren We've been compiling for arm64 version 2.0; only recently did we begin running some extra testing, and we ran into the failure linked above.
Interestingly enough, the assembly succeeds on ppc64el, riscv64, and s390x! https://ci.debian.net/packages/c/canu/
A more detailed error should be in correction/0-mercounts/meryl-count.000001.out. Are you able to capture the nested out files from your logs on failure?
@skoren I've queued up a new build that grabs all *.out files and I'm trying another build on a Debian arm64 porterbox.
On a dedicated machine I wasn't able to reproduce the error; but on a new CI run I got the following: artifacts(2).tar.gz
https://ci.debian.net/data/autopkgtest/testing/arm64/c/canu/39150058/log.gz
Nothing very enlightening in the logs:
FINAL CONFIGURATION
-------------------
Estimated to require 311 MB memory out of 2048 MB allowed.
Estimated to require 2 batches.
Configured complex mode for 0.304 GB memory per batch, and up to 2 batches.
Start counting with THREADED method.
Used 0.268 GB / 1.969 GB to store 2093487 kmers; need 0.002 GB to sort 36114 kmers
Failed with 'Segmentation fault'; backtrace (libbacktrace):
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault
but if it's working on a dedicated machine, is it possible that the CI is restricting memory so when it tries to allocate some it fails? Not sure if @brianwalenz has any other suggestions.
Sadly, no suggestions.
Both HEAD and v2.2 run fine with the parameters expected to be used in the CI (meryl k=16 threads=4 memory=2 count segment=1/01 ../../ecoli.seqStore output out.meryl) on both amd64 and arm64 (via M2pro); amd64 ran cleanly in valgrind.
memo
I've asked the admins of the CI infrastructure and the answer is: The memory for the arm64 workers isn't huge (8GB), but all i386 and some of the amd64 workers have the same. Kind regards, Andreas.
Hmm, I wonder about that "10 GB memory needed" in the text below. That's more than is available.
root@elbrus:/tmp/autopkgtest-lxc.3wrkiidv/downtmp/build.ej9/src# /usr/lib/canu/bin/meryl k=16 threads=4 memory=2 count segment=1/01 ../../autopkgtest_tmp/ecoli-pacbio/ecoli.seqStore output out.meryl
Found 1 command tree.
Counting 110 (estimated) million canonical 16-mers from 1 input file:
canu-seqStore: ../../autopkgtest_tmp/ecoli-pacbio/ecoli.seqStore
SIMPLE MODE
-----------
16-mers
-> 4294967296 entries for counts up to 65535.
-> 64 Gbits memory used
115899341 input bases
-> expected max count of 463597, needing 4 extra bits.
-> 16 Gbits memory used
10 GB memory needed
COMPLEX MODE
------------
prefix    # of  struct  kmers/   segs/     min    data   total
  bits  prefix  memory  prefix  prefix  memory  memory  memory
------ ------- ------- ------- ------- ------- ------- -------
     1     2 P  434 kB   27 MM   26 kS  8192 B  214 MB  214 MB
     2     4 P  427 kB   13 MM   12 kS   16 kB  207 MB  207 MB
     3     8 P  426 kB 7073 kM  6417 S   32 kB  200 MB  200 MB
     4    16 P  437 kB 3536 kM  3096 S   64 kB  193 MB  193 MB
     5    32 P  474 kB 1768 kM  1493 S  128 kB  186 MB  187 MB
     6    64 P  561 kB  884 kM   719 S  256 kB  179 MB  180 MB
     7   128 P  750 kB  442 kM   346 S  512 kB  173 MB  173 MB
     8   256 P 1140 kB  221 kM   166 S 1024 kB  166 MB  167 MB
     9   512 P 1936 kB  110 kM    80 S 2048 kB  160 MB  161 MB
    10  1024 P 3544 kB   55 kM    39 S 4096 kB  156 MB  159 MB  Best Value!
    11  2048 P 6768 kB   27 kM    19 S 8192 kB  152 MB  158 MB
    12  4096 P   12 MB   13 kM     9 S   16 MB  144 MB  156 MB
    13  8192 P   25 MB  7074 M     5 S   32 MB  160 MB  185 MB
    14   16 kP   50 MB  3537 M     2 S   64 MB  128 MB  178 MB
    15   32 kP  101 MB  1769 M     1 S  128 MB  128 MB  229 MB
    16   64 kP  202 MB   885 M     1 S  256 MB  256 MB  458 MB
    17  128 kP  405 MB   443 M     1 S  512 MB  512 MB  917 MB
    18  256 kP  810 MB   222 M     1 S 1024 MB 1024 MB 1834 MB
    19  512 kP 1620 MB   111 M     1 S 2048 MB 2048 MB 3668 MB
    20 1024 kP 3240 MB    56 M     1 S 4096 MB 4096 MB 7336 MB
FINAL CONFIGURATION
-------------------
Estimated to require 311 MB memory out of 2048 MB allowed.
Estimated to require 2 batches.
Configured complex mode for 0.304 GB memory per batch, and up to 2 batches.
Start counting with THREADED method.
Used 0.269 GB / 1.969 GB to store 2093487 kmers; need 0.002 GB to sort 39856 kmers
Failed with 'Segmentation fault'; backtrace (libbacktrace):
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault
If you tell me what command to run, I guess I can run it in the container.
(gdb) bt
#0 merylCountArray::add(unsigned __int128) (this=0xffffe76e2d98, suffix=<optimized out>) at meryl/src/meryl/merylCountArray.C:579
#1 0x0000aaaaaaaadbd8 in insertKmers (G=0xaaaaaab20690, T=<optimized out>, S=0xffffe0003ba0) at meryl/src/meryl/merylOp-countThreads.C:274
#2 0x0000aaaaaaac3c3c in sweatShop::worker (this=0xaaaaaab28430, workerData=0xaaaaaab28520) at utility/src/utility/sweatShop.C:305
#3 0x0000fffff7b91318 in start_thread (arg=0x0) at ./nptl/pthread_create.c:444
#4 0x0000fffff7bfb01c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
@mr-c as the maintainer of autopkgtest you can be assured I already read that file. Unfortunately that doesn't tell me what's of interest to run manually now.
Oh, awesome, thanks for the stack trace! Do you happen to know the memory page size being used here?
Is there a downloadable VM I can use to reproduce this? Possibly just one with an installed OS would be sufficient... and (brief) instructions on how to run it would be invaluable, as I'm not terribly VM literate.
About the "10 GB needed": It is reporting estimated resources required for two algorithms, the 'simple' and the 'complex'. The simple claims to need 10 GB, and so isn't used, while the complex is estimating 0.3 GB.
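For anyone wondering where the "10 GB" figure comes from, the numbers in the SIMPLE MODE log can be reproduced with a few lines of arithmetic. This is only a sketch of my reading of the log output, not meryl's actual estimator; the function names are made up for illustration, and the 16 base bits and 4 extra bits are taken directly from the log ("counts up to 65535" and "needing 4 extra bits").

```cpp
#include <cstdint>

// Number of distinct k-mers over a 4-letter alphabet: 4^k = 2^(2k).
// (Hypothetical helper; valid for k < 32.)
constexpr std::uint64_t kmerEntries(unsigned k) {
  return std::uint64_t{1} << (2 * k);
}

// Bits needed for a table holding `bitsPerEntry` bits for every possible k-mer.
constexpr std::uint64_t tableBits(unsigned k, unsigned bitsPerEntry) {
  return kmerEntries(k) * bitsPerEntry;
}

// Total bytes for the direct-indexed table: base counter bits plus the
// extra bits needed for counts that overflow the base counter.
// This mirrors the log lines, it is NOT taken from the meryl source.
constexpr std::uint64_t simpleModeBytes(unsigned k, unsigned baseBits, unsigned extraBits) {
  return (tableBits(k, baseBits) + tableBits(k, extraBits)) / 8;
}
```

With k=16 this gives 4294967296 entries, 64 Gbits for the 16-bit counters, 16 Gbits for the 4 extra bits, and 80 Gbits / 8 = 10 GB in total (using binary units), which matches the log. That size does not depend on the input, which is why the simple method is rejected here with memory=2.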
> Oh, awesome, thanks for the stack trace! Do you happen to know the memory page size being used here?
No, but if you teach me how to look it up, I can do that.
> Is there a downloadable VM I can use to reproduce this? Possibly just one with an installed OS would be sufficient... and (brief) instructions on how to run it would be invaluable, as I'm not terribly VM literate.
Neither am I. The Debian CI infrastructure doesn't work with VMs but with lxc containers that are generated with autopkgtest.
pagesize or getconf PAGESIZE or getconf PAGE_SIZE.
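If none of those commands is available in the container, the same value can be read from a tiny program. A minimal sketch, assuming a POSIX system; the function name is made up for illustration.

```cpp
#include <unistd.h>

// Returns the memory page size in bytes as reported by the OS,
// or -1 if the value cannot be determined.
// (Hypothetical helper name; sysconf and _SC_PAGESIZE are standard POSIX.)
long pageSizeBytes() {
  return sysconf(_SC_PAGESIZE);
}
```

Printing the result of pageSizeBytes() should report the same number as getconf PAGESIZE on the same machine.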
I'll try to reproduce the failure here this week, failing that, I'll add a bunch of debugging to a branch and let you run another test.
root@ci-worker-arm64-02:~# getconf PAGESIZE
4096
Shucks, that theory went out the window.
Do we have some alternative idea to track down the problem?
Thanks for the ping. After a ferocious battle with QEMU I have reproduced the crash, literally just now.
Annoyingly, it does NOT crash when I build Canu with debugging support, nor does it fail when run under valgrind.
On Wed, Nov 29, 2023 at 04:49:15AM -0800, Brian Walenz wrote:
> Thanks for the ping. After a ferocious battle with QEMU I have reproduced the crash, literally just now.
Thanks a lot for it.
> Annoyingly, it does NOT crash when I build Canu with debugging support, nor does it fail when run under valgrind.
Argh. Crossing fingers that it's not too cumbersome, Andreas.
Hopefully fixed. 'twas a good bug.
I had used too weak of a memory ordering requirement (https://en.cppreference.com/w/cpp/atomic/memory_order) that allowed aarch64 to reorder instructions such that a shared memory allocation was able to escape a critical section. The key quote from the linked page is
On strongly-ordered systems — x86, SPARC TSO, IBM mainframe, etc. — release-acquire ordering is automatic for the majority of operations. [...] On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions are used.
What I thought was implementing a critical section worked on amd64 more-or-less by default; on ARM the default wasn't strong enough.
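To illustrate the general class of bug (this is not the Canu code or the actual fix, just a minimal sketch with made-up names), here is a spinlock-style critical section written with explicit acquire/release ordering. If the exchange and store below used std::memory_order_relaxed instead, x86 would very likely still behave, but a weakly-ordered CPU such as aarch64 would be allowed to move the guarded writes outside the lock.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical spinlock, for illustration only.
class SpinLock {
  std::atomic<bool> locked{false};
public:
  void lock() {
    // acquire: memory operations after this point cannot be
    // reordered to before the lock is taken.
    while (locked.exchange(true, std::memory_order_acquire)) {
      // spin until the previous holder releases the lock
    }
  }
  void unlock() {
    // release: memory operations before this point cannot be
    // reordered to after the lock is dropped.
    locked.store(false, std::memory_order_release);
  }
};

// Hypothetical shared state: a growable array guarded by the lock.
struct SharedArray {
  SpinLock              lock;
  std::vector<uint64_t> data;

  void add(uint64_t v) {
    lock.lock();
    data.push_back(v);   // may reallocate; must stay inside the critical section
    lock.unlock();
  }
};

// Starts nThreads workers that each add perThread values,
// waits for them, and returns the final element count.
std::size_t runWorkers(unsigned nThreads, unsigned perThread) {
  SharedArray arr;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; t++)
    workers.emplace_back([&arr, perThread] {
      for (unsigned i = 0; i < perThread; i++)
        arr.add(i);
    });
  for (auto &w : workers)
    w.join();
  return arr.data.size();
}
```

With correct ordering, runWorkers(4, 10000) ends with exactly 40000 elements on any architecture. Note that a too-weak ordering will not necessarily fail a test like this on every run, which is consistent with how hard the crash was to reproduce.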
On Wed, Nov 29, 2023 at 12:56:10PM -0800, Brian Walenz wrote:
> Hopefully fixed. 'twas a good bug. ...
Thanks a lot for this tough work. Do you intend to issue a micro release with this fix? Otherwise we might use the related commit as a patch, but a release would be more convenient and also helpful for other downstream users. Kind regards, Andreas.
Ping about a new release or a commit we can cherry-pick for the Debian package.
Hi Andreas, I'm (finally) getting around to making a release. There have been a few build changes (the handling of externally defined CXXFLAGS) and packaging changes (installing perl modules into lib/perl5/site_perl/canu instead of lib/site_perl/canu). Is there anything on your side you'd like to see? I didn't find any Debian-specific patches in https://salsa.debian.org/med-team/canu but it is entirely likely I looked in the wrong place.
Apologies for the double-work of cherry-picking and then updating for a new release. The release kept getting preempted by other projects.
Hi Brian, it would be great if you could do a release, and we'll see which patches might need to be adapted. For the moment I have no comment except to say thank you for your work. Andreas.
> I didn't find any Debian-specific patches in https://salsa.debian.org/med-team/canu but it is entirely likely I looked in the wrong place.
Check out https://salsa.debian.org/med-team/canu/-/tree/master/debian/patches?ref_type=heads but I think all are merged upstream already.
Did you update your copy of parasail to grab https://github.com/jeffdaily/parasail/pull/102 ?
> Did you update your copy of parasail to grab jeffdaily/parasail#102 ?
Answering my own question: Yes, as of https://github.com/marbl/canu/commit/a55ecfaa3fa1d39ba7d0a577dcc5e4cd6c413ca0
@brianwalenz Looks like all is ready for a new release!
Hi,
the Debian packaged version of canu seems to work nicely on several 64bit architectures. Unfortunately it fails the CI test we wrote for canu which calls the command
failing on arm64 architecture. The Debian infrastructure provides full logs which include the installation of all preconditions for the software that is used. So please inspect the full log of the arm64 test (and scroll down to the end) to see the whole test result. If you want to compare the issue with other architectures you can check our tracker which provides links in green color named "PASS". As you can read on this page the canu version is 2.2.
Kind regards, Andreas,