marbl / merqury

k-mer based assembly evaluation

Segmentation fault in meryl-lookup #3

Closed: egoltsman closed this issue 4 years ago

egoltsman commented 4 years ago

Hi, thank you for a very useful tool! I'm currently trying it on a trio data set where we have CCS reads for the F1 and Illumina reads for the progenitors. I assembled it with HiCanu (no trio data involved) and got a very good assembly. Now I'm running Merqury with the parental hap-mers to help evaluate ploidy and phasing.
I'm getting a consistent crash in meryl-lookup when it is executed via the phase_block.sh wrapper. It seems that writing the dump to stdout (the default) causes the segfault. I was able to work around it by adding -output to the command and then passing that file to awk (see below).

meryl-lookup -dump -memory 4 -sequence $scaff.fasta -mers $hap.meryl  |  awk -v hap=$hap_short -v k=$k '$(NF-4)=="T" {print $1"\t"$(NF-5)"\t"($(NF-5)+k)"\t"hap}' > $out.$hap.bed

-- Loading kmers from 'sw_orange.hapmer.meryl' into lookup table.

 p       prefixes             bits gigabytes (allowed: 4 GB)
-- -------------- ---------------- ---------
15          32768       1131534860     0.132
16          65536       1106740638     0.129
17         131072       1084043568     0.126
18         262144       1065540802     0.124
19         524288       1055426644     0.123 (smallest)
20        1048576       1062089702     0.124
21        2097152       1102307192     0.128
22        4194304       1209633546     0.141
23        8388608       1451177628     0.169
24       16777216       1961157166     0.228
25       33554432       3008007616     0.350
26       67108864       5128599890     0.597
27      134217728       9396675812     1.094
28      268435456      17959719030     2.091 (used)
29      536870912      35112696840     4.088
30     1073741824      69445543834     8.085
31     2147483648     138138129196    16.081
32     4294967296     275550191294    32.078
-- -------------- ---------------- ---------

For 26891374 distinct 20-mers (with 28 bits used for indexing and 12 bits for tags):
    2.091 GB memory
    2.000 GB memory for index (268435456 elements 64 bits wide)
    0.038 GB memory for tags  (26891374 elements 12 bits wide)
    0.053 GB memory for data  (26891374 elements 17 bits wide)

Will load 26891374 kmers.  Skipping 0 (too low) and 0 (too high) kmers.
Allocating space for 26891374 suffixes of 12 bits each -> 322696488 bits (0.038 GB) in blocks of 32.000 MB
                     26891374 values   of 17 bits each -> 457153358 bits (0.053 GB) in blocks of 32.000 MB
Loaded 26891374 kmers.  Skipped 0 (too low) and 0 (too high) kmers.
-- Opening sequences in 'canu.FG5.contigs.fasta'.

Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
meryl/meryl-lookup.C::97 in _Z13dumpExistenceP10dnaSeqFileP20compressedFileWriterRSt6vectorIP20kmerCountExactLookupSaIS5_EERS3_IPKcSaISA_EE()
meryl/meryl-lookup.C::426 in main()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault
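The sizing table in the log above can be reproduced with a simple memory model. This model is reverse-engineered from the printed numbers, not taken from the meryl source, so treat it as an assumption: with n distinct k-mers and a p-bit prefix index, the lookup table needs 64·2^p bits for the index, (2k − p) tag bits per k-mer, and a fixed number of value bits per k-mer (17 in this log). The helper names lookup_bits and bits_to_gb are hypothetical.

```shell
#!/usr/bin/env bash
# Memory model inferred from the meryl-lookup sizing table in this log
# (an assumption, not taken from the meryl source):
#   bits(p) = 64 * 2^p          index: 2^p prefixes, 64-bit entries
#           + n * (2k - p)      tags: k-mer bits not covered by the prefix
#           + n * value_bits    data: per-k-mer value (17 bits in this log)

# Hypothetical helper: total bits for prefix size p, n k-mers, k, value bits.
lookup_bits() {
  local p=$1 n=$2 k=$3 value_bits=$4
  local tag_bits=$(( 2 * k - p ))   # a k-mer is 2k bits; p of them form the index
  echo $(( 64 * (1 << p) + n * tag_bits + n * value_bits ))
}

# Hypothetical helper: bits -> the log's "gigabytes", which are GiB (bits / 8 / 2^30).
bits_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.3f\n", b / 8 / 2^30 }'
}

# Usage: rebuild the table for 26891374 distinct 20-mers with 17 value bits.
p=15
while [ "$p" -le 32 ]; do
  bits=$(lookup_bits "$p" 26891374 20 17)
  printf '%2d %16d %9s\n' "$p" "$bits" "$(bits_to_gb "$bits")"
  p=$(( p + 1 ))
done
```

With n = 26891374, k = 20 and 17 value bits, this model matches every row of the bits column, for example 17959719030 bits (2.091 GB) at p = 28. It also gives 4.088 GB at p = 29, just over the 4 GB allowed by -memory 4, which is consistent with p = 28 being the largest prefix that fits; that this is how meryl-lookup picks the prefix is an inference, not something the log states.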

My workaround:

meryl-lookup -dump -memory 4 -sequence $scaff.fasta -mers $hap.meryl -threads 36 -output dump_tmp
cat dump_tmp | awk -v hap=$hap_short -v k=$k '$(NF-4)=="T" {print $1"\t"$(NF-5)"\t"($(NF-5)+k)"\t"hap}' > $out.$hap.bed  && rm dump_tmp

[completes normally]
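The same workaround can be packaged so the temporary file is always removed and the awk stage can be checked on its own. This is a minimal sketch, not Merqury's phase_block.sh: the function names dump_to_bed and hapmer_bed and their argument order are hypothetical, while the meryl-lookup options and the awk expression are the ones used in this thread.

```shell
#!/usr/bin/env bash
# Sketch of the workaround as two functions (names are hypothetical,
# not part of Merqury).

# Convert meryl-lookup -dump output into BED lines for one hap-mer set,
# using the awk expression from the thread: keep lines whose (NF-4)th field
# is "T" and print contig, position, position+k, haplotype label.
dump_to_bed() {
  # $1 = dump file, $2 = haplotype label, $3 = k
  awk -v hap="$2" -v k="$3" \
    '$(NF-4)=="T" {print $1"\t"$(NF-5)"\t"($(NF-5)+k)"\t"hap}' "$1"
}

# Write the dump to a temporary file instead of stdout, then convert it.
# The temporary file is removed whether or not the steps succeed.
hapmer_bed() {
  # $1 = assembly FASTA, $2 = hap-mer meryl db, $3 = haplotype label,
  # $4 = k, $5 = threads, $6 = output BED
  local tmp status
  tmp=$(mktemp) || return 1
  if meryl-lookup -dump -memory 4 -sequence "$1" -mers "$2" \
       -threads "$5" -output "$tmp"; then
    dump_to_bed "$tmp" "$3" "$4" > "$6"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

For example, given a made-up dump line `ctg1 x 100 T a b c d` (invented only to put a position in $(NF-5) and T in $(NF-4); real -dump lines carry other fields), dump_to_bed with label pat and k = 20 emits the tab-separated line `ctg1 100 120 pat`, an interval starting at the k-mer position and spanning k bases.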

As a side note, I also noticed that the wrapper does not pass the -threads flag to meryl-lookup. Adding it sped up the database I/O considerably.

Best, Eugene

arangrhie commented 4 years ago

Hello, @egoltsman!

Thanks for sharing your errors.

It seems you are running Merqury with the latest Meryl, perhaps the one from your Canu installation path. As you've noticed, some syntax changes have been made to meryl-lookup; however, we haven't fully updated the Merqury pipeline for them yet. Threading is also a new feature.

Sorry for the confusion. I'd suggest using the release version of Meryl, as we haven't fully tested the latest meryl-lookup yet. I'll push another update to Merqury with the proper version of Meryl once it is tested.

Thanks! Arang

egoltsman commented 4 years ago

Ah, yes, of course. I just downloaded meryl 1.0 and everything works fine. Thank you for the quick response!