marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.

Seg fault in union-sum #16

Closed by tbrown91 4 years ago

tbrown91 commented 4 years ago

Hi, I'm having trouble setting up my meryl dbs for a set of 10x data. I ran _submit_build_10x.sh from merqury and am getting a segmentation fault in the union-sum step. I am only using k=5, but am getting a crazy number for suffixsize. The log file is below:

asm_bApuApu_10x.union_sum.27859520.log

Here is an example of one of the 8 count log files:

asm_bApuApu_10x.count.27859519_1.log

I am hoping that a k of 5 is not an issue, and that you'll be able to point me to the parameters in the _submit_build.sh file that should be changed.

Meryl release: v1.0

brianwalenz commented 4 years ago

k=5 is likely the problem. k=6 might work, k=7 should work.

Meryl hasn't seen much (if any) testing at small k sizes (k < 16). There's an implementation detail that probably assumes k > 6. The large suffixSize you're seeing indicates that some value went negative.
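To illustrate why a negative intermediate value can show up as an enormous number, here is a minimal Python sketch of unsigned 64-bit wraparound. The function `hypothetical_suffix_bits` and its `reserved_bits=12` constant are made up for illustration; they are not meryl's actual formula, only an assumption chosen to show the effect.

```python
MASK64 = (1 << 64) - 1  # all ones in 64 bits


def as_uint64(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value & MASK64


def hypothetical_suffix_bits(k: int, reserved_bits: int = 12) -> int:
    # HYPOTHETICAL: 2 bits per base minus a fixed number of reserved bits.
    # This is NOT meryl's real computation; it only demonstrates how a
    # subtraction that goes below zero wraps around in unsigned arithmetic.
    return as_uint64(2 * k - reserved_bits)


if __name__ == "__main__":
    for k in (5, 6, 7, 21):
        print(f"k={k:2d} -> {hypothetical_suffix_bits(k)}")
```

Under these assumed numbers, k=5 makes the subtraction negative, so the stored value wraps to something close to 2^64, while larger k gives small, sensible values.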

tbrown91 commented 4 years ago

Thank you, I will try it. I took k=5 from the best_k.sh output, which gave 5.05... How should this result be interpreted, or is there a better way to choose k? I am currently working with a ~1 Gb genome.

skoren commented 4 years ago

A k of 5 is way too low for a 1 Gb genome; it would be too low even for a bacterial genome. If best_k.sh is outputting a value that low, then it is a bug in merqury, and I'd suggest opening an issue there. For a 1 Gb genome I'd use at least a 21-mer.
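As a rough sanity check on k, the sketch below assumes the formula described in the Merqury paper, k = log4(G(1 - p) / p), where G is the genome size in base pairs and p is the tolerable k-mer collision rate. The function name `estimate_k` and the default p = 0.001 are assumptions for this example; it has not been checked against what best_k.sh actually runs.

```python
import math


def estimate_k(genome_size_bp: float, collision_rate: float = 0.001) -> float:
    """Estimate k as log4(G * (1 - p) / p).

    ASSUMPTION: this follows the formula described in the Merqury paper;
    the genome size must be given in base pairs, not in Mb or Gb.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    if not 0 < collision_rate < 1:
        raise ValueError("collision_rate must be between 0 and 1")
    return math.log(genome_size_bp * (1 - collision_rate) / collision_rate) / math.log(4)


if __name__ == "__main__":
    k = estimate_k(1_000_000_000)  # a 1 Gb genome, given in bp
    print(f"estimated k: {k:.2f}, rounded up: {math.ceil(k)}")
```

Under this assumed formula, a 1 Gb genome gives a k of about 20, in line with the 21-mer suggestion above. The same formula returns a value near 5 when the genome size is given as roughly 1 instead of 1000000000, so it may be worth checking that the size was passed in base pairs.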