marbl / merqury

k-mer based assembly evaluation

Merqury does not generate results when increasing k-mer size #65

Closed (bistace closed this issue 2 years ago)

bistace commented 2 years ago

Hello.

I am trying to run Merqury on a very large genome (15 Gb). It works fine with 21-mers, but with 31-mers the logs look odd and no results are generated. Here are my commands (absolute paths removed for clarity):

meryl k=31 count output R1.meryl SRR5893651_1.fastq.gz
meryl k=31 count output R2.meryl SRR5893651_2.fastq.gz
meryl union-sum output reads.meryl R1.meryl R2.meryl

merqury.sh reads.meryl 161010_Chinese_Spring_v1.0_pseudomolecules.fasta merqury

And here is the log: merqury.spectra-cn.log

Could you please help me solve this problem?

arangrhie commented 2 years ago

Hello @bistace ,

Try running it in a new directory. Your log shows files left over from another Merqury run.

bistace commented 2 years ago

Thank you for your help. I cleaned the folder but still got the same problem. Command launched:

merqury.sh /env/cns/bigtmp2/ONT/merqury_wheat/31/Merqury_CS/Meryl/reads.meryl /env/cns/proj/projet_CGD/scratch/chinese_spring/161010_Chinese_Spring_v1.0_pseudomolecules.fasta merqury

Here is the new log file: merqury.spectra-cn.log

arangrhie commented 2 years ago

What do you get from this?

meryl histogram reads.meryl | head

Also, which meryl version are you using?

bistace commented 2 years ago

Output of the command:

1       6954855351
2       651559692
3       567279114
4       639708151
5       639411656
6       567020056
7       456044903
8       341047298
9       243459493
10      171134740

I am using meryl v1.3.

arangrhie commented 2 years ago

Sorry, I meant:

meryl statistics reads.meryl | head

bistace commented 2 years ago

Number of 31-mers that are:
  unique             6954855351  (exactly one instance of the kmer is in the input)
  distinct          12085019070  (non-redundant kmer sequences in the input)
  present           92843495781  (...)
  missing   4611686006342368834  (non-redundant kmer sequences not in the input)
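
The very large `missing` value can look like an overflow at first glance. As a quick sanity check, the sketch below compares it against 4^k minus the `distinct` count, using only the numbers printed above. It is plain shell arithmetic and does not call meryl; the variable names are illustrative.

```shell
# Values copied from the `meryl statistics` output above.
k=31
distinct=12085019070
reported_missing=4611686006342368834

# Number of possible k-mers is 4^k = 2^(2k). For k=31 this is 2^62,
# which still fits in a signed 64-bit shell integer.
total=$(( 1 << (2 * $k) ))
expected_missing=$(( $total - $distinct ))

echo "total possible ${k}-mers: $total"
echo "expected missing:        $expected_missing"

if [ "$expected_missing" = "$reported_missing" ]; then
    echo "reported missing count equals 4^k - distinct"
else
    echo "reported missing count differs from 4^k - distinct"
fi
```

The two values agree, so the statistics for the read database are internally consistent for k=31.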

arangrhie commented 2 years ago

And how about this?

meryl count k=31 161010_Chinese_Spring_v1.0_pseudomolecules.fasta output 161010_Chinese_Spring_v1.0_pseudomolecules.meryl
meryl statistics 161010_Chinese_Spring_v1.0_pseudomolecules.meryl | head

bistace commented 2 years ago

The first command prints the meryl help, followed by this error:

Can't interpret '/env/cns/proj/projet_CGD/scratch/chinese_spring/161010_Chinese_Spring_v1.0_pseudomolecules.fasta': not a meryl command, option, or recognized input file

My command:

meryl count k=31 /env/cns/proj/projet_CGD/scratch/chinese_spring/161010_Chinese_Spring_v1.0_pseudomolecules.fasta output 161010_Chinese_Spring_v1.0_pseudomolecules.meryl
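
The thread does not record what the cause turned out to be. One thing that can be ruled out cheaply, offered here as an assumption and not as a diagnosis of this case, is that the path does not point to a readable, non-empty regular file, which would be one plausible reason for an argument not being recognized as an input file. A minimal sketch of such a pre-flight check follows; `check_input` is a hypothetical helper, not part of meryl or Merqury.

```shell
# Hypothetical helper (not part of meryl or Merqury): verify that an
# input path is a readable, non-empty regular file before using it.
check_input() {
    if [ ! -f "$1" ]; then
        echo "not a regular file (or does not exist): $1" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "file is not readable: $1" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "file is empty: $1" >&2
        return 1
    fi
    echo "ok: $1"
}
```

It could be chained in front of the count step, for example `check_input assembly.fasta && meryl count k=31 assembly.fasta output assembly.meryl`, where `assembly.fasta` is a placeholder path.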

bistace commented 2 years ago

Oh, I may understand what is happening. I will try something and report back here.

arangrhie commented 2 years ago

Hope everything went well? Feel free to re-open this issue if you need more help!