marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.

simple-dump or analogous tool #5

Closed by erikenbody 5 years ago

erikenbody commented 5 years ago

Hi there -

Thanks for putting together this easy-to-use tool! I am working on producing the hap_kmer blob plot with my own unzip and canu assemblies, but I am encountering difficulty.

Namely, I am having trouble generating the input for the script hap_kmer_plot.R. The input seems to come from scripts/meryl_count/meryl2_hapmers.sh; however, I can't seem to find simple-dump in this meryl repo.

Nevertheless I created .mcdat input following the triobinningScripts repo directions (using the meryl version there) and simple-dump does not execute as expected.

For example, when given: simple-dump -s hap1.k21.filt.nohap2k21.filt -e hap1.k21.filt.only -m 21 it only prints the usage message: usage: simple-dump -m mersize -mers mers [-exist existDB] -seq fasta > output

The same issue happens with meryl included in my canu 1.7 install.

Does the current meryl here offer a similar methodology for inputting a k-mer database and an assembly fasta and outputting the counts per scaffold/contig? Otherwise do you have any suggestions for which older branch I can leverage for this purpose?

I realize this is likely all still under development, which I understand! Thanks for any suggestions.

Erik

skoren commented 5 years ago

The latest release candidate (canu branch 1.9) or this version of meryl should both have meryl-lookup, which replaces simple-dump. It will give you info on the kmers from a db in a fasta file.

arangrhie commented 5 years ago

meryl-lookup -existence gives you the number of kmers shared between your sequence fasta file and the haplotype-specific kmer db. Sorry for the confusion with the incomplete code. I will try to update my script when I get more time.
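For readers who want to see what a per-sequence existence count computes, here is a rough Python sketch. It is only an illustration, not meryl's implementation or its output format: the function names are hypothetical, the k-mer db is modeled as a plain in-memory set, and it assumes k-mers are stored in canonical form (the lexicographically smaller of a k-mer and its reverse complement).

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement (assumed db convention)."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_existing_kmers(seq: str, kmer_db: Set[str], k: int) -> Tuple[int, int]:
    """Return (k-mers examined, k-mers found in kmer_db) for one sequence."""
    total = found = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - _BASES:
            continue  # skip k-mers containing N or other ambiguous bases
        total += 1
        if canonical(kmer) in kmer_db:
            found += 1
    return total, found


def per_sequence_counts(lines: Iterable[str], kmer_db: Set[str], k: int) -> Dict[str, Tuple[int, int]]:
    """Map each contig/scaffold name to its (examined, found) k-mer counts."""
    return {name: count_existing_kmers(seq, kmer_db, k) for name, seq in read_fasta(lines)}


if __name__ == "__main__":
    # Toy data: a tiny "haplotype-specific" db with k = 3.
    toy_db = {canonical("ACG"), canonical("AAA")}
    toy_fasta = [">contig1", "ACGTACGT", ">contig2", "TTTTTT"]
    for contig, (examined, hits) in per_sequence_counts(toy_fasta, toy_db, 3).items():
        print(contig, examined, hits)
```

A real k-mer db is far too large to handle this way at genome scale, which is why a dedicated lookup tool is used in practice; the sketch only shows the counting logic per contig.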

erikenbody commented 5 years ago

Hey, that's great! I'm just grateful to see the k-mer dump is implemented here. Overall, this version of meryl is more intuitive to me than before. I ran meryl-lookup on a small dataset this morning with success using this branch, thanks!