marbl / Mash

Fast genome and metagenome distance estimation using MinHash
mash.readthedocs.org

mash sketch outputs reference.msh with only one row #158

Open Valentin-Bio-zz opened 3 years ago

Valentin-Bio-zz commented 3 years ago

Hello, I have used mash before with a set of paired-end reads. I am now using the -r flag, as recommended when sketching reads (metagenomic reads in my case). This is the command I use:

mash sketch -m 2 -r -o reference *.fa

When I run mash info reference.msh, I get the following:

Header:
  Hash function (seed):          MurmurHash3_x64_128 (42)
  K-mer size:                    21 (64-bit hashes)
  Alphabet:                      ACGT (canonical)
  Target min-hashes per sketch:  1000
  Sketches:                      1

Sketches:
  [Hashes]  [Length]  [ID]  [Comment]
  1000  10789159071  FP.BAC4A_ATATCTCG-ACTAAGAT_L00M.130bp_5prime.fa  A00419:387:HVVLYDSXY:2:1101:2736:1000 1:N:0:ATATCTCG+NCTAAGAT

I only got one sketched read set.

What could be happening? Does the -r option accept only one read library as input, so that the remaining 92 are left out of the sketch?

Thanks for reading :)

ondovb commented 3 years ago

It is sketching all the reads together, but the first file is used for the ID and comment (see -I and -C). To create separate sketch files for each read set, you will need to invoke mash for each sketch (see #71).
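
For illustration, here is a minimal sketch of invoking mash once per read set, as suggested above. It assumes one FASTA file per read set, all matching *.fa in the current directory, and it reuses the -r and -m 2 options from the original command. The function name sketch_each is a placeholder, not part of Mash.

```shell
#!/bin/sh
# Sketch each read set on its own so that every sample gets its own sketch.
# Assumption: each *.fa file in the working directory is one read set.
# sketch_each is a hypothetical helper name, not a Mash command.

sketch_each() {
    for f in "$@"; do
        # Skip the literal pattern if the glob matched no files.
        [ -e "$f" ] || continue
        # -r: treat input as reads; -m 2: minimum copies of each k-mer
        # required to pass the noise filter; -o: output prefix, so
        # sample.fa produces sample.msh.
        mash sketch -r -m 2 -o "${f%.fa}" "$f" || return 1
    done
}

sketch_each *.fa
```

Each resulting .msh file should then carry its own ID taken from its input file name. If a single file with one row per read set is wanted, the per-sample sketches can probably be combined afterwards with mash paste, for example mash paste reference followed by the list of per-sample .msh files.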