Open leon945945 opened 3 months ago
Hi,
I would do the kmer counting first, and do an intersection with the kmer database of interest (let's say original.meryl
).
For example for input1:
meryl count k=$k input1.fastq.gz output input1.meryl
meryl intersect input1.meryl original.meryl
meryl print input1.meryl
The output will give you the counts in input1.meryl, for the kmers that exists in both input1.meryl and original.meryl. Then repeat this for input2, and merge the results.
meryl-lookup
adds counts of the input meryl kmer database. It would not do the counting of the input sequence (your fastq reads in your case).
If you'd like to use meryl-lookup
, and by-pass the intersect / print / manual merging, you could do it the other way around by using input1.meryl and input2.meryl on your original sequence. You could potentially make fake sequences, making each kmer its own sequence from your original.meryl with meryl print result.
Thanks for your quick reply.
I get the idea of comparing kmer count in multiple fastq files. The kmers whose count number equals to 0 in input1.meryl could be extracted with meryl difference original.meryl input1.meryl
command.
Due to the large amounts of kmer sequences, maybe the manual merging is a toughh work to do.
Hi, I generated a kmer database with
meryl count
command, and I want to count the kmers of the database in different query fastq files, how to do it withmeryl
. I saw thatmeryl-lookup
did the similar job, but the output file ofmeryl-lookup
is not what I want, I want to compare the number of each kmer in different queries, my expected output is like this:Hope for your reply.