marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.
119 stars 14 forks source link

individual specific kmer could be find in other individual #53

Open leon945945 opened 3 days ago

leon945945 commented 3 days ago

Hi, the purpose of my analysis is to get the paternal specific kmer, and to increase the specificity of kmer I chosed 62mer to perform analysis, because 72mer would lead to an error of meryl.

At first, I generated the 62mer of progeny, maternal and paternal with HiFi data for progeny and maternal, Illumina data for paternal. meryl count k=62 progeny.fq output progeny.meryl meryl count k=62 maternal.fq output maternal.meryl meryl count k=62 paternal.fq output paternal.meryl

Then, the maternal/paternal count of common kmer between progeny and maternal/paternal were outputed. meryl intersect maternal.meryl/ progeny.meryl/ output mat-progeny_commonk62.meryl meryl intersect paternal.meryl/ progeny.meryl/ output pat-progeny_commonk62.meryl and printed the kmer count: meryl print mat-progeny_commonk62.meryl > mat-progeny.kmerCount meryl print pat-progeny_commonk62.meryl > pat-progeny.kmerCount

Then, I merged the mat-progeny.kmerCount and pat-progeny.kmerCount, and I thought the maternal kmers whose count equals to 0 were paternal specific kmer. I extract the sequences of paternal specifc kmers and aligned them to the maternal genome (assembled by the same HiFi file with kmer count) with bowtie2, however, a large proportion of them could be aligned to maternal genome without mismatch or indel (CIGAR=62M), which means the paternal specific kmers could be find in maternal.

Here is my confusion, the paternal specific kmer should not be existed in maternal, why they could be aligned in maternal genome. Hope for your reply, thanks very much.

arangrhie commented 3 days ago

Hello @leon945945 ,

meryl intersect takes the count of the first meryl db.

I think you want the meryl difference instead. It takes the kmer count of the first input db, which does not exist in the second db.

Merqury/trio/hapmers.sh does exactly what you are trying to do - and generates a histogram to visually confirm the hapmer spectrum. I call the parental specific kmers inherited to the child as hapmers.

best, Arang

leon945945 commented 3 days ago

Thanks for your reply, I will try to take use of merqury. And where is the wrong place of my analysis, firstly get the shared kmers between progeny and maternal, progeny and paternal, then merge the two kmer database.Thus, maternal kmers whose count equals to 0 is paternal specific kmers, paternal kmers whose count equals to 0 is maternal specific kmers. It's my pleasure if you can give me some advances.