marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.

the parental specific kmers were intersected in progeny genome #40

Closed leon945945 closed 3 months ago

leon945945 commented 6 months ago

Hi, I used meryl to identify individual-specific kmers with the difference subcommand, as below:

    meryl difference paternal.meryl/ maternal.meryl/ output paternal-specific.meryl
    meryl difference maternal.meryl/ paternal.meryl/ output maternal-specific.meryl

Following the description of the difference subcommand, I put the paternal kmers first and the maternal kmers second to identify paternal-specific kmers, then the maternal kmers first and the paternal kmers second to identify maternal-specific kmers.
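As a sanity check on the logic above, here is a minimal Python sketch of the two difference operations, modelled as plain set differences. The toy sequences, `k = 5`, and the helper `kmers` are made up for illustration; real meryl databases also track counts and typically canonical kmers, which this sketch ignores.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

k = 5
# Hypothetical parental sequences that differ at a single base.
paternal = kmers("ACGTACGTTAGC", k)
maternal = kmers("ACGTACCTTAGC", k)

# Toy equivalent of: meryl difference A B -> kmers in A that are absent from B.
paternal_specific = paternal - maternal
maternal_specific = maternal - paternal

# By construction, the two "specific" sets can never share a kmer.
print(sorted(paternal_specific))
print(sorted(maternal_specific))
print(paternal_specific.isdisjoint(maternal_specific))
```

Under this model no single kmer can be both paternal-specific and maternal-specific, so any overlap seen later has to come from how the kmers are placed on the genome, not from the sets themselves.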

Then meryl-lookup was used to locate the paternal-specific and maternal-specific kmers in the progeny genome, as below:

    meryl-lookup -sequence progeny.genome.fa -mers maternal-specific.meryl -bed-runs > progeny-maternal.bed
    meryl-lookup -sequence progeny.genome.fa -mers paternal-specific.meryl -bed-runs > progeny-paternal.bed

I compared the two bed files with

    bedtools intersect -a progeny-maternal.bed -b progeny-paternal.bed -wa -wb | wc -l

and there are over 50,000 overlapping intervals.
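One thing worth keeping in mind when reading such an intersection: a kmer covers k bases, so two different kmers at nearby positions cover some of the same bases. The sketch below is a toy model of that effect, not a confirmed explanation for this dataset. It assumes that each run is reported as the span from the start of its first kmer to the end of its last kmer; the sequences, `k = 5`, and the helpers `kmer_runs` and `overlap` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_runs(seq, db, k):
    """Merge consecutive k-mer hits into half-open base intervals [start, end)."""
    runs = []
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in db:
            # A hit extends the previous run if it starts one base after
            # the previous hit (previous hit start = run end - k).
            if runs and i == runs[-1][1] - k + 1:
                runs[-1] = (runs[-1][0], i + k)
            else:
                runs.append((i, i + k))
    return runs

def overlap(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

k = 5
pat = "ACGTTGCAAGGCTCAT"      # hypothetical paternal sequence
mat = "ACGTTGTAACGCTCAT"      # differs from pat at positions 6 and 9
progeny = "ACGTTGCAACGCTCAT"  # paternal base at 6, maternal base at 9

pat_specific = kmers(pat, k) - kmers(mat, k)
mat_specific = kmers(mat, k) - kmers(pat, k)

pat_runs = kmer_runs(progeny, pat_specific, k)
mat_runs = kmer_runs(progeny, mat_specific, k)
shared = sum(overlap(p, m) for p in pat_runs for m in mat_runs)

print(pat_runs, mat_runs, shared)  # [(2, 9)] [(7, 14)] 2
```

In this toy case the two kmer sets are disjoint, yet the reported intervals share 2 bases because the paternal-specific and maternal-specific kmers sit less than k bases apart. Whether this accounts for the overlaps observed here is not established in this thread.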

My confusion is this: the paternal-specific and maternal-specific kmers are individual-specific, so why do they overlap in the progeny genome?

Any suggestions would be much appreciated.

leon945945 commented 6 months ago

Hi, I checked the total length of the kmer intervals in the progeny genome. The paternal-specific kmers cover 70 Mb. The maternal-specific kmers cover 218 Mb. The intervals where paternal-specific and maternal-specific kmers overlap total 1.3 Mb. The progeny genome is 356 Mb long.
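To put these figures in proportion, the short calculation below expresses each one as a percentage of the 356 Mb progeny genome. It uses only the numbers quoted above; the variable names are illustrative.

```python
genome_mb = 356.0
paternal_mb = 70.0
maternal_mb = 218.0
overlap_mb = 1.3

paternal_pct = 100 * paternal_mb / genome_mb
maternal_pct = 100 * maternal_mb / genome_mb
overlap_pct = 100 * overlap_mb / genome_mb

# Bases covered by at least one of the two interval sets.
union_mb = paternal_mb + maternal_mb - overlap_mb
union_pct = 100 * union_mb / genome_mb

print(f"paternal-specific: {paternal_pct:.1f}% of genome")  # 19.7%
print(f"maternal-specific: {maternal_pct:.1f}% of genome")  # 61.2%
print(f"overlap:           {overlap_pct:.2f}% of genome")   # 0.37%
print(f"either parent:     {union_pct:.1f}% of genome")     # 80.5%
```

So the overlapping intervals, although numerous, account for well under 1% of the genome.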

leon945945 commented 3 months ago

I think I may have found the reason.