marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

TrioCanu: unique parental k-mers #1256

Closed bfr42 closed 5 years ago

bfr42 commented 5 years ago

Hello,

I would like to get the unique parental k-mers computed by TrioCanu, and how many of them there are. Is there a way to get these? If I understand correctly, /haplotype/0-mercounts-haplotype1/haplotype1.ms20.only.mcdat and the statistics in haplotype1.ms20.histogram.info comprise k-mers that can occur in both haplotype1 and haplotype2.

Thank you!

skoren commented 5 years ago

That's right, the *only* files contain haplotype-specific k-mers. The histograms, however, are reported for the full input dataset, because they are produced when the k-mers are first counted. You can make histograms or dump the actual haplotype-specific k-mers from the *only* files using meryl. For example:

meryl -Dh -s 0-mercounts-haplotype1/haplotype1.ms20.only
meryl -Dt -s 0-mercounts-haplotype1/haplotype1.ms20.only

The latter command dumps all k-mers, so you probably want to look at the commands in Canu's haplotypeReads.sh to see what threshold it used.
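
If you then want the number of haplotype-specific k-mers at or above a given count, a small script over the text dump is one option. The following is only a sketch, not part of Canu or meryl. It assumes the dump is either FASTA-like (a `>count` header line followed by the k-mer on the next line) or tabular (k-mer and count on one line); check a few lines of your own dump first. The function names and the `min_count` parameter are made up for illustration, and the threshold value has to come from what Canu actually used.

```python
from typing import Iterable, Iterator, Tuple


def parse_dump(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (kmer, count) pairs from a k-mer text dump.

    ASSUMPTION: the dump uses one of two layouts, neither of which is
    confirmed in this thread:
      * FASTA-like: '>count' on one line, the k-mer on the next
      * tabular:    'kmer<whitespace>count' on a single line
    """
    pending_count = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line carrying the count of the k-mer that follows.
            pending_count = int(line[1:])
        elif pending_count is not None:
            # Sequence line belonging to the previous header.
            yield line, pending_count
            pending_count = None
        else:
            # Tabular layout: k-mer and count on the same line.
            kmer, count = line.split()[:2]
            yield kmer, int(count)


def summarize(lines: Iterable[str], min_count: int = 1) -> Tuple[int, int]:
    """Return (distinct k-mers, summed counts) for k-mers with count >= min_count."""
    distinct = 0
    total = 0
    for _kmer, count in parse_dump(lines):
        if count >= min_count:
            distinct += 1
            total += count
    return distinct, total


# Tiny in-memory example in the assumed FASTA-like layout.
example = [">5", "ACGT", ">1", "TTTT"]
print(summarize(example, min_count=2))  # (1, 5)
```

To use it on real data, redirect the output of the dump command to a file and pass the open file object to `summarize`, with `min_count` set to the threshold found in the Canu script.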