twolinin / longphase

GNU General Public License v3.0

Combine Illumina and ONT reads in phasing #9

Open ramesh8v opened 2 years ago

ramesh8v commented 2 years ago

Hi,

Thank you for developing this tool; it has been very helpful. I have samples sequenced with both Illumina (>100X coverage) and ONT (~20X coverage), and I'm wondering how to provide both BAM files to LongPhase. Does LongPhase accept Illumina reads? I understand that an ONT BAM file is indicated by the -ont flag and a PacBio BAM file by the -pb flag. Is there a way to provide an Illumina BAM file along with the ONT one? In WhatsHap, using both Illumina and ONT reads produced larger phased blocks than using ONT alone, so I am interested in trying the same with LongPhase. Thanks.

ythuang0522 commented 2 years ago

Hi @ramesh8v, Short reads should work in theory, since their accuracy is on par with PacBio HiFi, though we haven't tested Illumina yet. As far as we can tell, WhatsHap did not introduce a new algorithm for hybrid ONT/Illumina phasing, so we expect it relies on the same algorithm it uses for long-read phasing. To run your data with LongPhase, merge the two sorted BAM files (ONT and Illumina) into one and index the result, e.g.,

    samtools merge merged.bam in.1.bam in.2.bam
    samtools index merged.bam

I would try using the -ont flag in LongPhase, as ONT long reads are the main source of reads that span heterozygous variants. We'll set aside time to test this hybrid phasing mode on public data. Your feedback is welcome.
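
For anyone trying this, the merge-then-phase workflow suggested above could be sketched roughly as follows. This is only an untested outline, not a validated hybrid pipeline. All file names are hypothetical placeholders, and the `run` helper and `DRY_RUN` switch are conveniences added for this sketch. The `longphase phase` options (`-s`, `-b`, `-r`, `-t`, `-o`, `--ont`) follow the usage shown in the LongPhase README as I understand it; the thread writes the flag as `-ont`, so please confirm the exact spelling with `longphase phase --help` for your version. A VCF of SNP calls for the sample is assumed to exist already.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input/output names; override via environment variables.
ONT_BAM="${ONT_BAM:-ont.sorted.bam}"                 # sorted ONT alignments
ILLUMINA_BAM="${ILLUMINA_BAM:-illumina.sorted.bam}"  # sorted Illumina alignments
MERGED_BAM="${MERGED_BAM:-merged.bam}"
SNP_VCF="${SNP_VCF:-snps.vcf}"                       # assumed to exist already
REFERENCE="${REFERENCE:-reference.fasta}"
OUT_PREFIX="${OUT_PREFIX:-phased_hybrid}"
THREADS="${THREADS:-8}"

# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to actually run samtools and longphase.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

hybrid_phase() {
  # 1. Merge the two coordinate-sorted BAM files into one.
  run samtools merge "$MERGED_BAM" "$ONT_BAM" "$ILLUMINA_BAM"
  # 2. Index the merged BAM.
  run samtools index "$MERGED_BAM"
  # 3. Phase with the ONT preset, as suggested in the reply above.
  run longphase phase -s "$SNP_VCF" -b "$MERGED_BAM" -r "$REFERENCE" \
    -t "$THREADS" -o "$OUT_PREFIX" --ont
}

hybrid_phase
```

Run as-is, the script only prints the three commands so they can be checked first; `DRY_RUN=0` executes them. Both input BAM files should be aligned to the same reference, otherwise the merged file will not be meaningful.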