twolinin / longphase

GNU General Public License v3.0

haplotag --log #28

Closed bsb2014 closed 1 year ago

bsb2014 commented 1 year ago

I am wondering whether the plain-text output file also contains PS tags in addition to HP tags. As you suggested, both pieces of information are important for determining haplotypes. Thanks.

twolinin commented 1 year ago

Hi @bsb2014 ,

We have added the 'PhaseSet' and '(PhaseSet,Variantcount)' columns to the --log output to represent the PS tag assigned to a read.

#Read   Chr     ReadStart       Confidnet(%)    Haplotype       PhaseSet        TotalAllele     HP1Allele       HP2Allele       phasingQuality(PQ)      (Variant,HP)    (PhaseSet,Variantcount)
eb459876-8c81-4714-a496-a90ea8be94d2    chr1    10000   0.969697        1       43042   33      32      1       15       47660,0 47695,0 47698,0 47705,0 47760,0 47929,0 48170,0 48182,0 48936,0 48975,0 49147,0 49242,0 49290,0 49313,0 49314,0 49341,0 49362,0 49403,0 49426,0 49481,0 51402,0 51458,0 51498,0 51619,1 51801,0 51805,0 51901,0 51935,0 51940,0 51950,0 52104,0 52143,0 52524,0         43042,33
6ca3a71f-62fd-416e-8c6e-8c4a9c054e1a    chr1    10000   0.984375        2       43042   64      1       63      18       47660,1 47695,1 47698,1 47705,1 47760,1 47929,1 48170,1 48182,1 48936,1 48975,1 49147,1 49242,1 49290,1 49313,1 49314,1 49341,1 49362,1 49403,1 49426,1 49481,0 51402,1 51458,1 51498,1 51619,1 51801,1 51805,1 51901,1 51935,1 51940,1 51950,1 52104,1 52143,1 52524,1 53008,1 53788,1 53816,1 53957,1 54177,1 54394,1 55387,1 56678,1 57063,1 57681,1 57817,1 57855,1 57989,1 57997,1 58431,1 58447,1 58770,1 58811,1 58865,1 58988,1 59275,1 59343,1 60159,1 60272,1 60331,1 60717,1 60828,1 61218,1 61479,1 61578,1 61735,1         43042,64
0d8b7c68-d98d-4045-a572-82fedac62da5    chr1    10000   -nan    .       .       0       0       0       0
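
For anyone who wants to post-process this log, here is a minimal Python sketch of a line parser. It is not part of LongPhase. It assumes the columns are tab-separated in the actual file and that the pairs inside '(Variant,HP)' and '(PhaseSet,Variantcount)' are separated by spaces, as the sample above suggests. The names `HaplotagRecord` and `parse_log_line` are made up for this illustration.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class HaplotagRecord(NamedTuple):
    """One row of the --log output (hypothetical container, not a LongPhase type)."""
    read: str
    chrom: str
    read_start: int
    confidence: float
    haplotype: Optional[int]
    phase_set: Optional[int]
    total_allele: int
    hp1_allele: int
    hp2_allele: int
    phasing_quality: int
    variant_hp: List[Tuple[int, ...]]
    phase_set_counts: List[Tuple[int, ...]]


def _int_or_none(value: str) -> Optional[int]:
    # Untagged reads carry "." in the Haplotype and PhaseSet columns.
    return None if value == "." else int(value)


def _pairs(field: str) -> List[Tuple[int, ...]]:
    # "47660,0 47695,0" -> [(47660, 0), (47695, 0)]; an empty field gives [].
    return [tuple(int(x) for x in token.split(",")) for token in field.split()]


def parse_log_line(line: str, sep: str = "\t") -> Optional[HaplotagRecord]:
    """Parse one log line; return None for the header and for blank lines.

    Assumption: columns are separated by `sep` (tab by default).
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = [f.strip() for f in line.rstrip("\n").split(sep)]
    # Untagged reads in the sample have no pair columns, so pad to 12 fields.
    fields += [""] * (12 - len(fields))
    return HaplotagRecord(
        read=fields[0],
        chrom=fields[1],
        read_start=int(fields[2]),
        confidence=float(fields[3]),  # float() also accepts "-nan"
        haplotype=_int_or_none(fields[4]),
        phase_set=_int_or_none(fields[5]),
        total_allele=int(fields[6]),
        hp1_allele=int(fields[7]),
        hp2_allele=int(fields[8]),
        phasing_quality=int(fields[9]),
        variant_hp=_pairs(fields[10]),
        phase_set_counts=_pairs(fields[11]),
    )
```

In the two tagged sample rows above, the confidence value matches max(HP1Allele, HP2Allele) / TotalAllele (32/33 and 63/64). That is an observation from the sample only, not a statement about how LongPhase computes it.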

This modification will be released in the upcoming version. For now, you can build from source with the following commands.

git clone https://github.com/twolinin/longphase.git
cd longphase
autoreconf -i
./configure
make -j 4

Thanks

bsb2014 commented 1 year ago

Many thanks and looking forward to the next binary release.

I cannot compile it on the server; the error is:

/usr/bin/ld: cannot find -ldeflate
collect2: error: ld returned 1 exit status
make: *** [Makefile:57: longphase] Error 1

bsb2014 commented 1 year ago

I am wondering if haplotag can support multiple bam files, such as short and long read bams.

If I merge the short-read and long-read BAMs together for phasing, will haplotag tag both the short reads and the long reads with HP:i:1 or HP:i:2?

Thanks

twolinin commented 1 year ago

Hi @bsb2014,

  1. I will try to troubleshoot the compilation issue. May I know which operating system you are using?
  2. Haplotag involves detecting the overlap between reads and SNPs. In theory, tagging will be performed as long as short reads contain a sufficient number of SNPs.

bsb2014 commented 1 year ago

  1. I am using Ubuntu 16.04.7 LTS
  2. During phasing, can the short reads bridge the long reads if they overlap? Thanks

twolinin commented 1 year ago

Hi @bsb2014,

  1. LongPhase by default uses zlib, and Linux should have it installed by default. Could you please check if zlib is installed on your system?
  2. In theory, not much, because short reads usually cannot span two SNPs except in SNP-dense regions. However, we haven't tested NGS and TGS co-phasing yet. If you don't mind, could you share your experimental results for ONT-only versus Illumina/ONT hybrid?

Thanks

bsb2014 commented 1 year ago

I will try the ONT-only versus Illumina/ONT hybrid and will let you know the results once done.

bsb2014 commented 1 year ago

In my case, the results look very similar.

ONT-only: Heterozygous phased:       85.6%
Illumina/ONT hybrid: Heterozygous phased:       86.5%

May I ask when the next binary will be released? Thanks.

twolinin commented 1 year ago

Hi @bsb2014,

Thank you for sharing the results of Illumina and ONT co-phasing. In our experience, many factors can affect the proportion of phased variants, such as sequencing coverage, sequencing error rate, and read length. There is currently no confirmed date for the next binary release. You can try resolving the compilation issue using a Dockerfile or the following command.

export LD_LIBRARY_PATH=" your libdeflate path containing the file libdeflate.so.0"

bsb2014 commented 1 year ago

As for Illumina and ONT co-phasing, how does longphase treat paired-end reads? Does it treat a pair as two separate reads or as one fragment? Thanks

twolinin commented 1 year ago

Hi @bsb2014

Currently, LongPhase is designed for third-generation sequencing data, so the two reads of a pair are processed as separate fragments.

bsb2014 commented 1 year ago

I understand that phasing requires a read spanning two variants. How many phased variants does a read need for haplotag to tag it? If a read has only one phased variant, will it be tagged? Thanks

ythuang0522 commented 1 year ago

Yes.