lerch-a / HaplotypR

GNU General Public License v3.0

dna read reconstruction #9

Closed dcm9123 closed 4 years ago

dcm9123 commented 4 years ago

Hi Anita,

I hope you are doing well. I am trying to figure out how HaplotypR handles DNA read reconstruction. Because I am working with markers that are about 800 bp long, I was wondering whether there is a step where HaplotypR reconstructs my reads to the original size and calls haplotypes from each of them. Because I used Nextera in my experiments, my fragments were randomly sheared, and I get reads of 250 bp. Basically, my question is: will HaplotypR do haplotyping based on my 250 bp reads alone, or will it reconstruct my reads to the original haplotype and then do haplotype calling? I am afraid I could be getting something like this:

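[image: https://user-images.githubusercontent.com/35580294/67698927-debb9700-f970-11e9-949a-5e80cdd6ba2a.png]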

where I get the variation only in one part of my marker.

Thanks in advance,

Daniel

lerch-a commented 4 years ago

Hi Daniel,

HaplotypR does not work on Nextera-sheared samples; you need to find other software for that. It does not reconstruct haplotypes from sheared amplicon DNA. It assumes full-length amplicon reads that all start and end at the same positions.
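To make the full-length assumption concrete, here is a minimal Python sketch (not HaplotypR code, and not its API) that checks how many aligned reads start and end exactly at the amplicon boundaries. The `AlignedRead` type, the `tolerance` parameter, and the coordinates are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class AlignedRead(NamedTuple):
    """Reference span of one merged read (0-based, end-exclusive). Hypothetical type."""
    start: int
    end: int


def is_full_length(read: AlignedRead, amp_start: int, amp_end: int,
                   tolerance: int = 0) -> bool:
    """True if the read starts and ends at the amplicon boundaries."""
    return (abs(read.start - amp_start) <= tolerance
            and abs(read.end - amp_end) <= tolerance)


def fraction_full_length(reads: List[AlignedRead], amp_start: int,
                         amp_end: int, tolerance: int = 0) -> float:
    """Fraction of reads that span the whole amplicon."""
    if not reads:
        return 0.0
    n_full = sum(is_full_length(r, amp_start, amp_end, tolerance) for r in reads)
    return n_full / len(reads)


# Illustrative data: a 400 bp amplicon with two full-length reads
# and two sheared fragments that start or end inside the amplicon.
reads = [
    AlignedRead(0, 400),
    AlignedRead(35, 285),
    AlignedRead(150, 400),
    AlignedRead(0, 400),
]
frac = fraction_full_length(reads, amp_start=0, amp_end=400)
print(f"{frac:.0%} of reads are full-length")
```

If most reads start or end inside the amplicon, as with Nextera-sheared libraries, the data do not meet the assumption described above.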

Please read my manuscript: Lerch, A. et al. Development Of Amplicon Deep Sequencing Markers And Data Analysis Pipeline For Genotyping Multi-Clonal Malaria Infections. BMC Genomics (2017), 18(1), p. 864, http://dx.doi.org/10.1186/s12864-017-4260-y.

Best, Anita


dcm9123 commented 4 years ago

Hi Anita,

Thank you for your fast response. I was looking at another one of your papers to see if this was addressed by HaplotypR. I noticed that in "Development of amplicon deep sequencing markers and data analysis pipeline for genotyping multi-clonal malaria infections" you worked with 2x500 bp and stated "Amplicon sizes were limited to a maximum of 500 bp to conform to possible read lengths of the Illumina MiSeq platform." which would be an excellent way of solving my problem if my reads were 500 bp (they are not; they are 250 bp). I was reading another paper of yours, "Longitudinal tracking and quantification of individual Plasmodium falciparum clones in complex infections", where I noticed you were dealing with ama1 and cpmp; however, I also noticed that you sequenced on a MiSeq in 2x250 bp... Did you have a way of resolving the variation across your whole gene? Or did you focus on the variation within what the reads covered, as in the HaplotypR paper?

Thanks for your valuable insight,

Daniel

lerch-a commented 4 years ago

Hi Daniel,

> Thank you for your fast response. I was looking at another one of your papers to see if this was addressed by HaplotypR. I noticed that in "Development of amplicon deep sequencing markers and data analysis pipeline for genotyping multi-clonal malaria infections" you worked with 2x500 bp

Also in the first paper I used 2x250. Not sure where you got the 2x500 bp from.

> and stated "Amplicon sizes were limited to a maximum of 500 bp to conform to possible read lengths of the Illumina MiSeq platform." which would be an excellent way of solving my problem if my reads were 500 bp (they are not; they are 250 bp). I was reading another paper of yours, "Longitudinal tracking and quantification of individual Plasmodium falciparum clones in complex infections", where I noticed you were dealing with ama1 and cpmp; however, I also noticed that you sequenced on a MiSeq in 2x250 bp... Did you have a way of resolving the variation across your whole gene?

No. This will never work, because you need an alignment step.

> Or did you focus on the variation within what the reads covered?

Yes. HaplotypR is designed to focus on the full-length amplicon fragment, with all reads covering the amplicon from its start to its end.

Sheared amplicons cannot be analysed with HaplotypR; it is NOT designed for sheared reads. You need to look into something like ShoRAH (http://www.biomedcentral.com/1471-2105/12/119).
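The read-length arithmetic behind this can be sketched as follows. This is only an illustration of the numbers discussed in this thread, not something HaplotypR computes; the function name and the `min_overlap` parameter (a hypothetical minimum overlap needed to join the two mates) are assumptions.

```python
from typing import Tuple


def paired_end_coverage(amplicon_len: int, read_len: int,
                        min_overlap: int = 0) -> Tuple[bool, int]:
    """Return (spans_amplicon, uncovered_bases) for one read pair.

    Assumes the forward read starts at the amplicon start and the
    reverse read ends at the amplicon end.
    """
    # Longest amplicon the two mates can cover while still overlapping.
    max_span = 2 * read_len - min_overlap
    # Bases in the middle of the amplicon that neither mate reaches.
    uncovered = max(0, amplicon_len - 2 * read_len)
    return amplicon_len <= max_span, uncovered


# 2x250 bp reads on a 500 bp amplicon: the mates just meet in the middle.
print(paired_end_coverage(500, 250))
# 2x250 bp reads on an 800 bp marker: a gap is left uncovered.
print(paired_end_coverage(800, 250))
```

Under these assumptions, an 800 bp marker sequenced as 2x250 bp leaves the central bases unread even when both mates sit exactly at the amplicon ends, so a full-length haplotype cannot be read directly from a single pair.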

Best, Anita

dcm9123 commented 4 years ago

Hi Anita,

Sorry, yes, I meant a total read length of 500 bp from 2x250 bp (that was a typo). I figured that would be my case. Thanks for sharing that link with me; I'll try working with it as well.

Thanks again,

Daniel