mapleforest / HaploMerger2

Discussion about your software #9

Open nottwy opened 6 years ago

nottwy commented 6 years ago

I'm really interested in your work and have been thinking about similar problems, so I would like to discuss a few questions with you:

  1. Is the input to your software just an assembly result (assuming I don't want to scaffold or fill gaps)?
  2. Do you just separate the potential heterozygous sites in the assembly that the user provides?
  3. Following from question 2, would the output of your software be an incomplete primary haplotype and an alternative haplotype? And does their sum equal the assembly that the user provides?

Thank you for your reply.

mapleforest commented 6 years ago

Dear Nott Yu,

Thanks for your interest.

  1. Yes, only the raw diploid assembly is required, but make sure that it has a sufficiently long N50 and that its repeats are soft-masked.

  2. Yes, it only separates them; no new sequences are added to the assembly unless you invoke the gap-filling procedure.

  3. No. HM2 will compute a complete reference haploid assembly, which is the best representative haploid assembly that HM2 can find in the raw diploid assembly. In other words, HM2 takes portions from both haploid assemblies and pieces them together into a complete but mosaic haploid reference assembly. For any locus that has two alleles available, one of them is placed in the alternative assembly. However, I would not call it a "haplotype" assembly, because the raw assembly cannot guarantee phased haplotypes, and HM2 will not change that situation.
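The two input requirements mentioned in answer 1 (a long enough N50 and soft-masked repeats) can be checked before running anything. Below is a minimal sketch in plain Python, assuming the usual convention that soft-masked bases are written in lowercase. The function names and the demo records are hypothetical, and the thread does not state what N50 counts as "sufficiently long", so no threshold is applied here.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def soft_masked_fraction(sequences: Iterable[str]) -> float:
    """Fraction of bases in lowercase (assumed soft-masking convention)."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Hypothetical three-scaffold assembly, for illustration only.
demo = """>scaf1
ACGTacgtACGT
>scaf2
ACGTAC
>scaf3
ac
""".splitlines()

records = read_fasta(demo)
print("N50:", n50(len(seq) for seq in records.values()))
print("soft-masked fraction:", soft_masked_fraction(records.values()))
```

A soft-masked fraction of zero on a real assembly would suggest that repeat masking has not been run yet.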

best regards,

Shengfeng Huang (黄盛丰)
School of Life Sciences, Sun Yat-sen University
hshengf2@mail.sysu.edu.cn
http://sklbc.sysu.edu.cn/Team/User/info.aspx?typeid=283&pid=46



nottwy commented 6 years ago

As far as I know, only assemblies generated by NGS assemblers retain a large number of heterozygous sequences. If few or no heterozygous sequences remain in our assembly result, HM2 will not be able to find and separate them and output two haplotypes. Is that true?

mapleforest commented 6 years ago

Dear Nott,

Actually, any good assembler should preserve heterozygous sequences. HM2 has been used to process assemblies based on Sanger, 454, Illumina and PacBio sequencing data.

However, I remember a paper reporting that the authors successfully used HM to polish their genome assembly, which had only 5-10% heterozygous sites.

Anyway, HM2 is supposed to handle the raw assembly well, whether it has 5% heterozygous sites or 95%.


nottwy commented 6 years ago

You are correct. Let me take this topic further. Some 3GS assemblers, like Falcon, write the heterozygous sequences they find to a separate file rather than into the assembly result, so there would be no heterozygous sequences for HM2 to separate. In that situation, I could add the heterozygous sequences back into the original assembly result and invoke HM2. Can I then expect to get two haplotypes whose lengths both equal the length of the original assembly result? And assuming I can get two haplotypes with HM2, what about regions where the two haplotypes do not differ? How do you handle that case? Do you simply assign the sequence of such a region to both haplotypes?
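The idea of adding the separately reported heterozygous sequences back into the assembly amounts to concatenating two FASTA files while keeping record names unique. A minimal sketch follows; the `p_` and `a_` prefixes and the record names in the test data are hypothetical, and nothing here is part of HM2 or Falcon. The merged file would presumably still need to meet the input requirements stated earlier in the thread (soft-masked repeats, adequate N50).

```python
from typing import Iterable, Iterator, List


def prefix_fasta(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines with `prefix` added to every record name."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def combine_assemblies(primary: Iterable[str], associated: Iterable[str]) -> List[str]:
    """Concatenate primary and separately reported sequences into one FASTA.

    The prefixes are hypothetical; they only keep record names unique
    and make the origin of each sequence traceable afterwards.
    """
    return list(prefix_fasta(primary, "p_")) + list(prefix_fasta(associated, "a_"))


# Illustrative records only; real inputs would be read from the two files.
merged = combine_assemblies(
    [">contig1", "ACGTACGT"],
    [">contig1_alt", "ACGAACGT"],
)
print("\n".join(merged))
```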

mapleforest commented 6 years ago

This is a very good question. Unfortunately, I have not yet tried the new version of Falcon.

Intuitively, a clear and easy-to-control path to a haplotypic assembly could be:

a good diploid assembly -> haploid separation -> haplotype phasing based on the reference haploid assembly.

I do not know the quality and redundancy of the heterozygous sequences they throw out (the fact that they throw out half the data rather than produce both haploid assemblies means something).

Anyway, the method you propose depends heavily on the quality and redundancy of those heterozygous sequences.

As an alternative, could you force the assembler to output both haploid sequences?

When there is no difference between two alleles, HM2 places the same allele in both the reference and the alternative assemblies.
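The placement behaviour described in this thread (for a locus with two alleles, one goes to the reference and the other to the alternative; identical alleles are reused in both) can be illustrated with a toy model. This is not HM2's algorithm: representing a locus as a pair of strings and sending the longer allele to the reference are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def split_loci(loci: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Toy split of per-locus allele pairs into reference and alternative.

    Each locus is a pair (allele_a, allele_b). Identical alleles are
    written to both outputs. Otherwise the longer allele goes to the
    reference (hypothetical rule); ties keep the input order.
    """
    reference: List[str] = []
    alternative: List[str] = []
    for allele_a, allele_b in loci:
        if allele_a == allele_b:
            reference.append(allele_a)
            alternative.append(allele_a)
        else:
            first, second = sorted((allele_a, allele_b), key=len, reverse=True)
            reference.append(first)
            alternative.append(second)
    return reference, alternative


# Illustrative loci only.
ref, alt = split_loci([("GG", "GG"), ("AC", "ACGTT"), ("AT", "AC")])
print(ref)
print(alt)
```

In this model both outputs always have one entry per locus, which matches the description of a complete reference plus an alternative that shares sequence wherever the alleles do not differ.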


nottwy commented 6 years ago

Thank you for your prompt reply and for sharing your thoughts, which saves me a lot of time exploring the tool on my own. HM2 is a really interesting tool, and it would be even more useful and powerful if it could be further upgraded to handle the output of popular 3GS assembly tools like Falcon and Canu.

mapleforest commented 6 years ago

It works fine with both. From an algorithmic perspective, HM2 is a better downstream processor.

You are welcome to explore the question further.
