mapleforest / HaploMerger2


About the Allotetraploid Genome #10

jmsong2 opened this issue 6 years ago (status: Open)

jmsong2 commented 6 years ago

Dear Haplomerger2 developer,

I'm very happy to have found your nice software and paper.

In my own work, I have a question about the definition of 'heterozygous'. I have assembled the genome (allotetraploid, AACC, ~900 Mb; the assembly should be A+C) with Falcon. I want to use HaploMerger2 to reconstruct and output the two sub-genomes, A and C.

But I'm not sure whether this is reasonable, because the original purpose of your software is to separate haploid sub-assemblies.

Could you give me some advice?

Best regards, Jiaming

mapleforest commented 6 years ago

Dear Jiaming,

By using HM2, you may get a reference assembly (A1C1) and an alternative assembly (A2C2).

If you want both sub-assemblies from HM2, please set Falcon to output the diploid assembly, because the newest version of Falcon may only output the reference haploid assembly and discard the data for the other allele.

As an alternative, you may use Canu to compute the diploid assembly. You may run Canu on the corrected reads output by Falcon; this is faster than Falcon.


— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mapleforest/HaploMerger2/issues/10, or mute the thread https://github.com/notifications/unsubscribe-auth/AOtnABskWMuKa4ygAooeK7yC79nXj05Aks5sxTf7gaJpZM4QKmx5.

--

Best regards,

Shengfeng Huang (黄盛丰), School of Life Sciences, Sun Yat-sen University, hshengf2@mail.sysu.edu.cn, http://sklbc.sysu.edu.cn/Team/User/info.aspx?typeid=283&pid=46


This email and its attachments may contain confidential information intended for a specific individual and purpose. If you are not the intended recipient, you should delete this email and notify the sender immediately. Any use, dissemination, distribution, or copying of this email or its attachments by persons other than the intended recipient(s) is strictly prohibited.

jmsong2 commented 6 years ago

Thank you for your kind advice!

Your strategy for speeding up Canu is very efficient.

As you said, given a diploid assembly (e.g., from Canu) as input, HM2 will output the A1C1 and A2C2 assemblies. In some cases, this is a cool result!

But my initial purpose is to separate the A and C sub-genomes. Could HM2 give a reference assembly (A) and an "alternative" assembly (C)? Is this practicable?

Best, Jiaming

mapleforest commented 6 years ago

Dear Jiaming,

You may try.

If you give me some statistics on the average difference between A and C, and between A1C1 and A2C2, I may have a clearer idea.

For starters, in the ideal case, suppose the difference between A and C is 10%, between A1 and A2 is 2%, and between C1 and C2 is 2%.

Then the first round of HM2 (using parameters adjusted for 2% differences) will give you A1C1 and A2C2 (note, however, that these are mosaic haploid assemblies, not phased ones).

After that, if you do a second run of HM2 (with parameters adjusted for a 10% difference) on the reference A1C1 assembly, you may get separated As and Cs, but instead of this

group 1: A1,A2,A3,A4,A5

group 2: C1,C2,C3,C4,C5

the outcome is more like this (mosaic of As and Cs):

group 1: A1,C2,A3,C4,A5

group 2: C1,A2,C3,A4,C5.
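The mosaic outcome described above can be made concrete with a simple purity score: the fraction of blocks in a group that come from the group's majority sub-genome. This is a minimal sketch that assumes you already know which blocks truly belong to A and which to C (for example, from a progenitor genome), which in practice is the hard part; the `purity` helper and the labelling convention are hypothetical and only reuse the thread's illustrative A1 to C5 names.

```python
from collections import Counter


def subgenome_of(block: str) -> str:
    """Hypothetical labelling convention from the thread: 'A3' -> 'A', 'C4' -> 'C'."""
    return block[0]


def purity(group: list[str]) -> float:
    """Fraction of blocks in a group that come from its majority sub-genome.

    1.0 means the group is a clean sub-genome; values near 0.5 (with two
    sub-genomes) mean the group is a mosaic of As and Cs.
    """
    if not group:
        raise ValueError("empty group")
    counts = Counter(subgenome_of(b) for b in group)
    return counts.most_common(1)[0][1] / len(group)


IDEAL = [["A1", "A2", "A3", "A4", "A5"], ["C1", "C2", "C3", "C4", "C5"]]
MOSAIC = [["A1", "C2", "A3", "C4", "A5"], ["C1", "A2", "C3", "A4", "C5"]]

if __name__ == "__main__":
    for name, groups in (("ideal", IDEAL), ("mosaic", MOSAIC)):
        print(name, [purity(g) for g in groups])
```

With the thread's two examples, the ideal grouping scores 1.0 for both groups, while the mosaic grouping scores 0.6 for each.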

Anyway, it depends on many factors; for example, errors in the diploid assembly may be carried over to the final results.

However, the second round of HM2 can at least generate accurate whole-genome alignments between the As and Cs.

You can work with these alignments, which I think suffice for the necessary analyses associated with allodiploids.


jmsong2 commented 6 years ago

I don't know the average difference between A and C, but I know their genome sizes are very different (perhaps the A genome is 550 Mb and the C genome 350 Mb). In addition, the difference between A1 and A2 is less than 1%, because the sample is nearly homozygous.

Regards, Jiaming
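One way to produce the "average difference" statistic requested above is a length-weighted mismatch rate over aligned blocks. This is a minimal sketch, assuming you have already extracted, from some whole-genome alignment, the aligned length and mismatch count of each ungapped block; the `AlignedBlock` record is a hypothetical input format, indels are ignored, and the numbers in the example are made up to mirror the 2% / 10% scenario in the thread.

```python
from dataclasses import dataclass


@dataclass
class AlignedBlock:
    """One ungapped alignment block between two sequences (hypothetical input format)."""
    length: int      # number of aligned bases in the block
    mismatches: int  # number of mismatching bases in the block


def average_divergence(blocks: list[AlignedBlock]) -> float:
    """Length-weighted average divergence: mismatching bases / aligned bases."""
    total = sum(b.length for b in blocks)
    if total == 0:
        raise ValueError("no aligned bases")
    return sum(b.mismatches for b in blocks) / total


def describe(label: str, blocks: list[AlignedBlock]) -> str:
    return f"{label}: {average_divergence(blocks):.1%} average divergence"


if __name__ == "__main__":
    # Made-up blocks, not real data.
    allelic = [AlignedBlock(10_000, 180), AlignedBlock(5_000, 120)]
    homeologous = [AlignedBlock(8_000, 820), AlignedBlock(2_000, 180)]
    print(describe("A1 vs A2", allelic))      # 2.0% average divergence
    print(describe("A vs C", homeologous))    # 10.0% average divergence
```

Weighting by aligned length keeps many short, noisy blocks from dominating the estimate; counting indels as well would raise the numbers somewhat.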

mapleforest commented 6 years ago

OK.

1\ If the difference between A1 and A2 is less than 1%, it is highly likely that PacBio-based assemblers cannot resolve the two alleles well, so only a small fraction of loci will have both alleles in the diploid assembly.

For example, for a genome with 4% heterozygosity, we expect >70% of loci to be covered by both alleles in the diploid assembly.

For a genome with 1% heterozygosity, possibly only 5-30% of loci are covered by both alleles in the diploid assembly.

This is the first issue you should consider. But HM2 will help you separate the redundant alleles anyway.
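A quick way to check where your own assembly falls is to compare its total size with the expected haploid size. This is a rough back-of-the-envelope sketch, not an HM2 feature: it assumes every locus is assembled at least once and that all sequence beyond the haploid size is alternative alleles (no collapsed repeats, contaminants or artefactual duplications). The haploid size reuses the ~550 + 350 Mb guess from the thread; the assembly sizes are hypothetical.

```python
def fraction_with_both_alleles(assembly_size: float, haploid_size: float) -> float:
    """Rough fraction of loci represented by both alleles in a diploid assembly.

    Estimated as (assembly size - haploid size) / haploid size, clamped to [0, 1].
    """
    if haploid_size <= 0:
        raise ValueError("haploid_size must be positive")
    excess = assembly_size - haploid_size
    return min(max(excess / haploid_size, 0.0), 1.0)


if __name__ == "__main__":
    haploid_mb = 550 + 350  # A + C genome sizes guessed in the thread
    for assembly_mb in (945, 1170, 1600):  # hypothetical diploid assembly sizes
        f = fraction_with_both_alleles(assembly_mb, haploid_mb)
        print(f"{assembly_mb} Mb assembly -> ~{f:.0%} of loci with both alleles")
```

A 945 Mb assembly would correspond to roughly 5% of loci carrying both alleles and a 1170 Mb assembly to roughly 30%, bracketing the range quoted above for ~1% heterozygosity.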

2\ In your case, HM2 will not help you separate A from C, in the sense of reconstructing A and C.

However, HM2 can be used as an easy-to-use whole-genome aligner with high sensitivity, high accuracy and rich output (it is based on lastz and chainNet).


jmsong2 commented 6 years ago

Well, that's very clear.

Thank you very much for your patience.

I'll try the first one.