schatzlab / pseudohaploid

Create a pseudohaploid assembly from a partially resolved diploid assembly

Some questions about using pseudohaploid. #5

Open xiekunwhy opened 4 years ago

xiekunwhy commented 4 years ago

Hi, I have some questions about using pseudohaploid.

1) Should pseudohaploid be run before or after polishing?

2) Should pseudohaploid be run before or after repeat masking?

3) How long will the whole pseudohaploid pipeline take for a ~3.2G plant genome (~4.5G assembly generated by wtdbg2)?

Best wishes, Kun

mschatz commented 4 years ago

Hi Kun,

Please see below-

On Wed, Mar 18, 2020 at 12:42 AM xiekunwhy notifications@github.com wrote:

Hi, I have some questions about using pseudohaploid.

  1. Should pseudohaploid be run before or after polishing?

I would recommend before polishing since you are going to be filtering out about half of your assembly.

  2. Should pseudohaploid be run before or after repeat masking?

If you use nucmer for the alignments, I would run pseudohaploid before repeat masking, but then you'll need to tune the parameters to avoid computing too many alignments. The critical values are -l (minimum exact match length) and -c (minimum cluster length of alignments). Depending on the species and other factors, you will probably need to set -l around 50 to 100 and -c around 100 to 500. If this takes excessively long, you could increase the lengths to -l 250 -c 2500 (or larger).

  3. How long will the whole pseudohaploid pipeline take for a ~3.2G plant genome (~4.5G assembly generated by wtdbg2)?

The longest phase will be computing the whole genome alignments. If you have access to a cluster I would recommend sge_mummer: https://github.com/fritzsedlazeck/sge_mummer

Once you have the alignments, the postprocessing will take a few hours.
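As a rough illustration of the -l and -c tuning advice above, the Python sketch below only builds (and does not run) a nucmer command line at two settings: a starting point inside the suggested ranges and the faster -l 250 -c 2500 fallback. The file name contigs.fa, the output prefixes, and the helper name are hypothetical. Aligning the assembly against itself and passing --maxmatch are assumptions based on the command quoted elsewhere in this thread, not something this reply specifies.

```python
import shlex


def nucmer_self_alignment_cmd(assembly, prefix, min_match, min_cluster, maxmatch=True):
    """Build an argument list for a nucmer self-alignment (hypothetical helper).

    min_match   -> nucmer -l (minimum exact match length)
    min_cluster -> nucmer -c (minimum cluster length)
    """
    if min_match <= 0 or min_cluster <= 0:
        raise ValueError("min_match and min_cluster must be positive")
    cmd = ["nucmer"]
    if maxmatch:
        # Assumption: --maxmatch is wanted, as in the command quoted in this thread.
        cmd.append("--maxmatch")
    # Assumption: the assembly is aligned against itself (reference == query).
    cmd += ["-l", str(min_match), "-c", str(min_cluster), "-p", prefix, assembly, assembly]
    return cmd


# A starting point inside the suggested ranges (-l 50..100, -c 100..500).
sensitive = nucmer_self_alignment_cmd("contigs.fa", "self_l100_c500", 100, 500)
# The faster fallback if the first setting takes excessively long.
fast = nucmer_self_alignment_cmd("contigs.fa", "self_l250_c2500", 250, 2500)

print(shlex.join(sensitive))
print(shlex.join(fast))
```

Keeping the command as an argument list makes it easy to hand to subprocess.run or to a cluster submission script once a setting has been chosen.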

Good luck

Mike


xiekunwhy commented 4 years ago

Hi Mike, thank you for your suggestions. I ran pseudohaploid on the raw contigs output by wtdbg2 (~4.2G, against an expected size of 3.2G; N50 ~375k, L50 ~2800), but only two small contigs (each smaller than 10k) were removed. All parameters are listed below. Do you have any other suggestions?

MIN_IDENTITY=90
MIN_LENGTH=1000
MIN_CONTAIN=93
MAX_CHAIN_GAP=20000
nucmer --maxmatch -c 100 -l 500
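To put the result in perspective, here is a small worked calculation using only the sizes quoted above. It assumes that "G" means gigabases and that the whole excess over the expected genome size comes from uncollapsed haplotype copies; both are assumptions made for illustration, and the variable names are hypothetical.

```python
# Sizes quoted in the post, in gigabases (assumption: "G" means Gb).
assembly_gb = 4.2
expected_gb = 3.2

# Sequence that would have to be removed to reach the expected size.
excess_gb = assembly_gb - expected_gb
excess_fraction = excess_gb / assembly_gb

# Upper bound on what was actually removed: two contigs, each under 10 kb.
removed_upper_bp = 2 * 10_000
removed_share_of_excess = removed_upper_bp / (excess_gb * 1e9)

print(f"excess: {excess_gb:.1f} Gb ({excess_fraction:.1%} of the assembly)")
print(f"removed so far: at most {removed_upper_bp} bp "
      f"({removed_share_of_excess:.4%} of the excess)")
```

Under these assumptions roughly 1 Gb, or about 24% of the assembly, would be redundant, so removing less than 20 kb leaves essentially all of it in place.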

Best, Kun

mschatz commented 4 years ago

You could try decreasing MIN_CONTAIN and increasing MAX_CHAIN_GAP, but these parameters are very sample specific. Unfortunately, I don't have an automated procedure for setting them right now.

Good luck

Mike


xiekunwhy commented 4 years ago

Hi Mike,

It still does not work well after I changed these two parameters: with MIN_CONTAIN=50 and MAX_CHAIN_GAP=50000, only 10 short contigs were removed. Do you have any other suggestions?

Otherwise I may need to try other tools. Do you have any recommendations?

Best, Kun

mschatz commented 4 years ago

I'm afraid I would need to review the data to offer any more specific advice. Have you tried making a dotplot to look for co-linear contigs?
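On a dotplot, co-linear contigs show up as alignments that fall along a single diagonal. As a numeric companion to plotting, the Python sketch below groups alignments by contig pair and keeps the pairs whose hits stay near one diagonal. It is a minimal sketch under stated assumptions: the input is a list of already-parsed tuples (loading them from an alignment file is left out), only forward-strand hits are considered, and the function name, the 20 kb tolerance, and the example coordinates are hypothetical.

```python
from collections import defaultdict


def colinear_pairs(alignments, min_hits=2, max_diagonal_shift=20_000):
    """Report contig pairs whose alignments lie close to one diagonal.

    alignments: (ref, ref_start, ref_end, qry, qry_start, qry_end) tuples,
    forward strand only. Returns (pair, hit_count, aligned_bp_on_ref)
    rows, largest aligned_bp first.
    """
    by_pair = defaultdict(list)
    for ref, r_start, r_end, qry, q_start, q_end in alignments:
        if ref == qry:
            continue  # ignore a contig aligning to itself
        by_pair[(ref, qry)].append((r_start, r_end, q_start, q_end))

    report = []
    for pair, hits in by_pair.items():
        if len(hits) < min_hits:
            continue
        hits.sort()
        # The diagonal of a hit is the offset between its query and reference starts.
        diagonals = [q_start - r_start for r_start, _, q_start, _ in hits]
        shifts = [abs(b - a) for a, b in zip(diagonals, diagonals[1:])]
        if any(shift > max_diagonal_shift for shift in shifts):
            continue  # hits jump between diagonals: not co-linear
        aligned_bp = sum(r_end - r_start + 1 for r_start, r_end, _, _ in hits)
        report.append((pair, len(hits), aligned_bp))
    return sorted(report, key=lambda row: row[2], reverse=True)


example = [
    ("ctgA", 1, 30_000, "ctgB", 501, 30_500),
    ("ctgA", 55_000, 100_000, "ctgB", 55_400, 100_400),
    ("ctgA", 1_000, 5_000, "ctgC", 80_000, 84_000),
    ("ctgA", 60_000, 64_000, "ctgC", 2_000, 6_000),
]
result = colinear_pairs(example)
print(result)
```

In the example, ctgA and ctgB are reported because both hits share nearly the same diagonal, while the ctgA and ctgC hits are scattered and are dropped. Pairs that rank high here would be the first candidates to inspect in an actual dotplot.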

Good luck

Mike
