oushujun / Maize_NC358

The repository for code developed to generate the Maize NC358 assemblies and analyses.

less corrected reads #4

Open ttian627 opened 2 years ago

ttian627 commented 2 years ago

Hello

I have recently been assembling a maize genome. I have 250G of PacBio Sequel II reads with a read N50 of 25k, and the quality seems good. However, when I corrected my subreads with FALCON, I got either 84G of corrected reads with an N50 of 13K or 36G of corrected reads with an N50 of 24K, depending on the length cutoff. After assembly with CANU, the contig N50 was about 1.5M, similar to the B73v4 result. When I checked the read quality with SequelTools, the PSR was 0.88 and the ZOR was 0.86. What could cause the low assembly quality or the small amount of corrected reads? I look forward to your reply. Thank you very much!

Tian
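As a rough check on the numbers in the question above, the yields can be converted to fold coverage. The sketch below assumes that "250G", "84G" and "36G" mean gigabases, and that the genome is about the size quoted for NC358 later in this thread (2,272,400,000 bp); both are assumptions, not values confirmed by the poster. The `n50` helper is included only to show how a read N50 such as "25k" is defined.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def fold_coverage(total_bases, genome_size):
    """Total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Assumption: genome size borrowed from the NC358 parameters in this thread.
GENOME_SIZE = 2_272_400_000

# Assumption: the "G" figures in the question are gigabases.
DATASETS = {
    "raw subreads": 250e9,
    "corrected, N50 13 kb": 84e9,
    "corrected, N50 24 kb": 36e9,
}

for label, bases in DATASETS.items():
    print(f"{label}: {fold_coverage(bases, GENOME_SIZE):.1f}x")
```

Under these assumptions the raw data is roughly 110-fold, while the two corrected sets are roughly 37-fold and 16-fold, so most of the input is lost during correction.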

oushujun commented 2 years ago

Hi Tian,

You may want to use our parameters described in the supplementary text (also included below). In our case, FALCON correction followed by CANU assembly worked well. The programs may have changed since our publication, so you may want to use their latest versions.

# FALCON-Canu hybrid assembly for 75-fold NC358:  
# FALCON 
pa_HPCdaligner_option = -k14 -e0.75 -s100 -l3000 -h240 -w8 -H14154
pa_DBsplit_option = -x500 -s400
falcon_sense_option = --min_idt 0.70 --min_cov 2 --max_n_read 200
# CANU 
ovlMerThreshold=500; genome_size=2272400000; input_type=-pacbio-corrected 

Best, Shujun
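For readers running Canu directly rather than through DNAnexus, the CANU settings above might translate into a command line along the following lines. This is a minimal sketch, not the command used in the paper: it assumes `genome_size` corresponds to Canu's `genomeSize` option, that a plain integer is accepted there, and that a Canu 1.x release is used, where `-pacbio-corrected` marks already-corrected input. The prefix, output directory and reads file name are hypothetical placeholders.

```python
import shlex


def canu_command(prefix, out_dir, corrected_reads,
                 genome_size=2_272_400_000, ovl_mer_threshold=500):
    """Build a Canu 1.x style argument list for pre-corrected PacBio reads."""
    return [
        "canu",
        "-p", prefix,                             # assembly name prefix
        "-d", out_dir,                            # output directory
        f"genomeSize={genome_size}",              # assumed mapping of genome_size
        f"ovlMerThreshold={ovl_mer_threshold}",   # value from the thread
        "-pacbio-corrected", corrected_reads,     # Canu 1.x flag for corrected reads
    ]


# Hypothetical names, for illustration only.
cmd = canu_command("maize_asm", "canu_out", "falcon_corrected.fasta")
print(shlex.join(cmd))
```

Newer Canu releases changed how the read type is specified, so the flags should be checked against the help output of the installed version before running anything.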

ttian627 commented 2 years ago

Hello, Shujun

Thank you for your reply. Actually, I used both the old and the new FALCON versions, and tested with the NAM paper pipeline as well as your pipeline. Could there be any other problems?

Thank you

Tian


oushujun commented 2 years ago

Hi Tian,

Sorry for the delayed reply. The reported parameters are all we used in the paper. These genomes were assembled on the DNAnexus platform, but I doubt this makes a large difference. You may want to look further into how your data were generated and processed. For example, our tissue for sequencing was taken from seedlings following 48 hours of dark treatment to reduce carbohydrates and chloroplasts. The HMW DNA was extracted with the CTAB method. These preparations may make a difference in obtaining high-quality sequences.

Best, Shujun

Arkarachai commented 2 years ago

Hi Tian,

I want to second Shujun's opinion. Running FALCON on DNAnexus or on an HPC should give the same results. We have also tried this protocol on a large number of samples, and it worked well for all of them. Have you checked for possible contamination, for example by BLASTing a subset of the data against the nt database, or checked the loading statistics (P0, P1, etc.) from the machine? Those could be worth checking if something has gone wrong.
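The contamination check suggested above needs only a small random subset of reads. One way to draw it without loading the whole file into memory is reservoir sampling, sketched below in plain Python. The function names are illustrative, and the in-memory FASTA stands in for a real subreads file; the sampled records could then be searched against nt with BLAST.

```python
import io
import random


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


def write_fasta(records, handle):
    """Write (header, sequence) pairs as single-line FASTA."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")


# Tiny in-memory demo; a real run would open the subreads FASTA instead.
demo = io.StringIO(">r1\nACGT\nAC\n>r2\nGGCC\n>r3\nTTAA\n")
subset = reservoir_sample(read_fasta(demo), k=2)
out = io.StringIO()
write_fasta(subset, out)
print(out.getvalue(), end="")
```

A few thousand reads are usually enough for a first look; a large share of hits to bacterial, fungal or organellar sequences would point to a sample problem rather than an assembler setting.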