lemene / PECAT

PECAT, a phased error correct and assembly tool
BSD 2-Clause "Simplified" License

Diploid genome assembly of plants with pacbio #30

Open enriquepola1996 opened 4 months ago

enriquepola1996 commented 4 months ago

Hello dear developers,

I am about to try PECAT for the first time on a diploid genome of about 500 Mb, and I am writing to ask for script recommendations. There are template cfgfile files for some species, but I don't know which one I should use as a guide. For example, I ran the default cfgfile for arabidopsis and it generated this set of output files:

alternate.fasta primary.fasta rd_2_alt_names rd_2_pri_names

When I run cfg_arab_clr I am getting the following: primary.fasta alternate.fasta haplotype_1.fasta haplotype_2.fasta rd_2_pri_names rd_2_alt_names rd_2_hap2_names rd_2_hap1_names

What determines which of these two configurations I should use?

I would greatly appreciate your comments.

lemene commented 4 months ago

Hi @enriquepola1996, the difference is caused by the option asm2_assemble_options= --contig_format dual,prialt in cfg_arab_clr. PECAT can output two contig formats: primary/alternate format and dual assembly format, but only the former is output by default. See https://lh3.github.io/2021/10/10/introducing-dual-assembly.
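As a rough illustration of the answer above, the following Python sketch maps a --contig_format value to the FASTA files reported in this thread. The mapping of dual to haplotype_1.fasta and haplotype_2.fasta is inferred from the outputs listed in the question, not taken from PECAT's documentation, and the helper name expected_outputs is hypothetical.

```python
from typing import Dict, List

# Assumption: file names are those reported in this thread, grouped by the
# --contig_format value that appears to produce them.
OUTPUTS_BY_FORMAT: Dict[str, List[str]] = {
    "prialt": ["primary.fasta", "alternate.fasta"],
    "dual": ["haplotype_1.fasta", "haplotype_2.fasta"],
}


def expected_outputs(contig_format: str) -> List[str]:
    """Return the FASTA files expected for a comma-separated format string."""
    files: List[str] = []
    for fmt in contig_format.split(","):
        fmt = fmt.strip()
        if fmt not in OUTPUTS_BY_FORMAT:
            raise ValueError(f"unknown contig format: {fmt!r}")
        files.extend(OUTPUTS_BY_FORMAT[fmt])
    return files


if __name__ == "__main__":
    # The value used in cfg_arab_clr according to the reply above.
    print(expected_outputs("dual,prialt"))
```

With only prialt (the default described above), the helper lists just the primary and alternate files, which matches the first run in the question.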

enriquepola1996 commented 4 months ago

Thank you very much for the information. As I understand it, a primary assembly is a complete assembly with long stretches of phased blocks (haploid), and an alternate assembly is an incomplete assembly consisting of haplotigs in heterozygous regions. So would the dual format be an approximation of a haplotype-resolved assembly?

Thank you again for your comments.

lemene commented 4 months ago

The dual assembly format can be considered as two sets of long contigs (primary contigs) with the mosaic of homologous haplotypes.

enriquepola1996 commented 3 months ago

Thank you so much for your answer.

lemene commented 1 month ago

Hello @lemene, I have one question: is it possible to polish the assemblies independently with the corrected reads, or do the reads need special treatment before each assembly can be polished? I'm having trouble with stage 6-polish/racon and was wondering whether I can run that stage outside of PECAT.

You can check the commands in the pol_xxx.sh and execute them manually. A key step is to filter out any inconsistent alignments between contigs and reads.
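For anyone attempting the filtering step by hand, here is a minimal Python sketch of one way to drop questionable read-to-contig alignments from a PAF file before polishing. It is not PECAT's own filter: the function name filter_paf, the identity and mapping-quality thresholds, and the idea of restricting to a set of allowed read names are all assumptions made for illustration. Only the PAF column layout (column 1 query name, column 10 matching bases, column 11 alignment block length, column 12 mapping quality) is standard.

```python
from typing import Iterable, Iterator, Set


def filter_paf(
    lines: Iterable[str],
    allowed_reads: Set[str],
    min_identity: float = 0.9,  # hypothetical threshold
    min_mapq: int = 0,          # hypothetical threshold
) -> Iterator[str]:
    """Yield PAF lines whose read is allowed and whose alignment looks consistent."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read_name = fields[0]
        matches = int(fields[9])     # number of matching bases
        block_len = int(fields[10])  # alignment block length
        mapq = int(fields[11])       # mapping quality
        if block_len == 0 or read_name not in allowed_reads:
            continue
        if matches / block_len < min_identity or mapq < min_mapq:
            continue
        yield line


if __name__ == "__main__":
    example = [
        "r1\t1000\t0\t1000\t+\tctg1\t5000\t0\t1000\t950\t1000\t60\n",
        "r1\t1000\t0\t1000\t+\tctg2\t5000\t0\t1000\t800\t1000\t60\n",
        "r2\t1000\t0\t1000\t+\tctg1\t5000\t0\t1000\t990\t1000\t60\n",
    ]
    for kept in filter_paf(example, allowed_reads={"r1"}):
        print(kept, end="")
```

The commands in pol_xxx.sh remain the authoritative reference for what PECAT actually does at this stage; the sketch only shows the general shape of such a filter.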