Open enriquepola1996 opened 4 months ago
Hi @enriquepola1996
The difference is caused by the option `asm2_assemble_options= --contig_format dual,prialt`. PECAT can output contigs in two formats: primary/alternate format or dual assembly format, but only the former is output by default. See https://lh3.github.io/2021/10/10/introducing-dual-assembly.
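To check in advance which FASTA files a given cfgfile should produce, a small helper along these lines may be useful. It is only a sketch: `contig_formats` and `expected_fastas` are hypothetical names, not part of PECAT. The `#` comment syntax and the `prialt` default name are assumptions, and the file names per format are inferred from the listings reported in this thread.

```python
# Hypothetical helper, not part of PECAT: inspect a cfgfile's text and report
# which contig formats it requests via asm2_assemble_options.

# File names per format, inferred from the listings in this thread (assumption).
FASTAS = {
    "prialt": ["primary.fasta", "alternate.fasta"],
    "dual": ["haplotype_1.fasta", "haplotype_2.fasta"],
}


def contig_formats(cfg_text, key="asm2_assemble_options"):
    """Return the list of formats given to --contig_format, or the assumed default."""
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumes '#' starts a comment
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() != key:
            continue
        tokens = value.split()
        if "--contig_format" in tokens:
            idx = tokens.index("--contig_format")
            if idx + 1 < len(tokens):
                return tokens[idx + 1].split(",")
    # Only the primary/alternate format is output by default.
    return ["prialt"]


def expected_fastas(cfg_text):
    """List the FASTA files expected for the formats requested in cfg_text."""
    files = []
    for fmt in contig_formats(cfg_text):
        files.extend(FASTAS.get(fmt, []))
    return files
```

For example, `expected_fastas("asm2_assemble_options= --contig_format dual,prialt")` lists the haplotype files followed by the primary/alternate files, while a cfgfile without the option yields only `primary.fasta` and `alternate.fasta`.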
Thank you very much for the information. I understand that a primary assembly is a complete assembly with long stretches of phased blocks (haploid), and an alternate is an incomplete assembly consisting of haplotigs in heterozygous regions. So would the dual format be an approximation of a haplotype-resolved assembly?
I thank you again for your comments.
The dual assembly format can be considered as two sets of long contigs (primary contigs), each being a mosaic of homologous haplotypes.
Thank you so much for your answer.
Hello @lemene, I have one question: is it possible to independently polish the assemblies with the corrected reads, or do I need to apply a special treatment to the reads in order to polish each assembly? I'm having trouble with stage 6-polish/racon and was wondering if I can do that stage outside of PECAT.
You can check the commands in `pol_xxx.sh` and execute them manually. A key step is to filter out any inconsistent alignments between contigs and reads.
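One possible reading of the filtering step, sketched in Python for anyone polishing outside of PECAT: keep only alignments where the read maps to the contig it was assigned to. This is an assumption about what "inconsistent" means here, and the commands in the polishing script may apply a different criterion. `filter_paf` and the `read_to_contig` mapping are hypothetical; how to build that mapping (for example from files such as rd_2_pri_names) is not described in this thread. The only fixed facts used are the PAF columns: query name in column 1 and target name in column 6.

```python
# Hypothetical sketch, not PECAT's own code: drop read-to-contig alignments
# (PAF format) whose target contig differs from the contig the read is
# assigned to.

def filter_paf(paf_lines, read_to_contig):
    """Yield PAF lines whose read is assigned to the contig it aligns to.

    paf_lines      : iterable of tab-separated PAF records
    read_to_contig : dict mapping read name -> assigned contig name
                     (hypothetical; must be built by the user)
    """
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns; skip malformed records
        read, contig = fields[0], fields[5]
        # Reads without an assignment are dropped as well.
        if read_to_contig.get(read) == contig:
            yield line
```

The surviving alignments could then be passed to a polisher such as racon together with the reads and contigs.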
Hello dear developers,
I am going to try PECAT for the first time on a diploid genome of about 500 Mb, and I am writing to ask for script recommendations. There are template cfgfiles for some species, but I don't know which one I should use as a guide. For example, I ran the default cfgfile for arabidopsis and it generated this set of FASTA files:
alternate.fasta primary.fasta rd_2_alt_names rd_2_pri_names
When I run cfg_arab_clr I am getting the following: primary.fasta alternate.fasta haplotype_1.fasta haplotype_2.fasta rd_2_pri_names rd_2_alt_names rd_2_hap2_names rd_2_hap1_names
What does the choice between these two configurations depend on?
I would greatly appreciate your comments.