Open Karenmagh opened 3 weeks ago
@Karenmagh Sorry for taking so long to get back to you!
The dataset has high heterozygosity, and PECAT failed to pair some contigs according to their haplotypes. Setting --contig_dup_rate=0.2 may work; its default value is 0.3. The config can be modified as follows:
asm2_assemble_options= --max_trivial_length 10000 --contig_format dual,prialt --contig_dup_rate=0.2
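If you want to try several values, a small helper can rewrite that option line for you. This is only a sketch of plain string editing: `set_contig_dup_rate` is a hypothetical name, not part of PECAT, and it assumes the flag appears as `--contig_dup_rate=<value>` in the options line, as in the example above.

```python
import re

def set_contig_dup_rate(options_line, rate):
    """Return options_line with --contig_dup_rate set to rate.

    Hypothetical helper, not part of PECAT. It only edits the text of
    the config line; it does not validate the value.
    """
    pattern = r"--contig_dup_rate[= ]\S+"
    new_flag = f"--contig_dup_rate={rate}"
    if re.search(pattern, options_line):
        # Replace the existing flag in place.
        return re.sub(pattern, new_flag, options_line)
    # Flag not present yet: append it to the end of the line.
    return f"{options_line.rstrip()} {new_flag}"

line = ("asm2_assemble_options= --max_trivial_length 10000 "
        "--contig_format dual,prialt --contig_dup_rate=0.2")
print(set_contig_dup_rate(line, 0.15))
```

The rest of the line is left untouched, so the other assembly options stay the same between runs.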
Hello @lemene,
Thank you very much for your response. I tried --contig_dup_rate=0.2 and my haplotypes adjusted a little. I also tried 0.15, and now haplotype 1 is 4 Gb and haplotype 2 is 3.2 Gb. Could you explain a little about how to set this parameter appropriately? Is there a problem if I lower it further?
I will greatly appreciate your help.
Hi @Karenmagh I'm glad this parameter is working.
If heterozygosity is low, two reads from different haplotypes might map to the same position in the initial assembly (3-assemble/primary.fasta). PECAT can use the SNP alleles to identify that the overlap between them is inconsistent; such a pair of reads is considered inconsistent. --contig_dup_rate=0.2 means that if 20% of the reads that make up one contig are inconsistent with the reads of another contig, then one contig will be placed in 5-assemble/primary.fasta and the other in 5-assemble/alternate.fasta. If heterozygosity is high, reads from different haplotypes may not overlap, making it difficult for PECAT to identify inconsistent read pairs. In that case fewer inconsistent pairs are found, so it is necessary to reduce this parameter. But I'm unsure of the outcome if it is set too low.
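To make the threshold concrete, here is a rough sketch of the rule described above. It is not PECAT's actual code: the function name, the input format, and the choice of `>=` for the comparison are all assumptions made for illustration.

```python
def is_haplotype_pair(reads_a, reads_b, inconsistent_pairs, contig_dup_rate=0.3):
    """Illustrative only; not PECAT's implementation.

    reads_a, reads_b: names of the reads that make up contig A and contig B.
    inconsistent_pairs: (read, read) tuples judged inconsistent by SNP alleles.
    Returns True if the fraction of contig A's reads that are inconsistent
    with at least one read of contig B reaches contig_dup_rate.
    """
    reads_b = set(reads_b)
    # Index each read's inconsistent partners, in both directions.
    partners = {}
    for r1, r2 in inconsistent_pairs:
        partners.setdefault(r1, set()).add(r2)
        partners.setdefault(r2, set()).add(r1)
    # Count reads of A that conflict with any read of B.
    conflicting = sum(1 for r in reads_a if partners.get(r, set()) & reads_b)
    return len(reads_a) > 0 and conflicting / len(reads_a) >= contig_dup_rate

reads_a = [f"a{i}" for i in range(12)]
reads_b = ["b0", "b1"]
pairs = [("a0", "b0"), ("b1", "a1"), ("a2", "b0")]  # 3 of 12 reads conflict

print(is_haplotype_pair(reads_a, reads_b, pairs, contig_dup_rate=0.3))
print(is_haplotype_pair(reads_a, reads_b, pairs, contig_dup_rate=0.2))
```

In this toy input 25% of contig A's reads conflict with contig B, so the pair is missed at the default 0.3 but separated at 0.2. That is why lowering the value helps when few inconsistent read pairs can be detected.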
Hello dear developers,
I have run PECAT on a diploid plant genome (heterozygosity=3.3) and obtained a primary (4.7Gb) and alternate (2.6Gb) set, as well as haplotype1 (4.7Gb) and haplotype2 (2.8Gb). Do you think it is possible to improve the genome sizes by adjusting some parameters? I would greatly appreciate your response.
My script is this: