phasegenomics / FALCON-Phase

FALCON-Phase integrates PacBio long-read assemblies with Phase Genomics Hi-C data to create phased, diploid, chromosome-scale scaffolds

use of purge haplotigs with falcon-phase #73

Closed Juke34 closed 4 years ago

Juke34 commented 4 years ago

Here is a summary of my assembly size after each step:

| assembly | primary size (bp) | haplotig size (bp) | total (bp) |
| --- | --- | --- | --- |
| falcon unzip | 879494072 | 161125037 | 1040619109 |
| falcon unzip post purge Haplotigs | 615743546 | 420594391 | 1036337937 |
| falcon phase round1 | 679010065 | 280054206 | 959064271 |
| scaffolding with allHic | 679064365 | 280054206 | 959118571 |
| falcon phase round2 | 703831469 | 252736878 | 956568347 |

The first round of falcon phase seems to re-incorporate into the primary assembly part of what purge haplotigs had filtered out of it.
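To make the shift between steps easier to see, here is a small sketch that computes the change in primary, haplotig, and total size from one step to the next. The numbers are copied from the table above; the `deltas` helper is just an illustrative name, not part of any of the tools.

```python
# Sizes in bp copied from the table above: (step, primary, haplotigs)
steps = [
    ("falcon unzip", 879494072, 161125037),
    ("falcon unzip post purge Haplotigs", 615743546, 420594391),
    ("falcon phase round1", 679010065, 280054206),
    ("scaffolding with allHic", 679064365, 280054206),
    ("falcon phase round2", 703831469, 252736878),
]


def deltas(rows):
    """Return (step, primary change, haplotig change, total change)
    for each step relative to the step before it."""
    out = []
    for (_, p0, h0), (name, p1, h1) in zip(rows, rows[1:]):
        out.append((name, p1 - p0, h1 - h0, (p1 + h1) - (p0 + h0)))
    return out


for name, dp, dh, dt in deltas(steps):
    print(f"{name:35s} primary {dp:+,d}  haplotigs {dh:+,d}  total {dt:+,d}")
```

For falcon phase round1 this gives about +63.3 Mb in the primary set and about -140.5 Mb in the haplotigs, so the total drops by about 77.3 Mb at that step.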

Is this expected? I am just wondering whether this result is normal or whether I should try different parameters in falcon-phase.

shawnpg commented 4 years ago

Hi,

In general, yes, FALCON-Phase will move sequence from the haplotigs back into the primary contigs (and vice versa). It uses the Hi-C data to determine which heterozygous sequences likely originated from the same molecule in the nucleus. Long reads alone often do not carry enough information for correct phasing.
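As a rough illustration of why sequence can swap between the two sets, here is a toy sketch. It is not FALCON-Phase's actual algorithm, and the function name, input layout, and counts are all made up for the example. It assumes that for each pair of neighbouring heterozygous blocks we already have two Hi-C read-pair counts: pairs supporting the current primary/haplotig assignment ("same") and pairs supporting the opposite assignment ("cross").

```python
def phase_blocks(links):
    """Toy phasing by majority Hi-C support (illustrative only).

    links[i] = (same, cross): hypothetical Hi-C read-pair counts joining
    block i to block i+1 in the current assignment (same) versus with
    block i+1 swapped between primary and haplotig (cross).

    Returns one boolean per block: True means the block's two copies
    should be swapped relative to the input assignment.
    """
    swapped = [False]  # the first block is taken as the reference
    for same, cross in links:
        flip = cross > same  # Hi-C favours the opposite assignment
        # A flip is relative to the previous block, so compose with its state.
        swapped.append(swapped[-1] != flip)
    return swapped


# Made-up counts for four consecutive blocks (three junctions).
example_links = [(40, 5), (3, 30), (2, 25)]
print(phase_blocks(example_links))
```

In this made-up case only the third block is swapped: the second junction favours a flip, and the third junction favours a flip back, so the fourth block keeps its original assignment.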

Thanks,

Shawn