vergilback closed this issue 2 months ago
It seems that you are attempting to scaffold a haplotype-resolved assembly, and these unanchored contigs originate from approximately three chromosomes with much higher Hi-C link density than the other chromosomes. These unanchored contigs might be filtered out during the clustering step by the default value of the --density_upper parameter, and they may not be rescued to any group during the reassignment step because the whole chromosomes were filtered out together.
This issue could be due to chromosome-level collapses (where the assembler merged homologous chromosomes into a single one) or to significant differences between homologous chromosomes (e.g., the non-PAR regions of the human X and Y chromosomes).
To address this problem, you may try adding the following parameters, as we did in scaffolding the human HG002 genome in our paper:
--density_upper 1: Prevents filtering of contigs with much higher Hi-C link density;
--normalize_by_nlinks: Normalizes the contact matrix for clustering based on the number of Hi-C links on each contig.
Alternatively, in your case, manually scaffolding them in Juicebox is also a straightforward option.
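To make the effect of an upper density cutoff more concrete, here is a minimal Python sketch. It is not HapHiC's actual code: the function name filter_high_density, the toy densities, and the interpretation of the cutoff as a simple rank fraction are all assumptions made for illustration. It only shows why a cutoff below 1 can drop every contig of a high-density chromosome at once, while a cutoff of 1 keeps them all.

```python
def filter_high_density(densities, upper):
    """Return the contigs kept under a hypothetical upper-fraction cutoff.

    densities: dict mapping contig name -> Hi-C link density
    upper: fraction in (0, 1]; contigs ranked above this fraction
           (by density) are dropped. This is an illustrative rule,
           not necessarily how HapHiC applies --density_upper.
    """
    if not 0 < upper <= 1:
        raise ValueError("upper must be in (0, 1]")
    # Rank contigs from lowest to highest link density.
    ranked = sorted(densities, key=densities.get)
    # Keep the lowest-density `upper` fraction of contigs.
    n_keep = round(len(ranked) * upper)
    return set(ranked[:n_keep])


# Toy data: eight ordinary contigs plus two contigs from a
# chromosome with much higher link density (e.g. a collapse).
densities = {f"ctg{i}": float(i) for i in range(1, 9)}
densities.update({"ctg9": 30.0, "ctg10": 40.0})

kept_with_cutoff = filter_high_density(densities, 0.8)
kept_without_cutoff = filter_high_density(densities, 1)

print(sorted(set(densities) - kept_with_cutoff))     # contigs dropped by the cutoff
print(sorted(set(densities) - kept_without_cutoff))  # nothing is dropped
```

In this toy case both high-density contigs are removed together when the cutoff is 0.8, which mirrors the situation described above: if a whole chromosome is filtered out, there is no group left to which its contigs can be reassigned.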
Hello, I tried adding the two parameters separately:
Adding --normalize_by_nlinks seemed to reduce the anchoring rate, and many very large homologous chromosomes, which showed clear signs of being chimeric, were messily anchored to the same linkage group. This suggests that this parameter might not be suitable for my species' data.
After adding --density_upper 1, the program encountered an error:
RuntimeError: Pipeline Abortion: Inflation recommendation failed. It seems that some chromosomes were grouped together, or the maximum number of clusters is even less than the expected number of chromosomes. For more details, please check out the logs.
Do you have any other suggestions?
There are too many collapses in the assembly. When --density_upper was set to 1, the collapsed contigs were also retained during clustering, which resulted in this error. So I think you can manually rescue these contigs in Juicebox, starting from your first set of results.
Hello, I used HapHiC for chromosome scaffolding. However, since my species has a large number of chromosomes including microchromosomes, there is a significant size difference between the chromosomes (ranging from 1M to 100M). I lowered the min_group_len to 1M, which improved the anchoring rate to some extent. After zooming in on the Hi-C heatmap, I noticed that some contigs that were not anchored to the linkage groups still show significant Hi-C signals. Do you have a suitable parameter combination to improve the scaffolding of these contigs? Or will I need to anchor them manually? Thank you.