marbl / SALSA

SALSA: A tool to scaffold long read assemblies with Hi-C data
MIT License

scaffold orientation #157

Closed olechnwin closed 2 years ago

olechnwin commented 2 years ago

Hi,

I used SALSA to scaffold my contigs with Hi-C data, scaffolding each haplotype separately.

When I plot haplotype1, the orientation matches the reference (hg38): [image]

However, in haplotype2, many of the scaffolds are inverted: [image]

Do you have any suggestions on how to fix the orientation in haplotype2?

Thank you in advance for your help!

plnspineda commented 2 years ago

Hi, I'm sorry this is not related to your question, but I wanted to ask how you made the graph. What tool did you use? Thank you!

olechnwin commented 2 years ago

@paulenepineda, I used D-GENIES to make the graph: https://github.com/genotoul-bioinfo/dgenies

skoren commented 2 years ago

From the plot, it looks like the entire (or almost entire) scaffold is inverted. The orientation of a scaffold in the assembly is arbitrary, so some will be flipped by chance. I suspect the visualization is already flipping scaffolds to match the reference in most cases. However, when there are small inversions at the start or end of a scaffold, it seems to use that small piece to set the orientation of the whole scaffold. I'm not sure whether there are options to control this, but it is essentially a visualization artifact.

It is plausible that there are real inversions between the haplotypes, or they could be assembly errors (Hi-C scaffolding tends to introduce inversion errors). So you'd have to validate these inversions with other information (read mapping, Strand-seq if it exists, etc.).
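
If you want the FASTA itself oriented like the reference for plotting, one option is to decide each scaffold's orientation from the total aligned bases on each strand, so that a small inverted block at one end cannot decide it, and then reverse-complement the scaffolds that are mostly on the minus strand. Below is a minimal sketch of that idea. It is not part of SALSA or D-GENIES: it assumes you have scaffold-to-reference alignments in PAF format (for example from minimap2), and the function names and the `min_fraction` threshold are hypothetical choices for illustration.

```python
from collections import defaultdict

# Complement table for unambiguous bases and N; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def scaffolds_to_flip(paf_lines, min_fraction=0.5):
    """Return the set of query names whose aligned bases lie mostly on '-'.

    Assumes standard PAF columns: query name (1), query start (3),
    query end (4) and relative strand (5). Each alignment is weighted by
    its aligned query span, so a short inverted block is outvoted by a
    long forward alignment.
    """
    bases = defaultdict(lambda: {"+": 0, "-": 0})
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        name, start, end, strand = fields[0], int(fields[2]), int(fields[3]), fields[4]
        if strand in ("+", "-"):
            bases[name][strand] += end - start

    flip = set()
    for name, counts in bases.items():
        total = counts["+"] + counts["-"]
        # Flip only when the minus-strand share exceeds the (hypothetical) threshold.
        if total and counts["-"] / total > min_fraction:
            flip.add(name)
    return flip
```

You would then write `reverse_complement(seq)` for every scaffold returned by `scaffolds_to_flip` and leave the others as they are. This only changes how the scaffolds are represented; any inversions inside a scaffold remain and still need to be validated as described above.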