marbl / SALSA

SALSA: A tool to scaffold long read assemblies with Hi-C data
MIT License

Two Hi-C experiments: is it better to combine the data or run SALSA sequentially? #133

Closed: ckeeling closed this issue 1 year ago

ckeeling commented 3 years ago

Hello,

I have two proximity-ligation datasets, Chicago and Hi-C, plus a GFA file for the draft assembly I want to scaffold. I am trying both approaches, but conceptually, would there be any problem with combining the reads from both datasets into one set for mapping with SALSA? That way SALSA would use the GFA graph together with the longer-range Hi-C data, and not only with the Chicago data, as would happen if the runs were sequential.
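If the two libraries are combined, one practical detail is that read IDs from different sequencing runs can collide, and sorting a merged alignment file by read name could then pair a Chicago read with an unrelated Hi-C read. Below is a minimal sketch of a safe merge. It assumes each library's alignments are already in BED format with the read name in column 4 and that mates must end up adjacent after sorting by name. The helper `tag_and_merge` and the `library_readname` prefix scheme are hypothetical illustrations, not part of SALSA; check the SALSA README for the exact input format it expects.

```python
from typing import Iterable, List, Mapping


def tag_and_merge(libraries: Mapping[str, Iterable[str]]) -> List[str]:
    """Combine BED alignment lines from several libraries (hypothetical helper).

    Assumes tab-separated BED lines with the read name in column 4
    (index 3). Each read name gets its library tag as a prefix, so the
    same read ID in two libraries can never be mistaken for a mate pair.
    Records are then sorted by read name so mates stay adjacent.
    """
    records: List[List[str]] = []
    for tag, lines in libraries.items():
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            fields[3] = f"{tag}_{fields[3]}"  # e.g. "r1/1" -> "chicago_r1/1"
            records.append(fields)
    # Plain lexicographic sort on the read name, similar to `LC_ALL=C sort -k4,4`.
    records.sort(key=lambda f: f[3])
    return ["\t".join(f) for f in records]


chicago = ["ctg1\t100\t200\tr1/1\t60\t+", "ctg1\t5000\t5100\tr1/2\t60\t-"]
hic = ["ctg2\t10\t110\tr1/1\t60\t+", "ctg3\t900\t1000\tr1/2\t60\t-"]
for row in tag_and_merge({"chicago": chicago, "hic": hic}):
    print(row)
```

Without the prefix, both libraries contain `r1/1` and `r1/2`, so a name sort would interleave them and mates from different libraries would sit next to each other.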

Thanks, Chris

skoren commented 3 years ago

I would expect sequential runs to work better. The issue with combining these libraries is that they will have very different distributions of pair separation (insert length), and I worry this could skew the selection of the best edges. Once you finish the runs, feel free to post which approach turned out better.
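One way to see how different the two libraries are before deciding is to compare the separation of read pairs whose mates map to the same contig. The sketch below is a hypothetical check, not SALSA's edge-scoring logic. It assumes you have already extracted each pair as a `(contig1, pos1, contig2, pos2)` tuple, and the function name and the 100 kb "long-range" threshold are arbitrary illustrative choices.

```python
from statistics import median
from typing import Iterable, Tuple

# (contig of mate 1, position of mate 1, contig of mate 2, position of mate 2)
Pair = Tuple[str, int, str, int]


def separation_summary(pairs: Iterable[Pair], long_range: int = 100_000) -> Tuple[int, float, float]:
    """Summarize intra-contig pair separations for one library (hypothetical check).

    Returns (number of intra-contig pairs, median separation in bp,
    fraction of separations >= long_range). Inter-contig pairs are ignored
    because their true separation is unknown before scaffolding.
    """
    seps = [abs(p1 - p2) for c1, p1, c2, p2 in pairs if c1 == c2]
    if not seps:
        return 0, 0.0, 0.0
    n = len(seps)
    frac_long = sum(s >= long_range for s in seps) / n
    return n, float(median(seps)), frac_long


# Toy data only: shorter-range pairs for Chicago, longer-range pairs for Hi-C.
chicago_pairs = [("ctg1", 0, "ctg1", 1_000), ("ctg1", 0, "ctg1", 5_000), ("ctg2", 0, "ctg2", 20_000)]
hic_pairs = [("ctg1", 0, "ctg1", 50_000), ("ctg1", 0, "ctg1", 500_000),
             ("ctg2", 0, "ctg2", 2_000_000), ("ctg1", 0, "ctg2", 10)]
print("chicago", separation_summary(chicago_pairs))
print("hic", separation_summary(hic_pairs))
```

If the medians differ by orders of magnitude, as in this toy data, pooling the pairs mixes two very different link-distance regimes, which is the kind of skew described above.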

ckeeling commented 1 year ago

In my case I found little difference between running SALSA sequentially and running it on the combined data. Using the combined data therefore lets you do downstream work (e.g., in Juicebox Assembly Tools, JBAT) on a single assembly rather than step by step.