marbl / SALSA

SALSA: A tool to scaffold long read assemblies with Hi-C data
MIT License

Why do I still get misassemblies though SALSA didn't find any? #156

Open zmz1988 opened 2 years ago

zmz1988 commented 2 years ago

Dear developers,

Thanks a lot for writing this super easy-to-use tool!

I'm new to scaffolding with Hi-C data and to SALSA. I recently ran SALSA with our Omni-C data for scaffolding. Our assembly is already quite good (NG50 > 15 Mb), and we are hoping to confirm/resolve some of the centromere and telomere regions of the genome using Hi-C data.

I ran SALSA with:

python /miniconda3/envs/salsa/bin/run_pipeline.py -a Sample.fasta -l Sample.fasta.fai -b Sample_mapped.PT.alignment.bed -e DNASE -o scaffolds -m yes

Afterwards, misasm_2.DONE, misasm_3.DONE and misasm_3.log were all empty. So I think no misassemblies were found in our assembly. Am I interpreting this correctly?
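For anyone wanting to reproduce the check described above, here is a minimal sketch that lists the misassembly-related files in the output directory together with their sizes. The directory name `scaffolds` comes from the `-o` option in the command; the default file pattern `misasm_*` and both function names are assumptions made for illustration, so adjust the pattern to whatever names appear in your own output directory. Whether an empty file really means that no contigs were broken is exactly the open question here, not something this script can confirm.

```python
from pathlib import Path


def summarize_misasm_files(out_dir, pattern="misasm_*"):
    """Return {file name: size in bytes} for files in out_dir matching pattern.

    The default pattern is an assumption based on the file names in this post.
    """
    return {
        p.name: p.stat().st_size
        for p in sorted(Path(out_dir).glob(pattern))
        if p.is_file()
    }


def empty_files(out_dir, pattern="misasm_*"):
    """Return the names of matching files that are zero bytes long."""
    sizes = summarize_misasm_files(out_dir, pattern)
    return [name for name, size in sizes.items() if size == 0]


# Example usage (hypothetical path, taken from the -o option above):
#   for name, size in summarize_misasm_files("scaffolds").items():
#       print(f"{name}\t{size} bytes")
#   print("empty:", empty_files("scaffolds"))
```

Checking file sizes only shows what was written to disk; it says nothing about why a particular region was or was not flagged.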

However, in the resulting contact map I see many misassembly signals (mostly in centromere and telomere regions), as well as some contigs that could be joined into chromosomes.

So I would like to know whether I did something wrong when running SALSA that left these misassembled regions unresolved. If not, how can I improve my results?

Thanks a lot in advance!

zmz1988 commented 2 years ago

Contact map here: Screenshot 2022-01-17 at 10 37 17 (image attachment)