marbl / SALSA

SALSA: A tool to scaffold long read assemblies with Hi-C data
MIT License

Hi-C links not loading #175

Closed sanalbert closed 8 months ago

sanalbert commented 1 year ago

Hi, thank you so much for the great tool!

When I run SALSA with my contigs file and alignment BAM file, I get the output below, and the run ends with the same number of scaffolds as in the original contigs file. The line "Hybrid scaffold graph loaded, nodes = 0 edges = 0" is of particular concern.

bedfile loaded
Starting Iteration 1
bedfile started
bedfile loaded
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 0 edges = 0
Hi-C implied edges = 0
Starting Iteration 2
bedfile started
bedfile loaded
Starting Iteration 2
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 0 edges = 0
Hi-C implied edges = 0

Apparently the Hi-C links are not loading from my bed file. But when I last checked, the mapping had gone well and the pairs were well formed. Could there be a problem with the bed file I am using?

Here are the first few lines of my bedfile:

$ head -n 20 alignment_paired.bed 
ptg000008l  12835894    12836333    000000e3-14ec-4e33-81d1-3126eed93ee3:0000:0456/1    42  +
ptg000008l  12835894    12836333    000000e3-14ec-4e33-81d1-3126eed93ee3:0000:0456/1    42  +
ptg000008l  12835894    12836333    000000e3-14ec-4e33-81d1-3126eed93ee3:0000:0456/1    42  +
ptg000086l  45492   46598   000000e3-14ec-4e33-81d1-3126eed93ee3:0456:1562/1    60  +
ptg000086l  45492   46598   000000e3-14ec-4e33-81d1-3126eed93ee3:0456:1562/1    60  +
ptg000086l  45492   46598   000000e3-14ec-4e33-81d1-3126eed93ee3:0456:1562/2    60  +
ptg000008l  8105754 8106113 000000e3-14ec-4e33-81d1-3126eed93ee3:1562:1988/1    60  -
ptg000008l  8105754 8106113 000000e3-14ec-4e33-81d1-3126eed93ee3:1562:1988/2    60  -
ptg000008l  8105754 8106113 000000e3-14ec-4e33-81d1-3126eed93ee3:1562:1988/2    60  -
ptg000009l  2021249 2022123 000000e3-14ec-4e33-81d1-3126eed93ee3:1988:2878/2    60  +
ptg000009l  2021249 2022123 000000e3-14ec-4e33-81d1-3126eed93ee3:1988:2878/2    60  +
ptg000009l  2021249 2022123 000000e3-14ec-4e33-81d1-3126eed93ee3:1988:2878/2    60  +
ptg000407l  14858   15039   00000193-b9e0-4993-8e8c-bb76d5f5bb07:0000:0222/1    0   -
ptg000407l  14858   15039   00000193-b9e0-4993-8e8c-bb76d5f5bb07:0000:0222/1    0   -
ptg000407l  14858   15039   00000193-b9e0-4993-8e8c-bb76d5f5bb07:0000:0222/1    0   -
ptg000407l  14858   15039   00000193-b9e0-4993-8e8c-bb76d5f5bb07:0000:0222/1    0   -
ptg000407l  14858   15039   00000193-b9e0-4993-8e8c-bb76d5f5bb07:0000:0222/1    0   -
ptg000016l  17905603    17905882    00000193-b9e0-4993-8e8c-bb76d5f5bb07:0222:0550/1    60  +
ptg000016l  17905603    17905882    00000193-b9e0-4993-8e8c-bb76d5f5bb07:0222:0550/1    60  +
ptg000016l  17905603    17905882    00000193-b9e0-4993-8e8c-bb76d5f5bb07:0222:0550/1    60  +

Thanks for all your support!

SwenDiepstraten commented 1 year ago

Hi! I am encountering the same issue as described here. Did you happen to find a solution?

Kind regards

sanalbert commented 1 year ago

@SwenDiepstraten Hi, I actually ended up using another tool, YaHS, and maybe the issue at YaHS could help, although the input requirements of SALSA might be different. Could you share it here if you find a solution for SALSA?

skoren commented 9 months ago

Most likely the issue is in the bed file: if it wasn't generated using the recommended pipeline for SALSA, it might have formatting that the code doesn't expect. In the snippet above at least, there are no links to use. All the /1 and /2 reads are in the same tig, and for most reads only the /1 or the /2 is present, not both mates of the pair. However, we're not actively maintaining SALSA, so using YaHS is a reasonable option.
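The check described above can be sketched in a few lines of Python. This is only an illustration, not part of SALSA: the function name `summarize_bed_pairs` is hypothetical, and it assumes that mates are identified by a `/1` or `/2` suffix on the read name in column 4, as in the excerpt posted in this thread. It counts how many read names have both mates present and how many of those pairs touch two different contigs, which is the kind of pair that can act as a link between contigs.

```python
from collections import defaultdict


def summarize_bed_pairs(lines):
    """Count reads, complete pairs, and inter-contig pairs in BED lines.

    Assumption: column 1 is the contig and column 4 is the read name
    ending in /1 or /2 (as in the excerpt from this thread).
    """
    # read name without suffix -> contigs seen for mate 1 and mate 2
    mates = defaultdict(lambda: {"1": set(), "2": set()})
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        contig, name = fields[0], fields[3]
        base, sep, mate = name.rpartition("/")
        if sep and mate in ("1", "2"):
            mates[base][mate].add(contig)

    complete = 0
    inter_contig = 0
    for seen in mates.values():
        if seen["1"] and seen["2"]:
            complete += 1
            # A pair can link contigs only if the mates map to different ones.
            if any(a != b for a in seen["1"] for b in seen["2"]):
                inter_contig += 1
    return {
        "reads": len(mates),
        "complete_pairs": complete,
        "inter_contig_pairs": inter_contig,
    }


if __name__ == "__main__":
    # Three lines taken from the excerpt above; both mates are on one contig.
    sample = [
        "ptg000086l  45492   46598   000000e3-14ec-4e33-81d1-3126eed93ee3:0456:1562/1    60  +",
        "ptg000086l  45492   46598   000000e3-14ec-4e33-81d1-3126eed93ee3:0456:1562/2    60  +",
        "ptg000009l  2021249 2022123 000000e3-14ec-4e33-81d1-3126eed93ee3:1988:2878/2    60  +",
    ]
    print(summarize_bed_pairs(sample))
```

To run it on a whole file, pass an open file handle, for example `summarize_bed_pairs(open("alignment_paired.bed"))`. A count of zero inter-contig pairs would be consistent with the diagnosis above.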

skoren commented 8 months ago

As I commented in issue #177, the implied links will always be 0, so that alone is not an indication that Hi-C links are missing. The size of the contig_links_scaled_iteration_* files is the better indicator. However, I still suspect the bed file here is the issue.
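A quick way to look at those file sizes is sketched below. The helper name `link_file_sizes` and the idea of passing the SALSA output directory as an argument are assumptions made for illustration; only the `contig_links_scaled_iteration_*` file name pattern comes from the comment above.

```python
from pathlib import Path


def link_file_sizes(out_dir):
    """Return {file name: size in bytes} for the per-iteration link files.

    Assumption: out_dir is the directory SALSA wrote its output to.
    """
    pattern = "contig_links_scaled_iteration_*"
    return {p.name: p.stat().st_size for p in sorted(Path(out_dir).glob(pattern))}


if __name__ == "__main__":
    import sys

    # Usage: python check_links.py <salsa_output_dir>
    if len(sys.argv) > 1:
        for name, size in link_file_sizes(sys.argv[1]).items():
            print(f"{name}\t{size} bytes")
```

If every listed file is empty, or none are found, that points to Hi-C links not being loaded, independently of the "Hi-C implied edges = 0" line in the log.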