marbl / SALSA

SALSA: A tool to scaffold long read assemblies with Hi-C data
MIT License

error while loading Hi-C data #76

Closed: cmonat closed this issue 5 years ago

cmonat commented 5 years ago

Hi,

For me, the error occurs while loading the Hi-C links:

bedfile loaded
Starting Iteration 1
python /opt/apps/salsa/SALSA-2.2/fast_scaled_scores.py -d scaffolds -i 1
sort -k 5 -gr scaffolds/contig_links_scaled_iteration_1 > scaffolds/contig_links_scaled_sorted_iteration_1
python /opt/apps/salsa/SALSA-2.2/layout_unitigs.py -x abc -l scaffolds/contig_links_scaled_sorted_iteration_1 -c 1000 -i 1 -d scaffolds
Loading Hi-C links
Traceback (most recent call last):
  File "/opt/apps/salsa/SALSA-2.2/layout_unitigs.py", line 929, in <module>
    generate_scaffold_graph()
  File "/opt/apps/salsa/SALSA-2.2/layout_unitigs.py", line 392, in generate_scaffold_graph
    if contig_length[c1] <= int(args.cutoff) or contig_length[c2] <= int(args.cutoff):
KeyError: 'ctg4633_2'

For reference, I launched the SALSA pipeline with the following command:

python /opt/apps/salsa/SALSA-2.2/run_pipeline.py -a /home/cmonat/WheatOMICS/Renan_rawont_v4.fasta -l /home/cmonat/WheatOMICS/Renan_rawont_v4.fasta.fai -b DEDUPLICATESfiles/SL1810052_merged.bed -e GATC,AATC,ATTC,ACTC,AGTC -o scaffolds -m yes
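The KeyError means that a contig name read from the Hi-C links ('ctg4633_2') has no entry in the `contig_length` lookup. One way to look for such mismatches before rerunning is to compare the names in column 1 of the BED file (`-b`) with the names in column 1 of the .fai index (`-l`), and to list any index names containing ":". This is a minimal sketch that assumes SALSA takes its contig names from the first column of the `-l` file; I have not confirmed that against the SALSA source. The functions `read_fai_lengths`, `bed_contigs` and `diagnose` and the script name are hypothetical helpers, not part of SALSA.

```python
import sys
from typing import Dict, Iterable, List, Set, Tuple


def read_fai_lengths(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Map sequence name -> length from samtools .fai lines.

    Column 1 of a .fai file is the sequence name, column 2 its length.
    """
    lengths: Dict[str, int] = {}
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def bed_contigs(bed_lines: Iterable[str]) -> Set[str]:
    """Collect the distinct contig names from column 1 of a BED file."""
    return {line.split("\t", 1)[0] for line in bed_lines if line.strip()}


def diagnose(fai_lines: Iterable[str],
             bed_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (BED contigs absent from the .fai, .fai names containing ':')."""
    lengths = read_fai_lengths(fai_lines)
    missing = sorted(bed_contigs(bed_lines) - set(lengths))
    with_colon = sorted(name for name in lengths if ":" in name)
    return missing, with_colon


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python check_names.py assembly.fasta.fai links.bed
    with open(sys.argv[1]) as fai, open(sys.argv[2]) as bed:
        missing, with_colon = diagnose(fai, bed)
    # Show at most 10 names of each kind.
    print(f"{len(missing)} BED contig(s) not in the .fai:", *missing[:10], sep="\n  ")
    print(f"{len(with_colon)} .fai name(s) containing ':':", *with_colon[:10], sep="\n  ")
```

Only the distinct contig names from the BED are kept in memory, so this should stay cheap even for a large Hi-C BED file.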

Do you have any idea how I can fix this? Should I avoid passing all the ligation sites in the same run? So far, I have used the mapping_pipeline provided by Arima Genomics (https://github.com/ArimaGenomics/mapping_pipeline) to map the Hi-C reads to my sequences.

I hope this is clear and that I have provided all the information you need. Thank you very much in advance. Cheers,

C.

skoren commented 5 years ago

This may be caused by your FASTA headers containing ":"; see #20, and rename the FASTA headers if any of them contain ":".
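Following the suggestion to rename headers containing ":", here is a minimal sketch that rewrites each FASTA sequence ID and records an old-to-new mapping. It assumes "_" as the replacement character (my choice, not something #20 prescribes) and refuses to continue if a renamed ID collides with an existing one. After renaming, the .fai must be regenerated (for example with `samtools faidx`), and the BED contig names must match the new IDs. Re-mapping the Hi-C reads against the renamed FASTA is the safe route. Rewriting column 1 of the existing BED with the same mapping (the hypothetical `rename_bed_line` below) should also work, but that is an untested assumption. All function and script names here are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Iterator

# The first whitespace-delimited token of a FASTA header is the sequence ID
# (this is the name samtools faidx stores in the .fai); the rest is free text.
HEADER_RE = re.compile(r"(\S*)(.*)")


def rename_fasta(lines: Iterable[str], mapping: Dict[str, str],
                 bad: str = ":", repl: str = "_") -> Iterator[str]:
    """Yield FASTA lines with `bad` replaced by `repl` in each sequence ID.

    Every renamed ID is recorded in `mapping` (old -> new). Raises ValueError
    if a new ID collides with one already emitted. Streams line by line, so
    large assemblies are never held in memory.
    """
    seen = set()
    for line in lines:
        if not line.startswith(">"):
            yield line  # sequence lines pass through untouched
            continue
        match = HEADER_RE.match(line[1:].rstrip("\n"))
        old, rest = match.group(1), match.group(2)  # rest keeps its separator
        new = old.replace(bad, repl)
        if new in seen:
            raise ValueError(f"{old!r} -> {new!r} collides with an earlier ID")
        seen.add(new)
        if new != old:
            mapping[old] = new
        yield f">{new}{rest}\n"


def rename_bed_line(line: str, mapping: Dict[str, str]) -> str:
    """Apply the same renaming to column 1 (contig name) of a BED line."""
    name, sep, rest = line.partition("\t")
    return mapping.get(name, name) + sep + rest


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python rename_headers.py in.fasta out.fasta
    renamed: Dict[str, str] = {}
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(rename_fasta(src, renamed))
    for old, new in renamed.items():  # old<TAB>new table on stderr
        print(f"{old}\t{new}", file=sys.stderr)
```

Because the renaming is streamed, a collision raises partway through and leaves a partial output file. Delete that file and choose a different replacement character before retrying.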