kfletcher88 closed this issue 5 years ago.
Following on from this, I tried setting the iteration parameter. Using exactly the same inputs, I set -i 10 and ran SALSA again.
This time the output contains 12 iterations, and the final assembly reflects the 11th. This means there is evidence for further joins (only 14 in this case: 4757 scaffolds at iteration 11 vs 4743 at iteration 12) that are not included in the final assembly.
I will set -i higher to see whether it keeps going beyond this.
Example command:
$ python run_pipeline.py -a jelly.out.fasta -l jelly.out.fasta.fai -b alignment.bed -e GATC -o scaffolds-i10 -i 10 &> salsa_i10.log
Abbreviated output assembly stats:
$ stats.sh scaffolds-i10/scaffolds_FINAL.fasta
... Main genome scaffold total: 4757
... Max scaffold length: 17.669 MB
Scaffold count and maximum scaffold length for iterations 12 and 11:
$ wc -l scaffolds-i10/scaffold_length_iteration_12 && sort -nrk2,2 scaffolds-i10/scaffold_length_iteration_12 | head -1
4743 scaffolds-i10/scaffold_length_iteration_12
scaffold_4259 17622210
$ wc -l scaffolds-i10/scaffold_length_iteration_11 && sort -nrk2,2 scaffolds-i10/scaffold_length_iteration_11 | head -1
4757 scaffolds-i10/scaffold_length_iteration_11
scaffold_489 17622210
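For anyone wanting to repeat the wc/sort comparison above across every iteration at once, here is a minimal Python sketch. It assumes each scaffold_length_iteration_N file holds one scaffold per line as a name and an integer length separated by whitespace (which is what the output above suggests); all function names are hypothetical helpers, not part of SALSA. Matching scaffolds_FINAL.fasta to an iteration by scaffold count alone is only a heuristic.

```python
import re
from pathlib import Path

# Hypothetical helpers, not part of SALSA.
# Assumed iteration file format: "<scaffold_name> <length>" on each line.
ITER_FILE = re.compile(r"scaffold_length_iteration_(\d+)")


def summarize_lengths(path):
    """Return (scaffold_count, longest_name, longest_length) for one iteration file."""
    count, best_name, best_len = 0, None, -1
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            count += 1
            length = int(fields[1])
            if length > best_len:  # on ties, keep the first scaffold seen
                best_name, best_len = fields[0], length
    return count, best_name, best_len


def compare_iterations(out_dir):
    """Summarize every scaffold_length_iteration_N file in out_dir, keyed by N."""
    results = {}
    for p in Path(out_dir).iterdir():
        m = ITER_FILE.fullmatch(p.name)
        if m:
            results[int(m.group(1))] = summarize_lengths(p)
    for it in sorted(results):
        count, name, length = results[it]
        print(f"iteration {it}: {count} scaffolds, longest {name} (length {length})")
    return results


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def iterations_matching_final(results, final_fasta):
    """Iterations whose scaffold count equals the record count of final_fasta.

    Equal counts suggest, but do not prove, that the FINAL file came from that iteration.
    """
    n = count_fasta_records(final_fasta)
    return [it for it, (count, _, _) in sorted(results.items()) if count == n]
```

Usage would be res = compare_iterations("scaffolds-i10") followed by iterations_matching_final(res, "scaffolds-i10/scaffolds_FINAL.fasta"); on this run it should reproduce the wc -l counts above, provided the files contain no blank lines.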
The abbreviated log again shows iteration 12 beginning; I'm not sure why it terminated, though:
...
python layout_unitigs.py -x abc -l scaffolds-i10/contig_links_scaled_sorted_iteration_9 -c 1000 -i 9 -d scaffolds-i10
break_contigs -a scaffolds-i10/alignment_iteration_10.bed -b scaffolds-i10/breakpoints_iteration_10.txt -l scaffolds-i10/scaffold_length_iteration_10 -i 10 -s 100 > scaffolds-i10/misasm_iteration_10.report
python refactor_breaks.py -d scaffolds-i10 -i 10 > scaffolds-i10/misasm_10.log
python make_links.py -b scaffolds-i10/alignment_iteration_10.bed -d scaffolds-i10 -i 10
python layout_unitigs.py -x abc -l scaffolds-i10/contig_links_scaled_sorted_iteration_10 -c 1000 -i 10 -d scaffolds-i10
break_contigs -a scaffolds-i10/alignment_iteration_11.bed -b scaffolds-i10/breakpoints_iteration_11.txt -l scaffolds-i10/scaffold_length_iteration_11 -i 11 -s 100 > scaffolds-i10/misasm_iteration_11.report
python refactor_breaks.py -d scaffolds-i10 -i 11 > scaffolds-i10/misasm_11.log
python make_links.py -b scaffolds-i10/alignment_iteration_11.bed -d scaffolds-i10 -i 11
python layout_unitigs.py -x abc -l scaffolds-i10/contig_links_scaled_sorted_iteration_11 -c 1000 -i 11 -d scaffolds-i10
break_contigs -a scaffolds-i10/alignment_iteration_12.bed -b scaffolds-i10/breakpoints_iteration_12.txt -l scaffolds-i10/scaffold_length_iteration_12 -i 12 -s 100 > scaffolds-i10/misasm_iteration_12.report
python refactor_breaks.py -d scaffolds-i10 -i 12 > scaffolds-i10/misasm_12.log
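To see exactly where the log stops, a small Python sketch can record the highest -i value each step was launched with and list the steps that never reached the last iteration. It assumes each command is logged on its own line, optionally prefixed by "python", with the iteration passed as "-i N", as in the excerpt above; the helper names are hypothetical and not part of SALSA.

```python
import re

# Hypothetical helper, not part of SALSA. Assumes one logged command per line,
# optionally prefixed by "python", with the iteration given as "-i N".
STEP_RE = re.compile(r"^(?:python\s+)?(\S+).*?\s-i\s+(\d+)\b")


def last_iteration_per_step(log_lines):
    """Map each logged step (e.g. make_links.py) to the highest -i value it was run with."""
    last = {}
    for line in log_lines:
        m = STEP_RE.match(line.strip())
        if m:
            step, iteration = m.group(1), int(m.group(2))
            last[step] = max(last.get(step, -1), iteration)
    return last


def incomplete_steps(log_lines):
    """Return (highest iteration seen, steps never launched for that iteration)."""
    last = last_iteration_per_step(log_lines)
    if not last:
        return None, []
    final = max(last.values())
    return final, sorted(step for step, it in last.items() if it < final)
```

On the excerpt above this gives (12, ['layout_unitigs.py', 'make_links.py']): iteration 12 ran break_contigs and refactor_breaks.py, but make_links.py and layout_unitigs.py were never logged for it. This only shows where the run stopped, not why.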
Hi,
Just because joins can be made during scaffolding does not mean that they are correct. SALSA tries to keep as many high-confidence joins as possible in the scaffolds. Also, if it runs for 12 iterations, it outputs the scaffolds from the 11th iteration as the final scaffolds, because in the 12th iteration most of the new joins are flagged as wrong.
Hi,
Thanks for this tool.
I have been using the latest version to scaffold an assembly; however, I have noticed that the statistics of the output assembly do not match the statistics of the final iteration reported. Is there a reason for this, or could SALSA have continued into additional iterations?
I noticed in previous issues that the -i parameter is now overridden and iterations continue until the data is exhausted (https://github.com/machinegun/SALSA/issues/44#issuecomment-435959312 & https://github.com/machinegun/SALSA/issues/24#issuecomment-377586472), but the example below suggests that further scaffolding could be performed, with evidence that the largest scaffold would still increase by 35%. My command did not specify the number of iterations.
Would appreciate any help you can give.
Thanks
Example data:
The original SALSA command was:
The log file shows that iteration 5 began but perhaps didn't complete: