schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

chroder input format requirements #224

Closed: estolle closed this 7 months ago

estolle commented 7 months ago

Hi,

I am attempting to use syri for some synteny analyses between new (draft) genomes of bees (same species).

I ran into the common problem of unequal chromosome numbers, etc., when using syri directly, so I tried running chroder first. However, I am getting an error, about the FASTA headers I presume. I am a bit lost at the moment.

nucmer --maxmatch -c 100 -b 500 -l 50 -t $CPUs -p $INPUTDIR/$PREFIX.nucmer1 $QUERY $REFERENCE
delta-filter -m -i 90 -l 100 $INPUTDIR/$PREFIX.nucmer1.delta > $INPUTDIR/$PREFIX.nucmer1.filtered.delta
show-coords -THrd $INPUTDIR/$PREFIX.nucmer1.filtered.delta > $INPUTDIR/$PREFIX.nucmer1.filtered.coords
COORDS="$INPUTDIR/$PREFIX.nucmer1.filtered.coords"

chroder -n 500 -o $PREFIX -noref -F T $COORDS $REFERENCE $QUERY

Any recommendations on how to run it properly and avoid this error? Thanks

Traceback (most recent call last):
  File "/scratch/progz/conda_envs/syri/bin/chroder", line 6, in <module>
    main(sys.argv[1:])
  File "/scratch/progz/conda_envs/syri/lib/python3.9/site-packages/syri-1.6.5-py3.9-linux-x86_64.egg/syri/scripts/chroder.py", line 993, in main
    scaf(args)
  File "/scratch/progz/conda_envs/syri/lib/python3.9/site-packages/syri-1.6.5-py3.9-linux-x86_64.egg/syri/scripts/chroder.py", line 538, in scaf
    for r in range(0, refsize[i], 10000):
KeyError: 'refcontig_104'
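The traceback ends in a dictionary lookup (`refsize[i]`) keyed by a contig name, which suggests chroder could not find a sequence from the coords file among the reference sequences it loaded. A minimal pre-flight check, assuming the 10th whitespace-separated column of `show-coords -THrd` output is the first (nucmer "reference") sequence ID and that a FASTA ID is the first word after `>`, could look like this. `fasta_ids` and `missing_ref_ids` are hypothetical helpers, not part of syri, and chroder may parse headers differently.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs (first word after '>') from FASTA lines.

    Assumption: this mirrors how the ID is taken from the header; chroder's
    own parsing is not shown in the thread.
    """
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # ignore empty headers such as a bare '>'
                ids.add(parts[0])
    return ids


def missing_ref_ids(coords_lines: Iterable[str], ref_ids: Set[str]) -> Set[str]:
    """Return first-column sequence IDs from a coords table absent from ref_ids.

    Assumes 11 whitespace-separated columns per row (show-coords -THrd),
    with the first sequence's ID in column 10 (index 9).
    """
    missing: Set[str] = set()
    for line in coords_lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed rows
        if fields[9] not in ref_ids:
            missing.add(fields[9])
    return missing
```

Called as `missing_ref_ids(open(coords_path), fasta_ids(open(reference_path)))` (paths hypothetical), a non-empty result points to a mismatch between the coords file and the reference FASTA passed to chroder.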

head $INPUTDIR/$PREFIX.nucmer1.filtered.coords
1 4045 10051 5982 4045 4070 98.55 1 -1 contig_104 contig_100
4083 8198 5515 9652 4116 4138 98.58 1 1 contig_104 contig_100
4083 8198 5313 9449 4116 4137 98.55 1 1 contig_104 contig_66
3 2583 70673 73261 2581 2589 99.04 1 1 contig_144 contig_221
97 5499 11534 6111 5403 5424 98.97 1 -1 contig_144 contig_66
1 2755 7878 10589 2755 2712 92.78 1 1 contig_162 contig_66
2921 5700 40050 37257 2780 2794 97.37 1 -1 contig_162 contig_197
1 5766 7966 13753 5766 5788 99.02 1 1 contig_2 contig_100
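Each row above has 11 columns. The sketch below reads them into named fields, assuming the layout `show-coords` uses with `-T -H -d`: S1, E1, S2, E2, LEN1, LEN2, %IDY, two FRM (direction) columns, then the two sequence tags. Verify this against the header of your MUMmer version (run without `-H` to see it). `CoordsRow` and `parse_coords` are hypothetical names used only for illustration.

```python
from typing import Iterable, List, NamedTuple


class CoordsRow(NamedTuple):
    s1: int     # start in the first (nucmer "reference") sequence
    e1: int     # end in the first sequence
    s2: int     # start in the second ("query") sequence
    e2: int     # end in the second sequence; s2 > e2 on the reverse strand
    len1: int   # alignment length in the first sequence
    len2: int   # alignment length in the second sequence
    idy: float  # percent identity
    frm1: int   # direction of the first sequence (assumed always 1 for nucmer)
    frm2: int   # direction of the second sequence: 1 forward, -1 reverse
    tag1: str   # first sequence ID
    tag2: str   # second sequence ID


def parse_coords(lines: Iterable[str]) -> List[CoordsRow]:
    """Parse show-coords -THrd rows, skipping lines without 11 columns."""
    rows: List[CoordsRow] = []
    for line in lines:
        f = line.split()
        if len(f) != 11:
            continue
        rows.append(CoordsRow(
            int(f[0]), int(f[1]), int(f[2]), int(f[3]),
            int(f[4]), int(f[5]), float(f[6]),
            int(f[7]), int(f[8]), f[9], f[10],
        ))
    return rows
```

For the first row shown, `tag1` is `contig_104` and `frm2` is `-1`, which matches S2 (10051) being larger than E2 (5982).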

cat $INPUTDIR/$PREFIX.nucmer1.filtered.coords | grep "contig_104"
1 4045 10051 5982 4045 4070 98.55 1 -1 contig_104 contig_100
4083 8198 5515 9652 4116 4138 98.58 1 1 contig_104 contig_100
4083 8198 5313 9449 4116 4137 98.55 1 1 contig_104 contig_66

mnshgl0110 commented 7 months ago

For nucmer, the first genome should be the reference and the second the query.
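When both assemblies use the same `contig_N` naming, an ID check alone may not reveal swapped inputs. One heuristic, sketched here under the same 11-column assumption as above, is to check that every alignment's coordinates fit inside the contig lengths when column 10 is looked up in the reference and column 11 in the query. `fasta_lengths` and `coords_fit_lengths` are hypothetical helpers; a pass does not prove the order is correct (similarly named contigs of similar length can still pass), but a failure in one direction and a pass in the other is a strong hint that the genomes were given to nucmer as query then reference.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA ID (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sum wrapped sequence lines
    return lengths


def coords_fit_lengths(coords_lines: Iterable[str],
                       first_lengths: Dict[str, int],
                       second_lengths: Dict[str, int]) -> bool:
    """True if every alignment lies within the named contigs.

    Column 10 is looked up in first_lengths and column 11 in second_lengths;
    an unknown ID or an out-of-range coordinate returns False.
    """
    for line in coords_lines:
        f = line.split()
        if len(f) != 11:
            continue
        s1, e1, s2, e2 = (int(x) for x in f[:4])
        len1 = first_lengths.get(f[9])
        len2 = second_lengths.get(f[10])
        if len1 is None or len2 is None:
            return False
        if max(s1, e1) > len1 or max(s2, e2) > len2:
            return False
    return True
```

Comparing `coords_fit_lengths(coords, ref_lengths, qry_lengths)` with the same call using the two length maps swapped shows which orientation the coords file is consistent with.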

estolle commented 7 months ago

Thanks,

yes, that was the problem. It runs successfully now.