medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/
Other
140 stars 19 forks

IndexError: string index out of range #17

Boer223 closed this issue 4 years ago

Boer223 commented 4 years ago

Hi, when I run the maf_to_gfa1.py script to convert alignment.maf to GFA format, I get the following error:

Traceback (most recent call last):
  File "/home/cuixb/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py", line 177, in <module>
    blocks, sequence = split_maf_blocks(args.maf)
  File "/home/cuixb/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py", line 102, in split_maf_blocks
    next_profile = profile(maf, next_column)
  File "/home/cuixb/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py", line 46, in profile
    return [group[i].body[column] == '-' for i in xrange(len(group))]
IndexError: string index out of range
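
The failing expression on line 46 reads `group[i].body[column]` for every row in a block, so it raises IndexError as soon as any row's alignment text is shorter than the column being examined. The following Python 3 sketch reproduces this. The `Row` type and the sample rows are hypothetical stand-ins, not code from maf_to_gfa1.py (which runs under Python 2, hence its `xrange`).

```python
from collections import namedtuple

# Hypothetical stand-in for one parsed MAF row; only the `body` field
# (the alignment text) that line 46 reads is modeled here.
Row = namedtuple("Row", ["src", "body"])


def profile(group, column):
    # Same expression as line 46, using Python 3's range instead of xrange:
    # one boolean per row, True if that row has a gap in this column.
    return [group[i].body[column] == '-' for i in range(len(group))]


# A normal row and a truncated row shaped like "s kale_chrA01 ... >1_1".
group = [Row("kale_chrA01", "GT-TACA"), Row("kale_chrA01", ">1_1")]

print(profile(group, 2))  # column 2 exists in both rows
try:
    profile(group, 5)  # column 5 lies past the end of ">1_1"
except IndexError as err:
    print("IndexError:", err)
```

Here the first call returns `[True, False]`, while the second fails with `string index out of range` on the 4-character row, which is consistent with the short `>1_N` rows in the MAF excerpt.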

Here is part of the alignment.maf file:

##maf version=1
# sibeliaz v.1.2.1
# cmd=-f 64 -t 28 -o westar_kale_chrA01 data/westar.fa.split/westar.id_chrA01.fa data/kale.fa.split/kale.id_kale_chrA01.fa

a
s kale_chrA01 19067038 227 + 40689054 GTTTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCCTTTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 18728550 226 + 40689054 >1_1
s kale_chrA01 21852872 224 - 40689054 GTTTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 21847912 224 - 40689054 >1_2
s kale_chrA01 18894209 224 + 40689054 --TTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 18905069 226 + 40689054 >1_3
s kale_chrA01 18937683 224 + 40689054 --TTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 18942636 226 + 40689054 >1_4
s kale_chrA01 21656164 226 - 40689054 --TTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 19062092 225 + 40689054 >1_5
s kale_chrA01 18723593 225 + 40689054 GTTTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 21620759 224 - 40689054 >1_6
s kale_chrA01 21380478 224 - 40689054 --TTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACATGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 21368346 224 - 40689054 >1_7
s kale_chrA01 21317989 224 - 40689054 GTTTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCGGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 21307298 224 - 40689054 >1_8
s kale_chrA01 19477728 226 + 40689054 GTTTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGGTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCCCTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 19488373 226 + 40689054 >1_9
s kale_chrA01 19756575 226 + 40689054 --TTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCCTTTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s kale_chrA01 20912571 212 - 40689054 >1_10
s chrA01 28065206 226 + 46056803 -TTTACAAGTATTAATAGAGAGAGCACCAAGGAAATTCGAAATGGTTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCACTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-
s chrA01 27205390 226 + 46056803 >1_11
s chrA01 27210347 200 + 46056803 --TTACAAGTATTAATAGAGAGAGCAACAAGGAAATTCGAAATGGGTAAGCATGTGTAGTCAAAGGACAGGCTGGAACTCC-TTTTGAATCACTTGGCTGTGCTTTCTCACATGC-TTGCACTAGTAT--AAAGGTAACTTCTCCTTTCCAGCATCATACAGGCTGTC-AAAGTGATCCCTTATCCTTCCTTAACCTCCCTTATCCTCTTTGGTCGAGTTTCCTCTCTTCT-

So, how to solve this error? Thank you in advance!

iminkin commented 4 years ago

Thanks for reporting this. I will take a look and get back to you soon.

iminkin commented 4 years ago

@Chipcui,

The alignment.maf file looks really suspicious, particularly lines like this one:

s kale_chrA01 18728550 226 + 40689054 >1_1

Could you please provide the FASTA file containing kale_chrA01?
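
In the MAF format, the size field of an `s` line equals the number of non-gap characters in its text field, and all rows of a block are expected to span the same number of alignment columns. A row like the one above (size 226, text `>1_1`) breaks both expectations. The sketch below flags such rows before running the converter; `check_maf_blocks` is a hypothetical helper, not part of SibeliaZ. It assumes every `s` line has the seven standard fields and should run under both Python 2.7 and Python 3.

```python
def check_maf_blocks(lines):
    """Return (line_number, message) pairs for suspicious 's' lines.

    Assumes each 's' line has the 7 standard fields:
    s src start size strand srcSize text
    """
    problems = []
    block = []  # (line_number, text) pairs for the current alignment block

    def flush():
        # Rows of one block should all span the same number of columns;
        # report any row narrower than the widest row in the block.
        if block:
            width = max(len(text) for _, text in block)
            for lineno, text in block:
                if len(text) < width:
                    problems.append(
                        (lineno, "text length %d < block width %d" % (len(text), width)))
        del block[:]

    for lineno, line in enumerate(lines, 1):
        if line.startswith("a"):
            flush()  # an 'a' line starts a new block
        elif line.startswith("s "):
            fields = line.split()
            size, text = int(fields[3]), fields[6]
            # The size field should equal the number of non-gap characters.
            ungapped = len(text) - text.count("-")
            if ungapped != size:
                problems.append(
                    (lineno, "size %d != %d non-gap characters" % (size, ungapped)))
            block.append((lineno, text))
    flush()
    return problems
```

For example, calling `check_maf_blocks(open("alignment.maf"))` would report the `>1_1` row twice: its size (226) does not match its 4 non-gap characters, and its text is narrower than the other rows in its block.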

Boer223 commented 4 years ago

@iminkin,

Here is the kale_chrA01 file: kale.id_kale_chrA01.zip

Thank you!

iminkin commented 4 years ago

@Chipcui

Could you please share the whole input? Unfortunately, I can't reproduce the bug using kale_chrA01 alone.

Boer223 commented 4 years ago

OK, here are the westar.id_chrA01.fa and alignment.maf files: westar.id_chrA01.zip alignment.zip

Thank you!

iminkin commented 4 years ago

Hi @Chipcui,

Thank you for providing the input files. Could you please try rerunning SibeliaZ? When I ran it on my machine with the files you provided, I got the following results without errors:

https://drive.google.com/file/d/1CTw1MZkBN7muhNeSyfbXFQnGGJoAWENQ/view?usp=sharing

Boer223 commented 4 years ago

Ok, I will try again. Thank you!

Boer223 commented 4 years ago

@iminkin, as you suggested, I reran SibeliaZ with the following command:

sibeliaz -t 10 -o westar_kale_chrA01 data/westar.fa.split/westar.id_chrA01.fa data/kale.fa.split/kale.id_kale_chrA01.fa

Then, when I convert alignment.maf to alignment.gfa with the following command, I still get the same error:

python2 ~/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py westar_kale_chrA01/alignment.maf data/westar.fa.split/westar.id_chrA01.fa data/kale.fa.split/kale.id_kale_chrA01.fa >westar_kale_chrA01/alignment.gfa

Traceback (most recent call last):
  File "/home/cuixb/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py", line 177, in <module>
    blocks, sequence = split_maf_blocks(args.maf)
  File "/home/cuixb/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py", line 102, in split_maf_blocks
    next_profile = profile(maf, next_column)
  File "/home/cuixb/tools/biosoft/SibeliaZ-1.2.1/SibeliaZ-LCB/maf_to_gfa1.py", line 46, in profile
    return [group[i].body[column] == '-' for i in xrange(len(group))]
IndexError: string index out of range

I do not know why.

Here is the new alignment.maf file: alignment.zip

The other two input files are the same as above.

Boer223 commented 4 years ago

Hi, I reinstalled SibeliaZ and reran it, and now it works! Maybe something was wrong with my previous installation.

Thank you again!

iminkin commented 4 years ago

No problem! I am glad it finally worked. Computers are tricky; sometimes strange things happen :)