katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

IndexError: list index out of range #87

Closed: bikongxingchen0811 closed this issue 5 years ago.

bikongxingchen0811 commented 7 years ago

Hello, I got an error when running srst2 for MLST with the following command:

srst2 --input_pe LSM180_1.fastq LSM180_2.fastq --output LSM180_test --log --mlst_db Streptococcus_suis.fasta --mlst_definitions ssuis.txt --gene_db arg-annot-nt-v3-march2017_doc.fasta --prev_output Suis__compiledResults.txt

The error output is:

Traceback (most recent call last):
  File "/opt/sysoft/Python-2.7.11/bin/srst2", line 9, in <module>
    load_entry_point('srst2==0.2.0', 'console_scripts', 'srst2')()
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 1717, in main
    mlst_report, mlst_results = run_srst2(args, fileSets, args.mlst_db, "mlst")
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 1264, in run_srst2
    db_results_list, fasta)
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 1327, in process_fasta_db
    results,gene_list, db_report, cluster_symbols, max_mismatch)
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 1429, in map_fileSet_to_db
    size_allele, next_to_del_depth_allele, run_type,unique_gene_symbols, unique_allele_symbols)
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 510, in score_alleles
    unique_gene_symbols,unique_allele_symbols)
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 931, in group_allele_dict_by_gene
    gene_name = get_allele_name_from_db(allele,run_type,args,unique_allele_symbols,unique_cluster_symbols)[component_ind]
  File "/opt/sysoft/Python-2.7.11/lib/python2.7/site-packages/srst2/srst2.py", line 900, in get_allele_name_from_db
    allele_name = gene_name[1]
IndexError: list index out of range

The dependency versions are: samtools 1.3.1, bowtie2 2.2.9, Python 2.7.11.

I could not find the cause of this error with the suggestions I found online. Please help me, thank you very much!

Best regards,
Star

keyfm commented 5 years ago

Hi Star, I seem to have the same error. It is preceded by many warnings like these:

Warning: gene yqiL_685 in database file isn't among the columns in the ST definitions: arcC,aroE,glpF,gmk,pta,tpi,yqiL,clonal_complex
 Any sequences with this gene identifer from the database will not be included in typing.
Warning: gene arcC in ST definitions file isn't among those in the database arcC_1,arcC_2,arcC_3,arcC_4,arcC_5,arcC_6,arcC_7,arcC_8,arcC_9,arcC_10,arcC_11,arcC_12,arcC_13,arcC_14,arcC_15,arcC_16,arcC_17,arcC_18,arcC_19,arcC_20,arcC_21,arc
 This will result in all STs being called as unknown (but allele calls will be accurate for other loci).
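Both warnings boil down to a set comparison: gene names derived from the FASTA headers versus the locus columns of the ST definitions file. The sketch below reproduces that comparison outside srst2 so a mismatch can be spotted before a run. It is not srst2's code; the function names are hypothetical, and it assumes (a) the gene name is the header text before the first delimiter and (b) the ST definitions file is tab-separated with ST as the first column and an optional clonal_complex column.

```python
from typing import Iterable, Set, Tuple


def db_genes(fasta_lines: Iterable[str], delimiter: str) -> Set[str]:
    """Gene names obtained by splitting each FASTA header on the delimiter.

    Simplified assumption: the gene is everything before the first delimiter.
    """
    genes = set()
    for line in fasta_lines:
        if line.startswith(">"):
            allele = line[1:].split()[0]  # header's first token, e.g. "arcC_1"
            genes.add(allele.split(delimiter)[0])
    return genes


def profile_genes(header_row: str) -> Set[str]:
    """Locus columns of a tab-separated ST definitions header row.

    Assumes the first column is the ST and ignores a clonal_complex column.
    """
    columns = header_row.rstrip("\n").split("\t")
    return {col for col in columns[1:] if col != "clonal_complex"}


def compare_loci(fasta_lines: Iterable[str], header_row: str,
                 delimiter: str) -> Tuple[Set[str], Set[str]]:
    """Return (genes only in the database, genes only in the ST definitions)."""
    db = db_genes(fasta_lines, delimiter)
    st = profile_genes(header_row)
    return db - st, st - db


if __name__ == "__main__":
    # Toy data: two alleles named like the S. aureus headers in the warnings.
    fasta = [">arcC_1", "TTGATCGA", ">yqiL_685", "ATGCCGTA"]
    header = "ST\tarcC\taroE\tglpF\tgmk\tpta\ttpi\tyqiL\tclonal_complex"
    for delim in ("-", "_"):
        only_db, only_st = compare_loci(fasta, header, delim)
        print(repr(delim), sorted(only_db), sorted(only_st))
```

With "-" the headers are not split at all, so arcC_1 and yqiL_685 are reported as unknown genes, mirroring the warnings. With "_" the only remaining differences are loci that this toy FASTA simply does not contain.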

I am trying to get the STs for Staph aureus.

Did you or anybody solve the problem?

Thx

katholt commented 5 years ago

If the delimiter used in the MLST database is anything other than '-' (the most common, and therefore the default), you need to specify it in your call to srst2, as outlined in the documentation:

--mlst_delimiter MLST_DELIMITER Character(s) separating gene name from allele number in MLST database (default "-", as in arcc-1)
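The traceback in the original report ends at `allele_name = gene_name[1]`, which suggests srst2 splits each allele name on the MLST delimiter and reads the second piece. Python's `str.split` returns a one-element list when the separator never occurs, so indexing `[1]` raises exactly this `IndexError`. The sketch below is a simplified, hypothetical reconstruction of that step (the function `split_mlst_allele` is not part of srst2) to show why a delimiter mismatch crashes.

```python
from typing import Tuple


def split_mlst_allele(name: str, delimiter: str = "-") -> Tuple[str, str]:
    """Split an MLST allele name such as "arcc-1" into (gene, allele number).

    Simplified sketch of the step implied by the traceback line
    `allele_name = gene_name[1]`; this is not srst2's actual code.
    """
    parts = name.split(delimiter)
    gene = parts[0]
    allele = parts[1]  # IndexError if the delimiter never occurs in `name`
    return gene, allele


if __name__ == "__main__":
    print(split_mlst_allele("arcc-1"))                 # ('arcc', '1')
    print(split_mlst_allele("arcC_1", delimiter="_"))  # ('arcC', '1')
    try:
        split_mlst_allele("arcC_1")  # "_" in the header, default "-" delimiter
    except IndexError as err:
        print("IndexError:", err)    # IndexError: list index out of range
```

For headers named like arcC_1, the second call corresponds to running srst2 with `--mlst_delimiter _`.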

ramadatta commented 3 years ago

I ran into this issue this morning: when a gene name in the database contains two "_" characters, srst2 cannot assign the ST.

For example, I got the following warnings:

....
Warning: gene Pas in database file isn't among the columns in the ST definitions: Pas_cpn60
 Any sequences with this gene identifer from the database will not be included in typing.
Warning: gene Pas_cpn60 in ST definitions file isn't among those in the database Pas
 This will result in all STs being called as unknown (but allele calls will be accurate for other loci).
....

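This is because the allele name Pas_cpn60_1 contains two "_" characters, while the SRST2 --mlst_delimiter option expects the character(s) separating gene name from allele number in the MLST database (default "-", as in arcc-1). With two "_" in the name, srst2 splits at the first one and takes "Pas" as the gene name instead of "Pas_cpn60".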
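A minimal sketch of the ambiguity, assuming the run used `--mlst_delimiter _` and that the gene name is taken as the text before the first delimiter (consistent with srst2 reporting the gene as "Pas" in the warnings). The helper `gene_from_allele` is hypothetical, and the `rsplit` line is only an illustration of an alternative split, not an srst2 option.

```python
def gene_from_allele(allele: str, delimiter: str) -> str:
    """Gene name = text before the first delimiter (simplified assumption)."""
    return allele.split(delimiter)[0]


if __name__ == "__main__":
    # Original name: the first "_" is part of the gene name itself.
    print(gene_from_allele("Pas_cpn60_1", "_"))  # Pas
    # After sed 's/Pas_/Pas-/g': one "_" remains, just before the allele number.
    print(gene_from_allele("Pas-cpn60_1", "_"))  # Pas-cpn60
    # Splitting on the *last* delimiter would also keep the gene intact.
    print("Pas_cpn60_1".rsplit("_", 1)[0])       # Pas_cpn60
```

Renaming works because it leaves exactly one delimiter in each allele name, and the same rename applied to the profiles header keeps the column name (Pas-cpn60) in step with the derived gene name.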

I solved it by renaming the gene in both the allele FASTA file and the profiles file:

sed -i 's/Pas_/Pas-/g' Acinetobacter_baumannii#2.fasta
sed -i 's/Pas_/Pas-/g' profiles_csv

SRST2 now assigns the ST successfully and prints the following messages at runtime:

Attempting to read 7 loci from ST database profiles_csv
Read ST database profiles_csv successfully