jhawkey / IS_mapper

IS mapping software

IS_mapper failing at create_typing_out.py: list index out of range #27

Closed AxelJa closed 6 years ago

AxelJa commented 7 years ago

Hi,

I'm running ISMapper on one IS element, but the program fails with the error "list index out of range" (the last part of the on-screen output is below). I've already reduced my gbk file to contain only the chromosome, because the run initially failed on the multiple sequences in that file. Beyond that, I cannot see what I'm doing wrong.

Any help would be very welcome.

Thanks, Axel

Building a new DB, current time: 08/17/2017 10:52:11
New DB name: /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/IS5.fasta
New DB title: /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/IS5.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.000299931 seconds.
/hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/KP040_min500_cov10.short.fasta.bwt
[bwa_index] Pack FASTA... 0.07 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.74 seconds elapse.
[bwa_index] Update BWT... 0.04 sec
[bwa_index] Pack forward-only FASTA... 0.05 sec
[bwa_index] Construct SA from BWT and Occ... 0.61 sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa index /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/KP040_min500_cov10.short.fasta
[main] Real time: 2.744 sec; CPU: 2.521 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 314 sequences (39732 bp)...
[M::mem_process_seqs] Processed 314 reads in 0.031 CPU sec, 0.006 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 5 /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/KP040_min500_cov10.short.fasta /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/day1_IS5_LeftFinal.fastq
[main] Real time: 0.063 sec; CPU: 0.042 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 445 sequences (59947 bp)...
[M::mem_process_seqs] Processed 445 reads in 0.038 CPU sec, 0.008 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 5 /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/KP040_min500_cov10.short.fasta /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/day1_IS5_RightFinal.fastq
[main] Real time: 0.021 sec; CPU: 0.049 sec
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
Traceback (most recent call last):
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/create_typing_out.py", line 543, in <module>
    main()
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/create_typing_out.py", line 368, in main
    add_known(x_L, x_R, y_L, y_R, info[6], genbank, args.ref, args.seq, args.temp, args.cds, args.trna, args.rrna, region, feature_count, results, genbank.features, feature_list, removed_results, line, 'closest.bed')
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/create_typing_out.py", line 209, in add_known
    results['region' + str(region)] = [orient, str(start), str(end), gap, call, str(seq_results[0]), str('%.2f' % seq_results[1]), gene_left[-1][:-1], gene_left[-1][-1], gene_left[1], gene_right[-1][:-1], gene_right[-1][-1], gene_right[1], func_pred]
IndexError: list index out of range
Traceback (most recent call last):
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/ismap", line 11, in <module>
    load_entry_point('ISMapper==0.1.5.1', 'console_scripts', 'ismap')()
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/lib/python2.7/site-packages/ismap/ismap.py", line 743, in main
    '--max_range', args.max_range, '--output', currentdir + sample + '' + query_name, '--igv', igv_flag, '--chr_name', args.chr_name], shell=True)
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/lib/python2.7/site-packages/ismap/ismap.py", line 150, in run_command
    raise CommandError({"message": message})
ismap.ismap.CommandError: {'message': "Command 'create_typing_out.py --intersect /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_KP040_min500_cov10.short_IS5_intersect.bed --closest /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_KP040_min500_cov10.short_IS5_closest.bed --left_bed /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_left_KP040_min500_cov10.short_IS5_merged.sorted.bed --right_bed /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_right_KP040_min500_cov10.short_IS5_merged.sorted.bed --left_unpaired /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_KP040_min500_cov10.short_IS5_left_unpaired.bed --right_unpaired /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_KP040_min500_cov10.short_IS5_right_unpaired.bed --seq /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/IS5.fasta --ref /hpc/dla_mm/ajanssen3/klebs_illumina_minion/prokka/min500_cov10/KP040/KP040_min500_cov10.short.gbk --temp /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_temp/ --cds locus_tag gene product --trna locus_tag product --rrna locus_tag product --min_range 0.2 --max_range 1.1 --output /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5 --igv 0 --chr_name not_specified' failed with non-zero exit status: 1"}

jhawkey commented 7 years ago

Are you able to send me the files required for this command? I can then run the command myself and attempt to debug. The files I'll need are:

You should be able to attach them here, in the issue.

AxelJa commented 7 years ago

I have rerun the program since posting the error, so here is the new error:

Building a new DB, current time: 08/29/2017 09:48:13
New DB name: /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/IS5_KP040_ColS.fasta
New DB title: /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/IS5_KP040_ColS.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0176148 seconds.
/hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/KP040_min500_cov10.short.fasta.bwt
[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.72 seconds elapse.
[bwa_index] Update BWT... 0.04 sec
[bwa_index] Pack forward-only FASTA... 0.03 sec
[bwa_index] Construct SA from BWT and Occ... 1.08 sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa index /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/KP040_min500_cov10.short.fasta
[main] Real time: 3.069 sec; CPU: 2.936 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 314 sequences (39732 bp)...
[M::mem_process_seqs] Processed 314 reads in 0.029 CPU sec, 0.014 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 2 /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/KP040_min500_cov10.short.fasta /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/day1_IS5_KP040_ColS_LeftFinal.fastq
[main] Real time: 0.030 sec; CPU: 0.038 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 443 sequences (59447 bp)...
[M::mem_process_seqs] Processed 443 reads in 0.036 CPU sec, 0.018 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 2 /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/KP040_min500_cov10.short.fasta /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/day1_IS5_KP040_ColS_RightFinal.fastq
[main] Real time: 0.029 sec; CPU: 0.044 sec
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
Traceback (most recent call last):
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/create_typing_out.py", line 543, in <module>
    main()
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/create_typing_out.py", line 368, in main
    add_known(x_L, x_R, y_L, y_R, info[6], genbank, args.ref, args.seq, args.temp, args.cds, args.trna, args.rrna, region, feature_count, results, genbank.features, feature_list, removed_results, line, 'closest.bed')
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/create_typing_out.py", line 209, in add_known
    results['region' + str(region)] = [orient, str(start), str(end), gap, call, str(seq_results[0]), str('%.2f' % seq_results[1]), gene_left[-1][:-1], gene_left[-1][-1], gene_left[1], gene_right[-1][:-1], gene_right[-1][-1], gene_right[1], func_pred]
IndexError: list index out of range
Traceback (most recent call last):
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/bin/ismap", line 11, in <module>
    load_entry_point('ISMapper==0.1.5.1', 'console_scripts', 'ismap')()
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/lib/python2.7/site-packages/ismap/ismap.py", line 743, in main
    '--max_range', args.max_range, '--output', currentdir + sample + '' + query_name, '--igv', igv_flag, '--chr_name', args
  File "/hpc/local/CentOS7/dla_mm/tools/miniconda2/lib/python2.7/site-packages/ismap/ismap.py", line 150, in run_command
    raise CommandError({"message": message})
ismap.ismap.CommandError: {'message': "Command 'create_typing_out.py --intersect /hpc/dla_mm/ajanssen3/population_sequencing/ 3/population_sequencing/KP040/results/ISmapper/day1_KP040_min500_cov10.short_IS5_KP040_ColS_closest.bed --left_bed /hpc/dla_m d.sorted.bed --right_bed /hpc/dla_mm/ajanssen3/population_sequencing/KP040/results/ISmapper/day1_right_KP040_min500_cov10.sho mapper/day1_KP040_min500_cov10.short_IS5_KP040_ColS_left_unpaired.bed --right_unpaired /hpc/dla_mm/ajanssen3/population_seque janssen3/population_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/IS5_KP040_ColS.fasta --ref /hpc/dla_mm/ajansse ulation_sequencing/KP040/results/ISmapper/day1_IS5_KP040_ColS_temp/ --cds locus_tag gene product --trna locus_tag product --r /results/ISmapper/day1_IS5_KP040_ColS --igv 0 --chr_name not_specified' failed with non-zero exit status: 1"}

I would like to attach the files here, but GitHub doesn't accept them directly, so I appended ".TXT" to each filename to get them to upload. You'll have to remove that suffix again. (GitHub only supports PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, PDF or ZIP.)

My _intersect.bed file is empty (0 bytes), so I didn't attach it here (it is named day1_KP040_min500_cov10.short_IS5_KP040_ColS_intersect.bed).

day1_right_KP040_min500_cov10.short_IS5_KP040_ColS_merged.sorted.bed.TXT
IS5_KP040.fasta.TXT
KP040_min500_cov10.short.gbk.TXT
day1_KP040_min500_cov10.short_IS5_KP040_ColS_closest.bed.TXT
day1_KP040_min500_cov10.short_IS5_KP040_ColS_left_unpaired.bed.TXT
[Uploading day1_KP040_min500_cov10.short_IS5_KP040_ColS_right_unpaired.bed.TXT…]()
day1_left_KP040_min500_cov10.short_IS5_KP040_ColS_merged.sorted.bed.TXT
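For anyone downloading these attachments, the added ".TXT" suffix can be removed in one pass. The following is only a minimal sketch: the function name `strip_txt_suffix` and the idea that all attachments sit in a single download directory are assumptions made for illustration, not part of ISMapper.

```python
from pathlib import Path


def strip_txt_suffix(directory):
    """Rename files ending in '.TXT' inside `directory` by dropping that suffix.

    Hypothetical helper for illustration; returns the new file names.
    """
    suffix = ".TXT"
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        # Only touch regular files that carry the exact upper-case suffix.
        if not path.is_file() or not path.name.endswith(suffix):
            continue
        target = path.with_name(path.name[: -len(suffix)])
        if target.exists():
            # Never overwrite a file that is already there.
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling `strip_txt_suffix("downloads")` would, under these assumptions, turn `KP040_min500_cov10.short.gbk.TXT` back into `KP040_min500_cov10.short.gbk` and leave every other file alone.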

jhawkey commented 7 years ago

Hi @AxelJa,

I'm looking into this now, and there seems to be something strange going on with your GenBank file. The version I've downloaded from here is full of excess double quotation marks:

"                     /locus_tag=""PROKKA_00001"""
     CDS             58..2328
"                     /locus_tag=""PROKKA_00001"""
"                     /inference=""ab initio prediction:Prodigal:2.6"""
                     /codon_start=1
                     /transl_table=11
"                     /product=""hypothetical protein"""
"                     /protein_id=""Prokka:PROKKA_00001"""

Is this what yours looks like as well?
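A quick way to check a local copy for the same symptom is to look for lines that begin with a double quotation mark, since feature and qualifier lines in a GenBank flat file normally begin with spaces. This is only a rough sketch; the function name `find_quoted_lines` is made up for illustration.

```python
def find_quoted_lines(lines):
    """Return the 1-based numbers of lines that start with a double quote.

    In the excerpt above, every damaged qualifier line begins with '"',
    while intact lines begin with spaces.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.startswith('"')
    ]


def check_genbank(path):
    """Read a GenBank file and report the suspicious line numbers."""
    with open(path) as handle:
        return find_quoted_lines(handle)
```

An empty list from `check_genbank("KP040_min500_cov10.short.gbk")` would suggest the local file is not affected and that the quoting was introduced somewhere between the original file and the uploaded copy.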

When I removed the excess quotation marks, ISMapper was able to run.

I used the following bash command to remove the excess quotation marks:

sed -e 's/""/"/g' -e 's/^"//' -e 's/".$//' KP040_min500_cov10.short.gbk > KP040_min500_cov10.short.BETTER.gbk
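The same clean-up can be sketched in Python. The pattern in the excerpt (a whole line wrapped in quotes, with every inner quote doubled) looks like CSV-style quoting, but that is an assumption about how the file was damaged. Note also that `".$` in the sed command matches a quote followed by one more character, which fits a file with one extra trailing character per line, such as a carriage return. The sketch below does not depend on that; the function name `unquote_csv_style` is hypothetical.

```python
def unquote_csv_style(text):
    """Undo line-by-line CSV-style quoting in `text`.

    A line is treated as damaged only if it both starts and ends with a
    double quote. For such lines the outer quotes are dropped and each
    doubled quote ("") is collapsed to a single one. Other lines are kept
    unchanged. Line endings are normalised to '\n'.
    """
    fixed = []
    for line in text.splitlines():  # also strips '\r\n' endings
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            line = line[1:-1].replace('""', '"')
        fixed.append(line)
    return "\n".join(fixed) + "\n"


def clean_file(source, destination):
    """Write a cleaned copy of `source` to `destination`."""
    with open(source, newline="") as handle:
        cleaned = unquote_csv_style(handle.read())
    with open(destination, "w", newline="\n") as handle:
        handle.write(cleaned)
```

Writing to a new file, as in `clean_file("KP040_min500_cov10.short.gbk", "KP040_min500_cov10.short.BETTER.gbk")`, keeps the original available for comparison. It is worth diffing the two files before rerunning ISMapper.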