phac-nml / staramr

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
Apache License 2.0

Unexpected crashes on some FASTA files #23

Closed: kbessonov1984 closed this issue 5 years ago

kbessonov1984 commented 5 years ago

Hello, I'm having issues with version 0.2.1. One of my FASTA files crashes staramr at the results-parsing stage. The input assembly file can be found here.

(mob_suite) kirill@Discovery20:~/Desktop$ staramr --verbose search   --nprocs 2 --pid-threshold 98.0 --percent-length-overlap-resfinder 60.0 --percent-length-overlap-pointfinder 95.0  --output-summary dataset_588.dat --output-resfinder dataset_589.dat --output-settings dataset_590.dat --output-excel dataset_591.dat.xlsx --output-hits-dir staramr_hits  "N18.fasta"
2018-08-10 10:29:01,343 INFO Search.run,292: No --pointfinder-organism specified. Will not search the PointFinder databases
2018-08-10 10:29:01,343 INFO Search.run,322: --output-dir not set. Files will be output to the respective --output-[type] setting
2018-08-10 10:29:01,344 DEBUG Search.run,337: Found --output-hits-dir [staramr_hits] and is a directory. Will write hits here
2018-08-10 10:29:01,429 DEBUG BlastHandler.run_blasts,90: Resfinder Databases: ['colistin', 'tetracycline', 'quinolone', 'fusidicacid', 'glycopeptide', 'rifampicin', 'trimethoprim', 'beta-lactam', 'aminoglycoside', 'oxazolidinone', 'macrolide', 'phenicol', 'fosfomycin', 'sulphonamide', 'nitroimidazole']
2018-08-10 10:29:01,430 INFO BlastHandler._make_db_from_input_files,108: Making BLAST databases for input files
2018-08-10 10:29:01,430 DEBUG BlastHandler._make_db_from_input_files,114: Creating symlink from [N18.fasta] to [/var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta]
2018-08-10 10:29:01,431 DEBUG BlastHandler._make_blast_db,200: makeblastdb -in /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -dbtype nucl -parse_seqids
2018-08-10 10:29:01,659 DEBUG BlastHandler.run_blasts,99: Done making blast databases for input files
2018-08-10 10:29:01,660 INFO BlastHandler.run_blasts,102: Scheduling blast for N18.fasta
2018-08-10 10:29:01,663 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.colistin.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/colistin.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:01,670 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.tetracycline.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/tetracycline.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:01,960 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.quinolone.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/quinolone.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,129 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.fusidicacid.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/fusidicacid.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,154 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.glycopeptide.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/glycopeptide.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,170 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.rifampicin.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/rifampicin.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,212 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.trimethoprim.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/trimethoprim.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,259 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.beta-lactam.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/beta-lactam.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,290 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.aminoglycoside.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/aminoglycoside.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,505 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.oxazolidinone.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/oxazolidinone.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,573 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.macrolide.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/macrolide.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,730 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.phenicol.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/phenicol.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,818 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.fosfomycin.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/fosfomycin.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,853 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.sulphonamide.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/sulphonamide.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,889 DEBUG BlastHandler._launch_blast,193: blastn -out /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.nitroimidazole.resfinder.blast.xml -outfmt "6 qseqid sseqid pident length qstart qend sstart send slen qlen sstrand sseq qseq" -query /Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/databases/data/dist/resfinder/nitroimidazole.fsa -db /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/input-genomes/N18.fasta -evalue 0.001
2018-08-10 10:29:02,950 DEBUG BlastResultsParser.parse_results,58: /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.aminoglycoside.resfinder.blast.xml
2018-08-10 10:29:03,372 DEBUG BlastResultsParser.parse_results,58: /var/folders/b7/frgczw4n53xd4nlczrjd3jwc0000gq/T/tmpp1hv_jse/N18.fasta.beta-lactam.resfinder.blast.xml
2018-08-10 10:29:03,405 DEBUG ResfinderHitHSP.__init__,25: record=qseqid                                 blaTEM-108_1_AF506748
sseqid                                                     4
pident                                                99.414
length                                                   853
qstart                                                     9
qend                                                     861
sstart                                                 39632
send                                                   40484
slen                                                   83930
qlen                                                     861
sstrand                                                 plus
sseq       TCAACATTTTCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGC...
qseq       TCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGC...
plength                                              99.0708
Name: 108, dtype: object
2018-08-10 10:29:03,425 ERROR staramr.<module>,75: expected string or bytes-like object
Traceback (most recent call last):
  File "/Users/kirill/miniconda/envs/mob_suite/bin/staramr", line 68, in <module>
    args.run_command(args)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/subcommand/Search.py", line 356, in run
    files=args.files)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/subcommand/Search.py", line 216, in _generate_results
    plength_threshold_pointfinder, report_all_blast)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/detection/AMRDetection.py", line 65, in run_amr_detection
    plength_threshold_resfinder, report_all)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/detection/AMRDetectionResistance.py", line 36, in _create_resfinder_dataframe
    return resfinder_parser.parse_results()
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastResultsParser.py", line 61, in parse_results
    self._handle_blast_hit(file, database_name, blast_out, results, hit_seq_records)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastResultsParser.py", line 93, in _handle_blast_hit
    partitions.append(self._create_hit(in_file, database_name, blast_record))
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastHitPartitions.py", line 38, in append
    partition = self._get_existing_partition(hit)
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/BlastHitPartitions.py", line 56, in _get_existing_partition
    partition_name = hit.get_genome_contig_id()
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/site-packages/staramr/blast/results/AMRHitHSP.py", line 101, in get_genome_contig_id
    re_search = re.search(r'^(\S+)', self._blast_record['sseqid'])
  File "/Users/kirill/miniconda/envs/mob_suite/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

apetkau commented 5 years ago

Thanks for reporting. It looks like there's a bug in my code: I don't cast the sequence ID to a string, so if an ID happens to be a number (like >1), staramr crashes.
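
The failure can be reproduced without staramr. In the log above, the hit record is printed as a pandas Series in which `sseqid` is `4`, a number rather than a string. `re.search` only accepts `str` or bytes-like input, so it raises the `TypeError` seen in the traceback. The sketch below is a minimal illustration, not staramr's actual fix. `contig_id_unsafe` and `contig_id` are hypothetical stand-ins for `get_genome_contig_id`, and the fixed version simply casts the ID with `str()` before matching.

```python
import re


def contig_id_unsafe(sseqid):
    # Mirrors the failing call in the traceback: re.search on the raw value.
    # Raises TypeError when sseqid is an int (e.g. header ">4" parsed as 4).
    return re.search(r'^(\S+)', sseqid).group(1)


def contig_id(sseqid):
    # Hypothetical fixed version: cast to str first so numeric IDs work,
    # then keep everything up to the first whitespace character.
    match = re.search(r'^(\S+)', str(sseqid))
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        contig_id_unsafe(4)
    except TypeError as err:
        print("unsafe version raised TypeError:", err)
    print("fixed:", contig_id(4))     # fixed: 4
    print("fixed:", contig_id("a4"))  # fixed: a4
```

Casting after parsing handles plain integer IDs. It cannot recover the original text in every case, though. A header such as `>007` would come back as `7`, and a value parsed as a float would come back as `4.0`. Suppose the BLAST table is loaded with `pandas.read_csv` (an assumption, since the log only shows that the record is a pandas Series). In that case, reading the column as text up front, for example with `dtype={'sseqid': str}`, avoids both problems.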

I am working on a fix. In the meantime, if you want to make this particular file work, you can replace all numeric sequence IDs with IDs that mix letters and numbers, for example replacing >1 with >a1. An example sed command to do this is:

sed -i.orig -e 's/>\(\S*\)/>a\1/' N18.fasta
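
The same workaround can also be sketched in Python, for anyone who would rather not depend on a particular `sed` implementation's regex dialect. Like the command above, it prefixes `a` to every header line and keeps a `.orig` backup. It is slightly stricter than the `sed` expression because it only touches lines that start with `>`. The function names and the `prefix` parameter are hypothetical and are not part of staramr.

```python
import shutil
from pathlib import Path


def prefix_fasta_ids(fasta_text: str, prefix: str = "a") -> str:
    """Insert `prefix` right after '>' on every FASTA header line,
    e.g. '>1 desc' -> '>a1 desc'. Sequence lines are left untouched."""
    lines = fasta_text.splitlines(keepends=True)
    return "".join(
        ">" + prefix + line[1:] if line.startswith(">") else line
        for line in lines
    )


def prefix_fasta_file(path: str, prefix: str = "a") -> None:
    """Rewrite `path` in place, first copying the original to
    `path + '.orig'` (similar in spirit to `sed -i.orig`)."""
    src = Path(path)
    shutil.copyfile(src, str(src) + ".orig")
    src.write_text(prefix_fasta_ids(src.read_text(), prefix))


if __name__ == "__main__":
    # Prints the headers as >a1 and >a2 plasmid; sequences are unchanged.
    print(prefix_fasta_ids(">1\nACGT\n>2 plasmid\nGGCC\n"), end="")
```

For this issue, calling `prefix_fasta_file("N18.fasta")` before running `staramr search` would have the same effect as the sed command.
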

kbessonov1984 commented 5 years ago

Thanks, that works. I will try to make sure my FASTA headers start with a letter.