I updated my databases in v0.10.0 with staramr db update --update-default. The PointFinder database now contains some new entries that use a different naming structure, which causes the pipeline to fail with the error below.
conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/Bio/Application/__init__.py:40: BiopythonDeprecationWarning: The Bio.Application modules and modules relying on it have been deprecated.
Due to the on going maintenance burden of keeping command line application
wrappers up to date, we have decided to deprecate and eventually remove these
modules.
We instead now recommend building your command line and invoking it directly
with the subprocess module.
warnings.warn(
2024-03-04 15:10:39 WARNING: Using non-default ResFinder/PointFinder. This may lead to differences in the detected AMR genes depending on how the database files are structured.
2024-03-04 15:10:39 INFO: No --plasmidfinder-database-type specified. Will search the entire PlasmidFinder database
2024-03-04 15:10:39 INFO: --output-dir set. All files will be output to [results_17]
2024-03-04 15:10:39 INFO: Will exclude ResFinder/PointFinder genes listed in [conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/databases/exclude/data/genes_to_exclude.tsv]. Use --no-exclude-genes to disable
2024-03-04 15:10:39 INFO: Will report complex mutations listed in [conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/databases/resistance/pointfinder/complex/data/complex_mutations.tsv]
2024-03-04 15:10:39 INFO: Making BLAST databases for input files
2024-03-04 15:10:39 INFO: Scheduling blasts and MLST for 17A19CPO005.fasta
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=catB3_2, accession=U13880
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=aac(6')-Ib-cr_1, accession=DQ303918
2024-03-04 15:10:47 WARNING: Multiple entries found for drug_class=all, gene=aac(6')-Ib-cr_1, accession=DQ303918
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=blaOXA-1_1, accession=HQ170510
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=blaCTX-M-15_1, accession=AY044436
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=blaCMY-42_1, accession=HM146927
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=qnrS1_1, accession=AB187515
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=blaOXA-181_1, accession=CM004561
2024-03-04 15:10:47 WARNING: No drug found for drug_class=all, gene=mph(A)_2, accession=U36578
2024-03-04 15:10:47 WARNING: Multiple entries found for drug_class=aminoglycoside, gene=aac(6')-Ib-cr_1, accession=DQ303918
2024-03-04 15:10:47 ERROR: invalid literal for int() with base 10: 'ampC-promoter-size-53'
Traceback (most recent call last):
File "conda/staramr_v0.10.0_updateddb/bin/staramr", line 68, in <module>
args.run_command(args)
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/subcommand/Search.py", line 480, in run
results = self._generate_results(database_repos=database_repos,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/subcommand/Search.py", line 296, in _generate_results
amr_detection.run_amr_detection(files,pid_threshold, plength_threshold_resfinder,
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/detection/AMRDetection.py", line 198, in run_amr_detection
self._pointfinder_dataframe = self._create_pointfinder_dataframe(pointfinder_blast_map, pid_threshold,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/detection/AMRDetectionResistance.py", line 62, in _create_pointfinder_dataframe
return pointfinder_parser.parse_results()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/blast/results/BlastResultsParser.py", line 67, in parse_results
self._handle_blast_hit(file, database_name, blast_out, results, hit_seq_records)
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/blast/results/BlastResultsParser.py", line 105, in _handle_blast_hit
partitions.append(self._create_hit(in_file, database_name, blast_record))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/blast/results/pointfinder/BlastResultsParserPointfinder.py", line 54, in _create_hit
return PointfinderHitHSPPromoter(file, blast_record, database_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/blast/results/pointfinder/nucleotide/PointfinderHitHSPPromoter.py", line 20, in __init__
self._parse_database_name(database_name)
File "conda/staramr_v0.10.0_updateddb/lib/python3.11/site-packages/staramr/blast/results/pointfinder/nucleotide/PointfinderHitHSPPromoter.py", line 118, in _parse_database_name
size = int(size_string.replace('bp', '')) # remove the 'bp' and convert to an int
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'ampC-promoter-size-53'
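From my understanding, this happens because _parse_database_name in PointfinderHitHSPPromoter.py splits the database name on underscores, while the new entry uses dashes, so the whole name (ampC-promoter-size-53) ends up being passed to int():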
    def _parse_database_name(self, database_name):
        """
        Parses the name of the database in order to obtain the promoter offset.
        The database name is expected to have the following format:
        [GENENAME]_promoter_size_[SIZE]bp
        example:
        embA_promoter_size_115bp
        """
        tokens = database_name.split("_")  # split the name into tokens
        size_string = tokens[len(tokens) - 1]  # get the last token
        size = int(size_string.replace('bp', ''))  # remove the 'bp' and convert to an int
        self.offset = size
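The failure can be reproduced outside staramr with a standalone copy of the logic quoted above. This is only an illustration: the function name parse_offset is hypothetical and is not part of staramr.

```python
def parse_offset(database_name):
    """Standalone copy of the quoted parsing logic (hypothetical helper, not staramr code)."""
    tokens = database_name.split("_")          # split the name on underscores
    size_string = tokens[len(tokens) - 1]      # take the last token
    return int(size_string.replace('bp', ''))  # strip 'bp' and convert to int


# Old-style name: the last token is '115bp', so the offset parses correctly.
print(parse_offset("embA_promoter_size_115bp"))

# New-style name: there are no underscores, so the "last token" is the whole
# string and int() raises the same ValueError as in the traceback.
try:
    parse_offset("ampC-promoter-size-53")
except ValueError as err:
    print(err)
```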
As a quick workaround, I modified my PointFinder database files (renaming ampC-promoter-size-53 to ampC_promoter_size_53), and it runs fine for me now. I'm opening this as an FYI because other genes may have similar issues.
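One possible direction for a fix, sketched under assumptions: rather than splitting on a single separator, the size could be extracted with a regular expression that accepts either underscores or dashes and an optional bp suffix. The function name parse_promoter_offset and the exact pattern are my own illustration, not staramr's implementation, and I have only checked it against the names mentioned in this issue.

```python
import re

# Hypothetical pattern: '<gene>[-_]promoter[-_]size[-_]<digits>' with optional 'bp'.
_PROMOTER_SIZE_RE = re.compile(r'[-_]size[-_](\d+)(?:bp)?$')


def parse_promoter_offset(database_name):
    """Return the promoter offset encoded at the end of a PointFinder database name.

    Illustrative sketch only; raises ValueError if the name does not match.
    """
    match = _PROMOTER_SIZE_RE.search(database_name)
    if match is None:
        raise ValueError(
            "Could not parse promoter size from database name: %r" % database_name
        )
    return int(match.group(1))


# The three names that appear in this issue.
for name in ("embA_promoter_size_115bp",
             "ampC-promoter-size-53",
             "ampC_promoter_size_53"):
    print(name, parse_promoter_offset(name))
```

With something like this, both the old and new naming structures would parse, and any other unexpected name would fail with a message that states which database name was the problem.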
Command:
staramr search -o e_coli.fasta -o results --pointfinder-organism escherichia_coli