phac-nml / staramr

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
Apache License 2.0

Database restore and invert BLAST #15

Closed apetkau closed 6 years ago

apetkau commented 6 years ago

This makes a number of changes, and is meant to replace the previous merge request.

  1. Implements staramr db restore-default, which restores the default database (or builds a database with the proper revisions initially). This involves adding a class in AMRDatabasesManager.py to handle restoring the default database.
  2. Inverts the direction of BLAST, so I'm now BLASTing the AMR database files (AMR genes) against the input genomes. To do this, I must first run makeblastdb on all input files (I make symlinks to them first so the BLAST database files don't clutter up your input files directory). This is done in BlastHandler.py. Most of the remaining changes involve inverting the BLAST fields I access in AMRHitHSP.py.
    1. Inverting BLAST means that running staramr against the latest ResFinder/PointFinder databases will work. However, I found I get weird results because some genes aren't labelled properly in the FASTA files, so I print a WARNING if you are using the latest database. Done in Search.py.
  3. Since I no longer need to run makeblastdb on the AMR gene files, I've removed that code from the database build step.
  4. Because of the BLAST direction switch, I also no longer need to check the strand when searching for point mutations. The results have a field for sstrand (subject strand) but nothing for qstrand (query strand), and the AMR gene string and genome contig string always appear on the same strand. Removing strand checking is mostly done in MutationPosition.py and its subclasses.
    1. I do still need to check sstrand when making BLAST hit partitions, though. This is where the changes in BlastHitPartitions.py come from.
  5. I changed to using an OrderedDict when getting database info since it makes it easier to check specific keys. Done in AMRDatabaseHandler.py.
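
For reviewers who want a picture of what the inverted direction in items 2 and 4 looks like, here is a minimal sketch. It is not the code in BlastHandler.py or BlastHitPartitions.py: the function names (symlink_inputs, makeblastdb_command, blastn_command, subject_interval, run_inverted_blast) and the chosen output columns are assumptions made for illustration. It relies only on the standard BLAST+ options -in, -dbtype, -query, -db, -outfmt and -out.

```python
import os
import subprocess
import tempfile


def symlink_inputs(genome_paths, work_dir):
    """Symlink each input genome into work_dir so that the index files
    written by makeblastdb land there, not next to the user's inputs."""
    links = {}
    for path in genome_paths:
        link = os.path.join(work_dir, os.path.basename(path))
        os.symlink(os.path.abspath(path), link)
        links[path] = link
    return links


def makeblastdb_command(genome_link):
    """Command that builds a nucleotide BLAST database from one genome."""
    return ["makeblastdb", "-in", genome_link, "-dbtype", "nucl"]


def blastn_command(amr_genes_fasta, genome_link, out_file):
    """Inverted direction: the AMR genes are the query, the genome is the db."""
    columns = "6 qseqid sseqid pident length qstart qend sstart send sstrand"
    return ["blastn", "-query", amr_genes_fasta, "-db", genome_link,
            "-outfmt", columns, "-out", out_file]


def subject_interval(sstart, send):
    """Contig interval of a hit with start <= end. BLAST reports
    sstart > send for minus-strand subject hits, so sort the pair
    before comparing hits for overlap."""
    return (min(sstart, send), max(sstart, send))


def run_inverted_blast(amr_genes_fasta, genome_paths):
    """Hypothetical driver; requires BLAST+ on the PATH."""
    with tempfile.TemporaryDirectory() as work_dir:
        for genome, link in symlink_inputs(genome_paths, work_dir).items():
            out_file = link + ".blast.tsv"
            subprocess.run(makeblastdb_command(link), check=True)
            subprocess.run(blastn_command(amr_genes_fasta, link, out_file),
                           check=True)
            with open(out_file) as handle:
                yield genome, handle.read()
```

With this layout the query (the AMR gene) is always reported in its own orientation, and only the subject coordinates flip on the minus strand. That is why a helper like subject_interval is still needed when grouping hits by contig position.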