linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

Fixes and conda packaging for run_dbcan #101

Closed AlexSCFraser closed 2 years ago

AlexSCFraser commented 2 years ago

Hi,

We had a meeting about some bugfixes and conda packaging for run_dbcan. I have updated my changes to the latest version in git, 3.0.5, so they are ready to merge in. To summarize:

  1. The meta.yaml file builds conda packages successfully on my machine. We have installed these packages on several machines (a Linux box and some Windows machines using WSL) for our SACCHARIS dev work, and they run without errors.
  2. There was a bug with absolute pathnames not being recognized properly and causing empty folder trees to be created, which has been fixed.
  3. The package is fully importable as a python module, without the need to run subprocess(). I also changed the hmmscan parsing code to be imported as a python module by dbcan_cli. This fixes a bug that prevented run_dbcan from executing, which I am fairly sure was caused by the way WSL handles the python interpreter.
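
To illustrate the class of bug described in item 2, here is a minimal sketch of how an absolute output path can end up nested under a working directory. The function names and the `base_dir` parameter are hypothetical and are not taken from run_dbcan; the actual cause in run_dbcan may differ. The example assumes POSIX-style paths.

```python
import os


def naive_output_dir(out_dir, base_dir):
    # Buggy pattern (hypothetical): plain string concatenation treats an
    # absolute path as if it were relative to base_dir.
    return os.path.normpath(base_dir + "/" + out_dir)


def resolve_output_dir(out_dir, base_dir):
    # Safer pattern (hypothetical helper): keep absolute paths as given and
    # only anchor relative paths at base_dir.
    if os.path.isabs(out_dir):
        return os.path.normpath(out_dir)
    return os.path.normpath(os.path.join(base_dir, out_dir))
```

With the naive version, passing `/data/out` with a base of `/home/user` yields `/home/user/data/out`, so creating that directory produces an unintended folder tree. The second version returns `/data/out` unchanged.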

As part of making run_dbcan importable as a Python module, I also split out the argparse functionality into a separate function. This is for clarity and to simplify calling run_dbcan from other Python scripts. It changes a large number of lines, but it is mostly just reorganizing the code.
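
The general pattern of separating argument parsing from the core logic can be sketched as follows. This is only an illustration: the names `build_parser`, `run`, and `cli_main`, and the `input_file` and `--out_dir` arguments, are assumptions made for the example and do not reflect run_dbcan's real interface.

```python
import argparse


def build_parser():
    # All argparse setup lives in one function (hypothetical arguments).
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument("input_file")
    parser.add_argument("--out_dir", default="output")
    return parser


def run(input_file, out_dir="output"):
    # Core logic takes plain Python arguments, so other scripts can import
    # and call it directly. The body here is only a placeholder.
    return {"input_file": input_file, "out_dir": out_dir}


def cli_main(argv=None):
    # Thin CLI wrapper: parse arguments, then delegate to run().
    args = build_parser().parse_args(argv)
    return run(args.input_file, out_dir=args.out_dir)
```

Another script can then call `run("genome.fna", out_dir="results")` directly instead of launching a separate interpreter, while a command-line entry point only needs to call `cli_main()`.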

None of the changes remove functionality. However, I did have to rename hmmscan-parser.py to hmmscan_parser.py, because the import statement cannot reference a module whose name contains a dash: module names used with import must be valid Python identifiers. This has no effect on users who call run_dbcan through the normal CLI, but it could be considered a breaking change for anyone who calls hmmscan-parser.py directly.
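
The naming restriction can be checked with a short sketch. The helper names below are hypothetical; the code only compiles the import statement to test its syntax and never executes it, so no module needs to exist.

```python
import keyword


def is_valid_module_name(name):
    # Each dotted component must be a valid identifier and not a keyword.
    return all(
        part.isidentifier() and not keyword.iskeyword(part)
        for part in name.split(".")
    )


def import_statement_compiles(name):
    # Compile (but do not run) "import <name>" to see whether it parses.
    try:
        compile(f"import {name}", "<example>", "exec")
        return True
    except SyntaxError:
        return False
```

Here `import hmmscan-parser` fails to parse because the dash is read as a minus operator, while `import hmmscan_parser` parses fine. A dashed file can still be loaded by passing its name as a string to `importlib.import_module`, but that is awkward for a package meant to be imported normally.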

For versioning, the meta.yaml imports the version number from setup.py, keeping a single source of truth for version info.

To get run_dbcan on the bioconda channel, you need to fork the bioconda repo, add only the meta.yaml, and merge it back in, as per https://bioconda.github.io/contributor/index.html

Let me know if there are any changes you want explained further; I can go over them on a video call. You can contact me here or email me at alexander.fraser@alumni.ubc.ca

Alex

linnabrown commented 2 years ago

Thank you so much for the great work, Alex. I will merge your changes ASAP. Hi @QiweiGe @HaidYi, could you also take a look at it? Thanks a lot!