New dbCAN v9 - Githubissues

nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline

http://funannotate.readthedocs.io

BSD 2-Clause "Simplified" License


New dbCAN v9 #657

Open hyphaltip opened 2 years ago

hyphaltip commented 2 years ago

https://github.com/nextgenusfs/funannotate/blob/00c207ad4041270a5e0e1f6cff711f697c5d3abc/funannotate/downloads.json#L6

Do we want to try pointing to v9? I can update it, but I'm not sure what else would break, or whether you want this change to go on a new point release/branch, @nextgenusfs. The v9 file: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V9.txt

I have also been running Hotpep and full run_dbcan runs, which give more specificity in predictions. Not sure if you want to consider it.

nextgenusfs commented 2 years ago

We can simply update the URLs -- these actually get fetched at runtime if there is an internet connection, so users don't need to update the codebase to pick up new URLs. But it might be good to create a new branch for this in case there are any differences in file format, parsing, etc. Also feel free to add any extra processing if it improves results -- would this require more dependencies?
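The runtime fetch described above could look roughly like the sketch below. It is not funannotate's actual code: the function name `load_download_urls`, the injectable `fetch` argument, and the `"dbCAN"` key in the usage note are all assumptions made for illustration. The idea is only to prefer the remote copy of the URL table and fall back to the bundled `downloads.json` when there is no connection.

```python
import json
import urllib.request
from pathlib import Path


def load_download_urls(remote_url, bundled_path, fetch=None, timeout=10):
    """Return (url_table, source), preferring the remote copy.

    Hypothetical helper: `fetch` is any callable taking a URL and returning
    the response text. It defaults to a plain urllib download.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
    try:
        return json.loads(fetch(remote_url)), "remote"
    except (OSError, ValueError):
        # OSError covers network failures (URLError is a subclass);
        # ValueError covers a remote file that is not valid JSON.
        return json.loads(Path(bundled_path).read_text()), "bundled"
```

With a layout like this, pointing at dbCAN v9 would only mean changing the remote table, e.g. setting a (hypothetical) `"dbCAN"` entry to `https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V9.txt`; installed copies would pick it up on their next online run.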

hyphaltip commented 2 years ago

Doing the full run_dbcan run would require installing their software (a Python script), which would be a dependency. It is run like this, using the install from @linnabrown (https://github.com/linnabrown/run_dbcan):

    run_dbcan.py --db_dir "$CAZY_FOLDER" --out_dir "${OUT}.run_dbcan" --tools all \
    --stp_cpu "$CPUS" --hotpep_cpu "$CPUS" --hmm_cpu "$CPUS" --dia_cpu "$CPUS" --tf_cpu "$CPUS" \
    "$INFILE" protein
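If this were wired into a Python pipeline, the command above could be assembled as an argument list and handed to `subprocess`. The sketch below is only an illustration: `build_run_dbcan_cmd` and `run_dbcan` are hypothetical names, not part of funannotate or run_dbcan, and the flags are copied from the command above rather than checked against any particular run_dbcan release.

```python
import subprocess


def build_run_dbcan_cmd(infile, db_dir, out_prefix, cpus):
    """Build the argument list for the run_dbcan.py call shown above."""
    cpus = str(cpus)
    cmd = [
        "run_dbcan.py",
        "--db_dir", str(db_dir),
        "--out_dir", f"{out_prefix}.run_dbcan",
        "--tools", "all",
    ]
    # Every per-tool CPU flag receives the same value, as in the shell command.
    for flag in ("--stp_cpu", "--hotpep_cpu", "--hmm_cpu", "--dia_cpu", "--tf_cpu"):
        cmd += [flag, cpus]
    # Positional arguments: the input file, then the input type.
    cmd += [str(infile), "protein"]
    return cmd


def run_dbcan(infile, db_dir, out_prefix, cpus):
    """Run the command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(build_run_dbcan_cmd(infile, db_dir, out_prefix, cpus), check=True)
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces, and `run_dbcan.py` still has to be on `PATH`, which is the extra dependency discussed above.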