saccharis / SACCHARIS_2

CLI and GUI based bioinformatics pipeline to automate phylogenetic analysis of CAZyme families in FASTA sequences.
GNU General Public License v3.0

No dbCAN HMM database found #5

Open JoelPHoward opened 4 months ago

JoelPHoward commented 4 months ago

Hello,

I installed saccharis_2 on an HPC cluster using conda. The version I am running is 2.0.0.dev22.

I am trying to test saccharis using the following command:

saccharis \
--family GH43 \
--subfamily 1 \
--cazyme_mode all_cazymes \
--domain bacteria \
--seqfile sequences.faa \
--verbose \
--threads 10 \
--tree fasttree

But while dbcan is running, I get the following error:

*********************************************
* Appending user sequences takes:
*        0.1 seconds to run
*********************************************
dbCAN processing of GH43_1_ALL_CAZYMES_cazy_UserFile00000.fasta is underway...
ERROR: No dbCAN HMM database found.         Please make sure that your dbCAN HMM database is named 'dbCAN-HMMdb-V8.txt' or the newest one, has been through hmmpress, and is located in your database directory

I manually downloaded dbCAN-HMMdb-V8.txt from dbcan and tried rerunning before and after passing it through hmmpress, but I still experience the same error.

Any suggestions?

AlexSCFraser commented 4 months ago

Some questions to help me:

What version of dbcan is installed?

Did you install the Python packages in a conda env, in a pyenv, or with a bare pip install?

Could you list the contents of the saccharis db directory?

I know another user had an issue on a cluster because their home directory was limited in size. A temporary solution was to use symlinks, which respect the limited home drive size but still point to the correct files. Are there any limitations on file sizes in your home directory?

JoelPHoward commented 4 months ago

Hi Alex,

The version of dbcan is 3.0.7. I used a conda env to install everything.

./saccharis/
├── config
│   └── advanced_settings.json
├── db
│   ├── CAZy.dmnd
│   ├── dbCAN-HMMdb-V12.txt
│   ├── dbCAN-HMMdb-V12.txt.h3f
│   ├── dbCAN-HMMdb-V12.txt.h3i
│   ├── dbCAN-HMMdb-V12.txt.h3m
│   ├── dbCAN-HMMdb-V12.txt.h3p
│   ├── dbCAN-HMMdb-V8.txt
│   ├── dbCAN-HMMdb-V8.txt.h3f
│   ├── dbCAN-HMMdb-V8.txt.h3i
│   ├── dbCAN-HMMdb-V8.txt.h3m
│   ├── dbCAN-HMMdb-V8.txt.h3p
│   ├── dbCAN-PUL_07-01-2022.txt
│   ├── dbCAN-PUL_07-01-2022.xlsx
│   ├── dbCAN-PUL.tar.gz
│   ├── dbCAN_sub.hmm.h3f
│   ├── dbCAN_sub.hmm.h3i
│   ├── dbCAN_sub.hmm.h3m
│   ├── dbCAN_sub.hmm.h3p
│   ├── dbCAN.txt.h3f
│   ├── dbCAN.txt.h3i
│   ├── dbCAN.txt.h3m
│   ├── dbCAN.txt.h3p
│   ├── fam-substrate-mapping-08252022.tsv
│   ├── PUL.faa
│   ├── PUL.faa.pdb
│   ├── PUL.faa.phr
│   ├── PUL.faa.pin
│   ├── PUL.faa.pjs
│   ├── PUL.faa.pot
│   ├── PUL.faa.psq
│   ├── PUL.faa.ptf
│   ├── PUL.faa.pto
│   ├── stp.hmm.h3f
│   ├── stp.hmm.h3i
│   ├── stp.hmm.h3m
│   ├── stp.hmm.h3p
│   ├── tcdb.dmnd
│   ├── tf-1.hmm.h3f
│   ├── tf-1.hmm.h3i
│   ├── tf-1.hmm.h3m
│   ├── tf-1.hmm.h3p
│   ├── tf-2.hmm.h3f
│   ├── tf-2.hmm.h3i
│   ├── tf-2.hmm.h3m
│   └── tf-2.hmm.h3p
└── logs
    └── cli_logs.txt

Note the dbCAN-HMMdb-V8 and dbCAN-HMMdb-V12 files were manually downloaded after I experienced the error. Since I was able to download these files manually, I don't think it is a file size issue. I also checked my storage allocation in my home directory and it should be more than enough.

Since posting this issue, I tried to run saccharis on my personal computer (MacBook Pro, 13-inch 2020, macOS 12.6.3), but experienced the exact same error. The HPC is Linux-based.

lowkri commented 1 week ago

Hi Joel. Sorry for the delayed response.

If you download dbCAN-HMMdb-V12.txt and rename it to dbCAN.txt, it should work.

I'll work on getting a fix pushed out.
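For anyone scripting this workaround, here is a minimal Python sketch of the rename step. It assumes the V12 file has already been downloaded. The function name `install_dbcan_hmm` and its parameters are made up for illustration and are not part of SACCHARIS or dbCAN. It copies rather than moves, so the original download is left untouched.

```python
import shutil
from pathlib import Path


def install_dbcan_hmm(downloaded: Path, db_dir: Path) -> Path:
    """Copy a downloaded dbCAN HMM file into db_dir under the name dbCAN.txt.

    Hypothetical helper for illustration; not part of SACCHARIS or dbCAN.
    """
    if not downloaded.is_file():
        raise FileNotFoundError(f"Downloaded HMM file not found: {downloaded}")
    # Create the db directory if it does not exist yet.
    db_dir.mkdir(parents=True, exist_ok=True)
    target = db_dir / "dbCAN.txt"
    # Copy instead of rename, so the original download is kept.
    shutil.copy2(downloaded, target)
    return target
```

It would be called with the path to the downloaded file and the path to the saccharis db directory, and returns the path of the new dbCAN.txt.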

jzrapp commented 3 days ago

Hi,

I am getting the same error:

dbCAN processing of GH34_ALL_CAZYMES_cazy_UserFile00000.fasta is underway...
ERROR: No dbCAN HMM database found. Please make sure that your dbCAN HMM database is named 'dbCAN-HMMdb-V8.txt' or the newest one, has been through hmmpress, and is located in your database directory

My db folder contains:

CAZy.dmnd
dbCAN-PUL_07-01-2022.txt
dbCAN-PUL_07-01-2022.xlsx
dbCAN-PUL.tar.gz
dbCAN_sub.hmm.h3f
dbCAN_sub.hmm.h3i
dbCAN_sub.hmm.h3m
dbCAN_sub.hmm.h3p
dbCAN.txt.h3f
dbCAN.txt.h3i
dbCAN.txt.h3m
dbCAN.txt.h3p
fam-substrate-mapping-08252022.tsv
PUL.faa
PUL.faa.pdb
PUL.faa.phr
PUL.faa.pin
PUL.faa.pjs
PUL.faa.pot
PUL.faa.psq
PUL.faa.ptf
PUL.faa.pto
stp.hmm.h3f
stp.hmm.h3i
stp.hmm.h3m
stp.hmm.h3p
tcdb.dmnd
tf-1.hmm.h3f
tf-1.hmm.h3i
tf-1.hmm.h3m
tf-1.hmm.h3p
tf-2.hmm.h3f
tf-2.hmm.h3i
tf-2.hmm.h3m
tf-2.hmm.h3p

How can I work around it? Thanks!

lowkri commented 3 days ago

Hi. It seems you're missing the dbCAN.txt. Please go to dbCAN's website and download the latest database version: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt

This file should be renamed to dbCAN.txt and put in your ~/saccharis/db folder.

The current version of SACCHARIS on GitHub has this and a few other bugs patched, but we're still working on uploading it to bioconda.
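
Until the patched release reaches bioconda, a short check can confirm whether the db folder contains the file this workaround relies on. The sketch below is an illustration only: `check_dbcan_files` is a hypothetical helper, and the list of expected names is taken from the directory listings in this thread (dbCAN.txt plus the four hmmpress index files), not from the SACCHARIS source.

```python
from pathlib import Path

# Names taken from the listings in this thread; assumed, not read from SACCHARIS.
EXPECTED = [
    "dbCAN.txt",
    "dbCAN.txt.h3f",
    "dbCAN.txt.h3i",
    "dbCAN.txt.h3m",
    "dbCAN.txt.h3p",
]


def check_dbcan_files(db_dir: Path) -> list[str]:
    """Return the expected dbCAN file names that are missing from db_dir."""
    return [name for name in EXPECTED if not (db_dir / name).is_file()]


if __name__ == "__main__":
    # ~/saccharis/db is the folder named earlier in this thread.
    missing = check_dbcan_files(Path.home() / "saccharis" / "db")
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

For a folder like the one listed above, where the .h3* index files are present but dbCAN.txt itself is not, the function would return only dbCAN.txt.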