metagenlab / zDB

zDB: comparative bacterial genomics made easy
MIT License

Run Error - swissprot database #119

Closed: Naturalist1986 closed this issue 2 months ago

Naturalist1986 commented 3 months ago

Hi,

I'm trying to run zdb, but I get this error:

(zdb) moshea@shannon:/mnt/LargeStorageNoBackup/Datasets/Moshea/Databases/zdb$ zdb run --input=../zdb_table.csv
Starting the analysis pipeline

N E X T F L O W ~ version 24.04.3

Launching /home/moshea/miniconda/envs/zdb/share/zdb-1.3.1/annotation_pipeline.nf [clever_lavoisier] DSL2 - revision: 067a6b5acf

wrong type for swissprot_db zdb_ref/uniprot/swissprot/
ERROR ~ assert(false)

-- Check script '/home/moshea/miniconda/envs/zdb/share/zdb-1.3.1/annotation_pipeline.nf' at line: 48 or see '.nextflow.log' file for more details
Could not finish the analysis
Try to rerun the same command with the --resume flag set to rerun the analysis

I haven't found any instructions on where to run the tool or how to tell it where the databases are. Is that the problem?

njohner commented 3 months ago

Did you run the setup (bin/zdb setup)? If so, where are your reference databases? What is in the folder zdb_ref/uniprot/swissprot/?

If you read the documentation on how to run the analysis (https://github.com/metagenlab/zDB?tab=readme-ov-file#running-the-analysis or https://zdb.readthedocs.io/en/latest/include_readme_technical_doc.html#running-the-analysis), you will see how to specify where the reference databases are located. Of course, bin/zdb run --help also lists the available command line arguments.

Let me know if you need more help.

Naturalist1986 commented 3 months ago

Yes, I ran zdb setup:

(zdb) moshea@shannon:/mnt/LargeStorageNoBackup/Datasets/Moshea/Databases/zdb/uniprot/swissprot$ ls -lht
total 580M
-rw-rw-r-- 1 moshea moshea 106M Jul 24 14:31 swissprot.fasta.phr
-rw-rw-r-- 1 moshea moshea 4.4M Jul 24 14:31 swissprot.fasta.pin
-rw-rw-r-- 1 moshea moshea 198M Jul 24 14:31 swissprot.fasta.psq
-rw-rw-r-- 1 moshea moshea 272M Jul 24 14:30 swissprot.fasta
-rw-rw-r-- 1 moshea moshea 1.1K Jul 24 14:30 relnotes.txt

This is what I get when trying to run:

(zdb) moshea@shannon:/mnt/LargeStorageNoBackup/Datasets/Moshea/Databases$ zdb run --input=zdb_table.csv --ko --cog --pfam --ref_dir=/mnt/LargeStorageNoBackup/Datasets/Moshea/Databases/zdb/ --conda
Starting the analysis pipeline

N E X T F L O W ~ version 24.04.3

Launching /home/moshea/miniconda/envs/zdb/share/zdb-1.3.1/annotation_pipeline.nf [voluminous_dalembert] DSL2 - revision: 067a6b5acf

wrong type for swissprot_db /mnt/LargeStorageNoBackup/Datasets/Moshea/Databases/zdb/uniprot/swissprot/
ERROR ~ assert(false)

-- Check script '/home/moshea/miniconda/envs/zdb/share/zdb-1.3.1/annotation_pipeline.nf' at line: 48 or see '.nextflow.log' file for more details
Could not finish the analysis
Try to rerun the same command with the --resume flag set to rerun the analysis
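Because the error names the swissprot_db path, one quick way to rule out an incomplete setup is to check that the directory derived from --ref_dir (here <ref_dir>/uniprot/swissprot/, as both error messages show) holds the FASTA file and its BLAST index files. The helper below is a hypothetical sketch, not part of zdb, and treating the four files from the listing above as the required set is an assumption. In this thread all of them were present and the real cause turned out to be a Nextflow incompatibility, so passing this check does not rule out other problems.

```python
from pathlib import Path

# Hypothetical required set, copied from the `ls -lht` listing in this thread;
# zdb itself may check for different files.
EXPECTED_SWISSPROT_FILES = (
    "swissprot.fasta",
    "swissprot.fasta.phr",
    "swissprot.fasta.pin",
    "swissprot.fasta.psq",
)


def missing_swissprot_files(ref_dir):
    """Return the expected files that are absent from <ref_dir>/uniprot/swissprot/."""
    swissprot_dir = Path(ref_dir) / "uniprot" / "swissprot"
    # is_file() is simply False when the directory itself does not exist
    return [name for name in EXPECTED_SWISSPROT_FILES
            if not (swissprot_dir / name).is_file()]


if __name__ == "__main__":
    import sys

    # Pass the same value you give to --ref_dir; "zdb_ref" is only a placeholder default
    ref_dir = sys.argv[1] if len(sys.argv) > 1 else "zdb_ref"
    missing = missing_swissprot_files(ref_dir)
    if missing:
        print(f"Missing from {Path(ref_dir) / 'uniprot' / 'swissprot'}: {', '.join(missing)}")
    else:
        print("All expected SwissProt files are present.")
```

Running it with the --ref_dir value used above should report that all files are present, which points away from the database and toward the pipeline itself.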

njohner commented 3 months ago

Unfortunately, I could reproduce the issue; it seems the conda release is broken somehow. I will take a look tomorrow and keep you posted. If you're in a hurry, you can install from source, which will work.

njohner commented 3 months ago

It was an issue with newer releases of Nextflow and will be fixed by https://github.com/metagenlab/zDB/pull/120. I'll start the bioconda release process this afternoon.

Naturalist1986 commented 3 months ago

Ok, thanks!

njohner commented 3 months ago

OK, the release is out, so you can update your conda environment and things should work. I noticed that one of the conda environments no longer seems to build, so run zdb without the --conda option and everything should be fine. If you absolutely have to use --conda, you will have to apply this change to your zdb installation (somewhere like ~/bin/miniconda3/envs/zdb/share/zdb-1.3.2/conda/checkm.yaml).

Naturalist1986 commented 3 months ago

Hi,

I've managed to get close to the end of the pipeline using singularity, but it crashed on this error:

INFO: Converting SIF file to temporary sandbox...
Loading gbks
Loading groups
Loading seq hashes
Loading orthofinder results
Loading alignments
Loading checkm results
Traceback (most recent call last):
  File "/mnt/scratch/work/e7/99c142f9724da8c63d6f99f3cc40f3/.command.sh", line 26, in <module>
    setup_chlamdb.load_genomes_info(kwargs, gbk_list, "checkm_results.tab", "marvelous_elion")
  File "/mnt/scratch/zDB/bin/setup_chlamdb.py", line 466, in load_genomes_info
    taxon_id = hsh_filename_to_taxid[row["Bin Id"]]
KeyError: nan
INFO: Cleaning up image...

njohner commented 3 months ago

At first glance, that looks like an issue with your input files: CheckM reports nan as the "Bin Id" for some genomes. This is supposed to be the unique identifier of the genome, so that won't work. Maybe have a look at the CheckM outputs?
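The traceback fails at hsh_filename_to_taxid[row["Bin Id"]]. A KeyError: nan typically means an empty cell was read as a floating-point NaN (as pandas does), so some CheckM row has no usable Bin Id. The following is a minimal sketch for spotting such rows before rerunning, assuming checkm_results.tab is tab-separated with a "Bin Id" header column. The function name bad_bin_ids and its known_ids argument are hypothetical, not part of zDB.

```python
import csv


def bad_bin_ids(tsv_file, known_ids=None):
    """List (line_number, bin_id) for CheckM rows with an unusable "Bin Id".

    A Bin Id is flagged when it is empty, literally reads "nan", or, when
    known_ids is given, is not one of the expected genome identifiers.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    problems = []
    # Line 1 is the header, so data rows start at line 2
    for line_number, row in enumerate(reader, start=2):
        bin_id = (row.get("Bin Id") or "").strip()
        if bin_id == "" or bin_id.lower() == "nan":
            problems.append((line_number, bin_id))
        elif known_ids is not None and bin_id not in known_ids:
            problems.append((line_number, bin_id))
    return problems
```

For example, `with open("checkm_results.tab", newline="") as fh: print(bad_bin_ids(fh))` lists the offending lines. Passing the genome identifiers you expect as known_ids also catches names that would not match the filename-to-taxid mapping.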

njohner commented 2 months ago

I'll close this issue, as the problem initially reported is fixed. If you still have trouble with your input, you can open a new issue for it and I can try to help you figure out what is wrong with your input files.