Hi there,

Thanks for using `aviary`, I just need to grab some extra info from you to help.

- Which version of `aviary` are you using?
- Could you post your `.condarc` so I can verify that conda is using the correct channels?

So, the easy one: the `rosella` shape error has been fixed in more recent versions of `aviary` (version >= 0.8.3). Updating will also update `rosella` and make the pipeline a bit faster.

The `checkm2` errors, I suspect, are due to conda pulling down the wrong version of `tensorflow`.

Cheers, Rhys
I am using `aviary` v0.8.3, and I see that it has installed `rosella` v0.5.1 in the `rosella` environment. I will try updating `rosella` and see if it fixes that problem.
My `.condarc` looks like this:
auto_activate_base: false
channels:
- conda-forge
- bioconda
- defaults
channel_priority: strict
The `checkm2` environment has these packages installed:
Thanks for the quick reply!
Ah, okay, I was mistaken then. It will be completely updated in v0.9.0, but new installs of `aviary` should pull the correct version of `rosella`.
I think the problem lies in the use of `channel_priority: strict`. I know that `Snakemake` says in its documentation that `channel_priority: strict` should be used, but I've found with my recipes that it often breaks the environment. If you remove `channel_priority: strict` from your `.condarc`, then delete the `checkm2` environment and let `aviary` rebuild it, I have a feeling that everything should work as expected.
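If you'd rather script the `.condarc` change, here is a minimal sketch. It assumes a simple user-level `.condarc` in your home directory like the one above, and it filters out the top-level `channel_priority: strict` line as plain text rather than parsing the YAML; `drop_strict_priority` is a hypothetical helper, not part of `aviary` or conda. It only prints the result, so nothing is overwritten.

```python
import re
from pathlib import Path

# Matches a top-level "channel_priority: strict" line, with an optional trailing comment.
STRICT_RE = re.compile(r"^channel_priority:\s*strict\s*(#.*)?$")


def drop_strict_priority(condarc_text: str) -> str:
    """Return the .condarc text with any top-level 'channel_priority: strict' line removed."""
    lines = condarc_text.splitlines(keepends=True)
    # Strip the line ending before matching so the regex only sees the content.
    kept = [line for line in lines if not STRICT_RE.match(line.rstrip("\r\n"))]
    return "".join(kept)


if __name__ == "__main__":
    # Dry run: print what ~/.condarc would look like; nothing is written to disk.
    condarc = Path.home() / ".condarc"
    if condarc.exists():
        print(drop_strict_priority(condarc.read_text()), end="")
    else:
        print(f"No {condarc} found")
```

Conda can also make this change itself with `conda config --remove-key channel_priority`. Either way, the `checkm2` environment still has to be deleted afterwards so that `aviary` rebuilds it under the new setting.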
The old documentation for `aviary` used to specify that `channel_priority: strict` should be set, but we updated it a few months ago to remove any mention of it. We might need to add some additional comments to the docs if this is indeed the cause of your issue.
I have let `aviary` rebuild the `checkm2` environment after removing `channel_priority: strict`. Now I encounter a different error:
[01/15/2024 04:03:23 AM] INFO: Running CheckM2 version 1.0.2
[01/15/2024 04:03:23 AM] INFO: Running quality prediction workflow with 16 threads.
[01/15/2024 04:03:23 AM] ERROR: Saved models could not be loaded: 'str' object has no attribute 'decode'
Using CheckM2 database /mnt/databases/checkm2_db/CheckM2_database/uniref100.KO.1.dmnd
This looks to be related to this issue, which has no current resolution: https://github.com/chklovski/CheckM2/issues/65

I'll keep looking into it, but it would seem `checkm2` is unable to open its CNN models for whatever reason.
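For context, a common source of `'str' object has no attribute 'decode'` when loading saved Keras models is a change in h5py 3.0: HDF5 string attributes that h5py 2.x returned as `bytes` now come back as `str`, and older Keras code calls `.decode()` on them unconditionally. Whether that is exactly what happens inside CheckM2 here is an assumption based on the error text alone. The hypothetical helper below (not CheckM2 or Keras code) just illustrates the type difference and how version-agnostic code avoids it.

```python
def attr_to_str(value) -> str:
    """Normalise an HDF5 string attribute that may arrive as bytes (h5py 2.x) or str (h5py 3.x)."""
    # Calling .decode() unconditionally is what fails under h5py 3.x, because str has no .decode().
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


if __name__ == "__main__":
    print(attr_to_str(b"2.3.1"))  # bytes, as h5py 2.x would return it
    print(attr_to_str("2.3.1"))   # str, as h5py 3.x would return it
```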
Would you please post the current list of software installed in the `checkm2` conda environment? I might be able to spot something obvious. The main offending package is likely to be `scikit-learn`; if you can, ensure that its version is `scikit-learn==0.23.2`, as I think it handles the unpickling of the models within `checkm2`.

Apologies for the inconvenience.
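A quick way to compare installed versions against the pins mentioned in this thread is sketched below. It uses `importlib.metadata` (Python 3.8+) and must be run with the `checkm2` environment's own Python. The `PINS` dict only holds the two versions suggested here (`scikit-learn` 0.23.2 and `h5py` 2.10.0); treat it as an assumption, not an authoritative list of CheckM2's requirements.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from this thread; not an official CheckM2 requirements list.
PINS = {"scikit-learn": "0.23.2", "h5py": "2.10.0"}


def installed_versions(names) -> dict:
    """Look up installed distribution versions in the current interpreter's environment."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # missing packages show up as None in find_mismatches
    return found


def find_mismatches(installed: dict, pins: dict) -> dict:
    """Return {package: (installed_version_or_None, pinned_version)} for every unmet pin."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(PINS), PINS)
    for name, (have, want) in problems.items():
        print(f"{name}: installed {have}, expected {want}")
    if not problems:
        print("All pinned packages match.")
```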
Of course, here are all the packages currently installed in the `checkm2` environment:

The installed version is `scikit-learn==0.23.2`, so that should be okay.
Very weird. I'll need to cross-check the packages against a working environment during my working hours. It looks like you have a valid version of `h5py` installed as well, but I wonder if a force reinstall might help.

If you activate that `checkm2` env and run `which python`, check that it points to the correct Python path (it should be the one in the `checkm2` conda env). If it does, then run `pip install 'h5py==2.10.0' --force-reinstall` and see if that fixes anything. Apparently the error is coming from `h5py`, but it shouldn't be occurring with the version you have installed, so it's a bit odd.
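The `which python` check can also be done from inside Python. When a conda environment is activated, conda sets `CONDA_PREFIX` to that environment's path, and an interpreter from that environment should report the same `sys.prefix`. A minimal sketch; `interpreter_in_env` is a hypothetical helper name.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def interpreter_in_env(prefix: str, conda_prefix: Optional[str]) -> bool:
    """True if the interpreter's prefix is the activated conda environment's prefix."""
    if not conda_prefix:
        return False  # no conda environment is active
    # resolve() normalises symlinks and relative parts before comparing.
    return Path(prefix).resolve() == Path(conda_prefix).resolve()


if __name__ == "__main__":
    active = os.environ.get("CONDA_PREFIX")
    print(f"python executable: {sys.executable}")
    print(f"active conda env:  {active}")
    if interpreter_in_env(sys.prefix, active):
        print("OK: this python belongs to the active env")
    else:
        print("WARNING: this python is not from the active conda env")
```

If this warns, `pip install` would modify some other Python installation, not the `checkm2` environment.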
I tried `--force-reinstall` for `h5py`, but that only resulted in multiple errors. It also ruined the environment, resulting in `AttributeError: module 'numpy' has no attribute 'object'` when I ran `checkm2 -h` inside the active environment.
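For context on the `numpy` error: `np.object` was a deprecated alias for the builtin `object`; it was deprecated in NumPy 1.20 and removed in NumPy 1.24, so code that still uses it fails with exactly this `AttributeError` on newer NumPy. That the environment ended up with NumPy >= 1.24 is an assumption from the error text, since the package list isn't shown. A small check, with `numpy_has_object_alias` as a hypothetical helper:

```python
def numpy_has_object_alias(numpy_version: str) -> bool:
    """np.object was removed in NumPy 1.24; return True if this version still provides it."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 24)


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        present = numpy_has_object_alias(numpy.__version__)
        print(f"numpy {numpy.__version__}: np.object alias {'present' if present else 'removed'}")
```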
However, I found a fix that seems to be working. I swapped out the `checkm2.yml` file from `aviary` with the `.yml` file from `checkm2`. Then I built the new `checkm2` environment with `aviary --build`, activated the environment, and installed `checkm2` with `pip install CheckM2`. I am running the pipeline now, and it has run both `rule checkm_semibin` and `rule checkm_metabat2` successfully.
Thanks for the help, and I'm looking forward to continuing to use `aviary`!
Good work! Glad the pipeline is working for you now.
Okay, cool. Yeah, that's what we originally did to get the `checkm2` env working, but it must have been updated without me realising. Thank you for documenting your fix; I'll see about implementing it.
Closing this issue for now
Hi,

I have run `aviary complete` with Nanopore long reads, and I get an error when executing rules `checkm_metabat2` and `checkm_semibin`. For both rules I get the following error message regarding `numpy`: `AttributeError: module 'numpy' has no attribute 'object'`.

Snakemake log
```
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 24
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=256000
Job stats:
job                 count
----------------  -------
checkm2                 1
checkm_das_tool         1
checkm_metabat2         1
checkm_semibin          1
das_tool                1
finalise_stats          1
get_abundances          1
gtdbtk                  1
recover_mags            1
refine_dastool          1
refine_metabat2         1
refine_semibin          1
singlem_appraise        1
total                  13

Select jobs to execute...

[Fri Jan 12 15:20:00 2024]
rule checkm_metabat2:
    input: data/metabat_bins_2/done
    output: data/metabat_bins_2/checkm2_out, data/metabat_bins_2/checkm.out
    log: logs/checkm_metabat2.log
    jobid: 6
    reason: Missing output files: data/metabat_bins_2/checkm.out
    threads: 16
    resources: tmpdir=/home/work/ronjasan, mem_mb=131072, mem_mib=125000, runtime=480, gpus=0

Activating conda environment: ../../../../../../../../../mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_
Activating conda environment: ../../../../../../../../../mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_
[Fri Jan 12 15:20:10 2024]
Error in rule checkm_metabat2:
    jobid: 6
    input: data/metabat_bins_2/done
    output: data/metabat_bins_2/checkm2_out, data/metabat_bins_2/checkm.out
    log: logs/checkm_metabat2.log (check log file(s) for error details)
    conda-env: /mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_

RuleException:
CalledProcessError in file /mnt/users/ronjasan/miniforge3/envs/aviary/lib/python3.11/site-packages/aviary/modules/binning/binning.smk, line 444:
Command 'source /mnt/users/ronjasan/miniforge3/envs/aviary/bin/activate '/mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_'; set -euo pipefail; python /net/fs-2/scale/OrionStore/Scratch/ronjasan/Flisa/DNAseq/2_aviary_single_S1/.snakemake/scripts/tmpkh9btayl.run_checkm.py' returned non-zero exit status 1.
  File "/mnt/users/ronjasan/miniforge3/envs/aviary/lib/python3.11/site-packages/aviary/modules/binning/binning.smk", line 444, in __rule_checkm_metabat2
  File "/mnt/users/ronjasan/miniforge3/envs/aviary/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Select jobs to execute...

[Fri Jan 12 15:20:10 2024]
rule checkm_semibin:
    input: data/semibin_bins/done
    output: data/semibin_bins/checkm2_out, data/semibin_bins/checkm.out
    log: logs/checkm_semibin.log
    jobid: 20
    reason: Missing output files: data/semibin_bins/checkm.out
    threads: 16
    resources: tmpdir=/home/work/ronjasan, mem_mb=131072, mem_mib=125000, runtime=480, gpus=0

Activating conda environment: ../../../../../../../../../mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_
Activating conda environment: ../../../../../../../../../mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_
[Fri Jan 12 15:20:14 2024]
Error in rule checkm_semibin:
    jobid: 20
    input: data/semibin_bins/done
    output: data/semibin_bins/checkm2_out, data/semibin_bins/checkm.out
    log: logs/checkm_semibin.log (check log file(s) for error details)
    conda-env: /mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_

RuleException:
CalledProcessError in file /mnt/users/ronjasan/miniforge3/envs/aviary/lib/python3.11/site-packages/aviary/modules/binning/binning.smk, line 469:
Command 'source /mnt/users/ronjasan/miniforge3/envs/aviary/bin/activate '/mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_'; set -euo pipefail; python /net/fs-2/scale/OrionStore/Scratch/ronjasan/Flisa/DNAseq/2_aviary_single_S1/.snakemake/scripts/tmpl4fu2eop.run_checkm.py' returned non-zero exit status 1.
  File "/mnt/users/ronjasan/miniforge3/envs/aviary/lib/python3.11/site-packages/aviary/modules/binning/binning.smk", line 469, in __rule_checkm_semibin
  File "/mnt/users/ronjasan/miniforge3/envs/aviary/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-01-12T151954.328670.snakemake.log
```

checkm_metabat2 log
```
Traceback (most recent call last):
  File "/mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_/bin/checkm2", line 27, in
```

checkm_semibin log

```
Traceback (most recent call last):
  File "/mnt/users/ronjasan/miniforge3/envs/aviary/c0310444dbde1742ee364906339cb3c7_/bin/checkm2", line 27, in
```

I also get an error with `rosella`, where it does not produce bins due to a shape error.

rosella log
```
[2024-01-11T13:53:01Z INFO rosella] rosella version 0.5.1
[2024-01-11T13:53:01Z INFO rosella::recover::recover_engine] Calculating contig coverages.
[2024-01-11T13:53:01Z INFO rosella::recover::recover_engine] Calculating TNF table.
[2024-01-11T13:53:05Z ERROR rosella] Recover Failed with error: ShapeError/IncompatibleShape: incompatible shapes
```