metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule rename_genomes and dram_download #547

Closed bnm1317 closed 2 years ago

bnm1317 commented 2 years ago

Hello, I was running atlas run genomes and ran into these error messages. Thank you for your help in advance!!

Activating conda environment: databases/conda_envs/3930879f549f543479b6c612396f4e51
[Wed Aug 24 09:27:30 2022]
Error in rule dram_download:
    jobid: 1676
    output: /mnt/home/dao/bnm1023/databases/Dram, /mnt/home/dao/bnm1023/databases/DRAM.config
    log: logs/dram/download_dram.log (check log file(s) for error message)
    conda-env: /mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51
    shell:
         DRAM-setup.py prepare_databases  --output_dir /mnt/home/dao/bnm1023/databases/Dram  --threads 8  --verbose  --skip_uniref  &> logs/dram/download_dram.log  ;  DRAM-setup.py export_config --output_file /mnt/home/dao/bnm1023/databases/DRAM.config
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Not cleaning up /mnt/home/dao/bnm1023/.snakemake/shadow/tmpm2f7uwyh/.snakemake/scripts/tmprd7l5ig0.rename_genomes.py
[Wed Aug 24 09:27:33 2022]
Error in rule rename_genomes:
    jobid: 0
    output: genomes/genomes, genomes/clustering/contig2genome.tsv, genomes/clustering/old2newID.tsv, genomes/clustering/allbins2genome.tsv
    log: logs/genomes/rename_genomes.log (check log file(s) for error message)

Here is the relevant log output:

Log: download_dram.log

  File "/mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51/lib/python3.10/site-packages/skbio/stats/distance/_mantel.py", line 16, in <module>
    from scipy.stats import PearsonRNearConstantInputWarning
ImportError: cannot import name 'PearsonRNearConstantInputWarning' from 'scipy.stats' (/mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51/lib/python3.10/site-packages/scipy/stats/__init__.py)

Log: rename_genomes.log

FileNotFoundError: [Errno 2] No such file or directory: 'genomes/Dereplication/dereplicated_genomes/../data_tables/Cdb.csv'

SilasK commented 2 years ago

I fear that you don't have many genomes, maybe only one. Could you check the output of the dereplication step in genomes/Dereplication/dereplicated_genomes/../data_tables, and maybe the corresponding log?
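
One way to run this check is a small script that reports whether the table named in the FileNotFoundError exists and how many data rows it holds. This is only a sketch: the helper name `count_rows` is hypothetical, the path is copied from the error message, and nothing is assumed about the columns of Cdb.csv.

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Return the number of data rows in a CSV file, or None if the file is missing."""
    path = Path(csv_path)
    if not path.is_file():
        return None
    with path.open(newline="") as handle:
        # DictReader consumes the header line, so only data rows are counted.
        return sum(1 for _ in csv.DictReader(handle))


if __name__ == "__main__":
    # Path copied from the rename_genomes error message in this thread.
    cdb = "genomes/Dereplication/dereplicated_genomes/../data_tables/Cdb.csv"
    rows = count_rows(cdb)
    print("missing: " + cdb if rows is None else f"{rows} data rows in {cdb}")
```

A missing file points at the dereplication step itself having failed, rather than at the number of genomes.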

bnm1317 commented 2 years ago

Hi, thank you! I think I have more than one but I am not sure.

Here is the log...


***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
185 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
75.68% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
Traceback (most recent call last):
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/bin/dRep", line 32, in <module>
    Controller().parseArguments(args)
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/controller.py", line 100, in parseArguments
    self.dereplicate_operation(**vars(args))
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/controller.py", line 48, in dereplicate_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],**kwargs)
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/d_workflows.py", line 37, in dereplicate_wrapper
    drep.d_cluster.controller.d_cluster_wrapper(wd, **kwargs)
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/d_cluster/controller.py", line 179, in d_cluster_wrap$
    GenomeClusterController(workDirectory, **kwargs).main()
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/d_cluster/controller.py", line 32, in main
    self.run_primary_clustering()
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/d_cluster/controller.py", line 100, in run_primary_cl$
    Mdb, Cdb, cluster_ret = drep.d_cluster.compare_utils.all_vs_all_MASH(self.Bdb, self.wd.get_dir('MASH'), **self.kwargs)
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/d_cluster/compare_utils.py", line 102, in all_vs_all_$
    logdir, MASH_folder, sketch_folder, mash_exe = prepare_mash(data_folder, **kwargs)
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/d_cluster/compare_utils.py", line 137, in prepare_mash
    mash_exe = drep.get_exe('mash')
  File "/mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/lib/python3.6/site-packages/drep/__init__.py", line 100, in get_exe
    raise ValueError("{0} isn't working- make sure its installed".format(name))
ValueError: mash isn't working- make sure its installed

SilasK commented 2 years ago

OK, something went wrong with dRep.

Could you delete the dRep output and its conda env:

rm -rf /mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9
rm -r genomes/Dereplication

And then rerun atlas. It should reinstall dRep (hopefully correctly) and run it.

bnm1317 commented 2 years ago

Thanks for the advice! It is still saying mash isn't working or isn't installed, but conda list shows it as:

mash 2.3 ha9a2dd8_3 bioconda

But when I check the dRep dependencies:

mash.................................... !!! ERROR !!!   (location = /mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/bin/mash)
nucmer.................................. all good        (location = /mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/bin/nucmer)
checkm.................................. !!! ERROR !!!   (location = None)
ANIcalculator........................... !!! ERROR !!!   (location = None)
prodigal................................ all good        (location = /mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/bin/prodigal)
centrifuge.............................. !!! ERROR !!!   (location = None)
nsimscan................................ !!! ERROR !!!   (location = None)
fastANI................................. all good        (location = /mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9/bin/fastANI)
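
For longer dependency reports, the failing tools can be pulled out with a short parser. This is a sketch that assumes the line layout shown above (tool name, a run of dots, a status, then a location in parentheses); the function name `failing_tools` is hypothetical.

```python
import re

# Matches lines such as:
#   mash........ !!! ERROR !!!   (location = /path/to/bin/mash)
LINE = re.compile(
    r"^(?P<tool>\S+?)\.{3,}\s*(?P<status>all good|!!! ERROR !!!)"
    r"\s*\(location = (?P<loc>.*)\)\s*$"
)


def failing_tools(report):
    """Return (tool, location) pairs for every line flagged as an error."""
    failures = []
    for raw in report.splitlines():
        match = LINE.match(raw.strip())
        if match and match.group("status") != "all good":
            loc = match.group("loc").strip()
            # The report prints the literal text 'None' when no binary was found.
            failures.append((match.group("tool"), None if loc == "None" else loc))
    return failures
```

A tool that fails with a real location (mash here) is present on disk but not usable, while a location of None means no binary was found at all.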

SilasK commented 2 years ago

Sorry, I raised the problem at the dRep repo. I imagine there is a version conflict.

You could try downgrading mash:

conda activate /mnt/home/dao/bnm1023/databases/conda_envs/a5b9099eadf166923308864d6be332a9
conda install -y mash=2.1
python --version
conda deactivate

Maybe you can also use mamba instead of conda.

Which Python version do you have installed in the environment?
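
A package can be listed by conda while its binary still fails to start, so it may help to run the executable directly from inside the activated environment. The sketch below does that; the helper name `probe` is hypothetical, and using `--version` as the test argument is an assumption rather than what dRep itself runs.

```python
import shutil
import subprocess


def probe(exe, args=("--version",), timeout=30):
    """Locate an executable on PATH and try to run it once."""
    path = shutil.which(exe)
    result = {"found": path is not None, "path": path, "returncode": None, "output": ""}
    if path is None:
        return result
    try:
        proc = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The file exists but could not be executed or did not finish in time.
        result["output"] = str(exc)
        return result
    result["returncode"] = proc.returncode
    result["output"] = (proc.stdout + proc.stderr).strip()
    return result


if __name__ == "__main__":
    # Run this with the dRep environment activated.
    print(probe("mash"))
```

A non-zero return code together with a linker or library message in `output` would suggest a broken build rather than a missing package.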

bnm1317 commented 2 years ago

Hi Silas,

I have Python 3.6.8 and was able to install most of the dependencies individually. However, it still seems to be getting stuck at this DRAM step:

"ImportError: cannot import name 'PearsonRConstantInputWarning' from 'scipy.stats' (/mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51/lib/python3.10/si$"

Thanks, Brandy

SilasK commented 2 years ago

So there are two separate issues. Do you know what solved the dRep (not DRAM) error? Was it rerunning, or did you need to install mash 2.1 manually?

SilasK commented 2 years ago

To make sure: do you have a bunch of genomes in 'genomes/genomes'?
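
A quick way to answer this is to count the FASTA files in that folder. This is a sketch only: the function name `count_genomes` is hypothetical, and the list of file suffixes is an assumption that may need adjusting to match the files actually present.

```python
from pathlib import Path


def count_genomes(folder, suffixes=(".fasta", ".fa", ".fna")):
    """Count files in `folder` whose suffix looks like a FASTA file."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.iterdir() if p.is_file() and p.suffix in suffixes)


if __name__ == "__main__":
    print(count_genomes("genomes/genomes"), "genome files found")
```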

SilasK commented 2 years ago

For the DRAM error: maybe you want to deactivate DRAM for now. Go to config.yaml and comment out the kegg modules and dram annotations.

Then you should be able to let atlas run until the end.
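
The edit can be done by hand in any text editor. For illustration, the sketch below comments out list items in YAML text; it assumes the annotations are written as a YAML list with items named `kegg_modules` and `dram`, which should be checked against the actual config.yaml, and the function name `comment_out_items` is hypothetical.

```python
def comment_out_items(yaml_text, items):
    """Prefix YAML list items ('- name') found in `items` with '# '."""
    targets = set(items)
    out = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- ") and stripped[2:].strip() in targets:
            # Keep the original indentation so the file stays readable.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}# {stripped}")
        else:
            out.append(line)
    result = "\n".join(out)
    return result + "\n" if yaml_text.endswith("\n") else result


if __name__ == "__main__":
    # Assumed layout of the relevant config section; item names are assumptions.
    example = "annotations:\n  - genes\n  - kegg_modules\n  - dram\n"
    print(comment_out_items(example, ["kegg_modules", "dram"]))
```

Commenting the lines out instead of deleting them makes it easy to re-enable the annotations once the DRAM environment works.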

Could you also try to get the scipy version in the DRAM env?

conda activate /mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51
conda list scipy
conda deactivate 

As above, you might want to try reinstalling DRAM by simply deleting its environment: rm -r /mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51

bnm1317 commented 2 years ago

I was able to resolve the dRep error by downloading manually. CheckM still shows up as an error in the dependency check, but everything was installed. I was also unable to get nsimscan into the environment. In the end, I had 96 MAGs in genomes/genomes/.

For DRAM, I deleted the environment and manually reinstalled DRAM in the new environment. When I went to prepare the databases, it still gave the following error:

ImportError: cannot import name 'PearsonRConstantInputWarning' from 'scipy.stats' (/mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51/lib/python3.10/site-packages/scipy/stats/__init__.py)

The scipy version is 1.9.0. Should I continue by commenting out dram in the config file?

Thank you!!

SilasK commented 2 years ago

Your DRAM error is the same as here: https://github.com/biocore/scikit-bio/issues/1818

A fix would be:

conda activate /mnt/home/dao/bnm1023/databases/conda_envs/3930879f549f543479b6c612396f4e51
mamba install -y scipy=1.8.1
conda deactivate
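
To confirm whether the downgrade took effect, a short check run inside the DRAM environment can print the installed scipy version and test whether the two names from the tracebacks in this thread are still importable. This is a sketch; the helper name `missing_names` is hypothetical.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version


def missing_names(module_name, names):
    """Return the subset of `names` that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        print("scipy", version("scipy"))
    except PackageNotFoundError:
        print("scipy is not installed in this environment")
    else:
        # Names taken from the ImportError messages quoted in this thread.
        wanted = ["PearsonRConstantInputWarning", "PearsonRNearConstantInputWarning"]
        print("missing from scipy.stats:", missing_names("scipy.stats", wanted))
```

An empty list after the downgrade suggests that the skbio import should no longer fail at that line.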