vinisalazar / metaphor

Metaphor: a general-purpose workflow for assembly and binning of metagenomes
https://metaphor-workflow.readthedocs.io/

metaphor test errors #70

Closed mariafpv closed 5 months ago

mariafpv commented 7 months ago

Hello, I am trying to use metaphor with my data. I installed it in a cluster (NAME="CentOS Linux", VERSION="8", PLATFORM_ID="platform:el8") following the tutorial:

mamba create -n metaphor -c conda-forge -c bioconda metaphor

mamba activate metaphor

Then, I tried to run the test in a job:

#!/bin/bash
#SBATCH -p medium                                   
#SBATCH -N 1                                        
#SBATCH -n 1                                        
#SBATCH --cpus-per-task=8                          
#SBATCH --mem=30G                                  
#SBATCH --time=8:00:00                             

mamba activate metaphor
metaphor test -y

However, it fails when creating the environments:

... Starting Snakemake. This may require the installation of conda environments which should take a while.

Your command is:
snakemake \
    --snakefile /hpcfs/.../.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/Snakefile \
    --configfile /hpcfs/.../.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/config/test-config.yaml \
    --cores 3 \
    --printshellcmds --use-conda \
    --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ \
    --conda-prefix /hpcfs/.../.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda \
    --config max_mb=8192
Metaphor did not finish with exit code 1. Please see the error message below ...

As I understand it, no additional packages or software are needed. Could you help me find a solution?

vinisalazar commented 7 months ago

Hello @mariafpv,

Thank you for your interest in using Metaphor. Are those the full contents of your error message? If not, could you please paste the entire error message, as well as any other logs associated with the cluster job?

If it happens as soon as Metaphor starts, it is likely a problem with the creation of the conda environments, which contain the dependencies necessary to run Metaphor. This can sometimes be due to permission problems on shared systems such as clusters, but it is hard to know for sure without more complete logs.

Best, Vini

mariafpv commented 7 months ago

Hello @vinisalazar,

Thanks for your response. This is the full error message that I can identify from the job I sent:

Starting Snakemake.
This may require the installation of conda environments which should take a while.

Your command is:
snakemake   \
    --snakefile /hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/Snakefile     \
    --configfile /hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/config/test-config.yaml       \
    --cores 3       \
    --printshellcmds --use-conda        \
    --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/       \
    --conda-prefix /hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda        \
    --config max_mb=8192
Metaphor did not finish with exit code 1. Please see the error message below.
An error occurred while running Metaphor. Please check the traceback below.
Traceback (most recent call last):
  File "/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/bin/metaphor", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/cli/cli.py", line 278, in main
    args.func(args)
  File "/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/cli/test.py", line 182, in main
    retcode = run_cmd(cmd)
              ^^^^^^^^^^^^
  File "/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/utils.py", line 92, in run_cmd
    retcode = check_call(cmd.split())
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['snakemake', '--snakefile', '/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/Snakefile', '--configfile', '/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/config/test-config.yaml', '--cores', '3', '--printshellcmds', '--use-conda', '--wrapper-prefix', 'https://github.com/snakemake/snakemake-wrappers/raw/', '--conda-prefix', '/hpcfs/home/ciencias_biologicas/mf.penav1/.miniforge/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda', '--config', 'max_mb=8192']' returned non-zero exit status 1.

Additionally, I have attached the log files I found.

Thanks again!

2024-03-01T151845.210901.snakemake.log
download_taxonomy_database.log
download_COG_database.log

vinisalazar commented 7 months ago

Hello @mariafpv,

Your download_taxonomy_database.log and download_COG_database.log both show the same message:

Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... failed: Name or service not known.
wget: unable to resolve host address ‘ftp.ncbi.nih.gov’

However, if you open the address it is trying to access (https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/cog-20.fa.gz) in a browser, you can see that the address is valid. So I suspect these errors are due to a network problem, either on the NCBI side or on your server. Could you please try once more and, if the problem persists, consider contacting your system administrator to check whether there is a problem with the network?

To give you a better idea, this is the command that Metaphor executes:

wget -v https://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz -O new_taxdump.tar.gz

If you run this in your terminal, it should start the download of the taxonomy database. If it doesn't, that indicates some sort of network problem.
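The same check can be scripted against the log files. This is only a minimal sketch, assuming the wget logs contain the "unable to resolve host address" text quoted above; the function name is_dns_failure is hypothetical and not part of Metaphor:

```shell
# Hypothetical helper (not part of Metaphor): report whether a wget log
# contains the name-resolution failure quoted above.
is_dns_failure() {
    local logfile="$1"
    # Match only the ASCII part of the message so the curly quotes that
    # wget prints around the host name do not matter.
    grep -q "unable to resolve host address" "$logfile"
}

# Classify each log file passed on the command line.
for log in "$@"; do
    if is_dns_failure "$log"; then
        echo "$log: DNS lookup failed (network or resolver problem)"
    else
        echo "$log: no DNS failure found"
    fi
done
```

Saved under a name such as check_logs.sh (also hypothetical), it could be run as: bash check_logs.sh download_taxonomy_database.log download_COG_database.log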

If the problem persists, you can download the databases manually and configure Metaphor to use them, but that takes adjusting some settings.

Let me know how you go.

Best, Vini

mariafpv commented 7 months ago

Hello @vinisalazar

I still get the same error when running the test. However, I ran the download command (wget -v https://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz -O new_taxdump.tar.gz) and I already have that database on my system.

I think I could try downloading the databases manually... Which settings would I need to change so that Metaphor uses the downloaded databases?

vinisalazar commented 7 months ago

When you run Metaphor, you'll need a configuration file, usually named metaphor_settings.yaml. That file will have a parameter called data_dir, set to DEFAULT. Replace DEFAULT with the directory where your database is located.

vinisalazar commented 7 months ago

For the test command, you'll need to modify the test configuration file. You can find its path with the following command: metaphor config show --test-config. Then do the same: set data_dir to the directory where you have the database.
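The edit can also be made from the command line. This is a rough sketch, assuming the setting appears on a single line as "data_dir: DEFAULT" (worth checking in your own file first); set_data_dir is a hypothetical helper, not a Metaphor command:

```shell
# Hypothetical helper (not part of Metaphor): point data_dir at an existing
# database directory. Assumes the setting is written as "data_dir: DEFAULT".
set_data_dir() {
    local config="$1" db_dir="$2"
    # Rewrite the value in place, keeping a backup copy with a .bak suffix.
    # Note: db_dir must not contain '|' or '&', which are special to sed here.
    sed -i.bak -E 's|^([[:space:]]*data_dir:[[:space:]]*)DEFAULT[[:space:]]*$|\1'"$db_dir"'|' "$config"
    # Print the resulting line so the change can be verified by eye.
    grep -n "data_dir:" "$config"
}
```

For example: set_data_dir metaphor_settings.yaml /path/to/databases (both arguments are placeholders). For the test run, pass the path reported by metaphor config show --test-config as the first argument.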

mariafpv commented 7 months ago

It worked, both in the test and on my data. However, I now have a new error, this time in the assembly step. Could you help me identify what the cause might be?

coassembly.log

vinisalazar commented 7 months ago

Hello @mariafpv,

Glad to hear the database download worked.

That log file indicates what is likely a failure due to lack of memory. Coassemblies can be very computationally intensive. The log shows nearly 1 TB of RAM used by MEGAHIT (line 11). You can try increasing the amount of memory you request, or consider assembling your samples individually (or making smaller coassemblies, which Metaphor also supports).
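For a sense of scale, the --mem=30G requested in the job script above can be converted to megabytes and compared with the max_mb value that appears in the Snakemake command. This is a rough sketch, assuming max_mb is a memory ceiling in megabytes (inferred from its name, not confirmed in this thread) and that Slurm memory suffixes are multiples of 1024; mem_to_mb is a hypothetical helper:

```shell
# Hypothetical helper: convert a Slurm-style memory string (e.g. 30G) to MB.
# A bare number is treated as megabytes, which is Slurm's default unit.
mem_to_mb() {
    local value="${1%[KMGTkmgt]}"   # numeric part, unit suffix stripped
    local unit="${1##*[0-9]}"       # whatever follows the last digit
    case "$unit" in
        T|t)    echo $(( value * 1024 * 1024 )) ;;
        G|g)    echo $(( value * 1024 )) ;;
        M|m|"") echo "$value" ;;
        K|k)    echo $(( value / 1024 )) ;;
        *)      echo "unrecognized unit: $unit" >&2; return 1 ;;
    esac
}

mem_to_mb 30G   # prints 30720
mem_to_mb 1T    # prints 1048576
```

A 30G job therefore offers about 30,000 MB, far less than the roughly 1 TB reported in the log.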

Also, not sure if you are already doing this, but if you are running Metaphor on an HPC system, you may consider using a Snakemake execution profile.

Best, Vini

vinisalazar commented 5 months ago

I've closed this issue for now, but please feel free to reopen it if you continue to have problems.