maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

default conda env in 3.0.0 #1061

Open sunta3iouxos opened 3 days ago

sunta3iouxos commented 3 days ago

Hi all, even if I set snakePipes config --condaEnvDir /scratch/tgeorgom/mamba/snakePipes/env --tempDir /scratch/tgeorgom/temp, running snakePipes createEnvs still creates a folder in /tmp, which I have no access to.

The output included this info:

profile used: /scratch/tgeorgom/mamba/snakePipes/lib/python3.12/site-packages/snakePipes/shared/profiles/local
CondaEnvDir detected as: /tmp, from Snakemakeprofile: local

and then a message saying that the creation of the conda environments failed due to a space issue. I had to dig into /scratch/tgeorgom/mamba/snakePipes/lib/python3.12/site-packages/snakePipes/shared/profiles/local/config.yaml with vi to get it right. Would it be possible for snakePipes config to also change the entries in shared/profiles/local/config.yaml when we set these options? Thank you.

WardDeb commented 3 days ago

Hi,

Have a look at: https://snakepipes.readthedocs.io/en/stable/content/setting_up.html#configuring-snakepipes

TLDR: the conda directory needs to be set at the profile level, which is not (yet? cf. #1049) changeable through snakePipes config. Since 3.0.0, snakePipes uses a predefined local profile (which would run all jobs locally), and this is probably something you don't want.

snakePipes info

Will give you the paths that are utilized for a specific installation.
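
As a minimal sketch of checking which conda directory a profile sets: the snippet below assumes the profile's config.yaml stores it under a top-level conda-prefix key (the name of snakemake's --conda-prefix option; the key name is an assumption and is not confirmed in this thread). The helper names and the sample text are hypothetical, and the parser only handles flat key: value lines, not full YAML.

```python
from pathlib import Path

# Hypothetical profile content, for illustration only.
SAMPLE_CONFIG = """\
# hypothetical profile content
jobs: 5
conda-prefix: /tmp
default-resources:
    mem: 10G
"""


def read_top_level_keys(config_text):
    """Collect top-level `key: value` pairs from simple YAML text."""
    values = {}
    for line in config_text.splitlines():
        # Skip blank lines, comments, and indented (nested) lines.
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip().strip("'\"")
    return values


def conda_prefix_of(profile_dir):
    """Return the conda-prefix entry of <profile_dir>/config.yaml, or None."""
    config = Path(profile_dir) / "config.yaml"
    return read_top_level_keys(config.read_text()).get("conda-prefix")
```

Calling conda_prefix_of() with the profile path printed by snakePipes info would then show whether that profile still points at /tmp.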

Hope this helps !

sunta3iouxos commented 2 days ago

To be honest, I am not an avid snakemake user, and some of the configuration files do not make sense to me. There is also another related issue: the cluster.yaml file is nowhere to be found. Also, the instructions in #configuring-snakepipes are, in my opinion, not clear: they do not direct you to the files. snakePipes info only provides the following information:

The global configuration file is:
    /scratch/tgeorgom/mamba/snakePipes/lib/python3.12/site-packages/snakePipes/shared/defaults.yaml
    --> tempDir in the global configuration = /scratch/tgeorgom/temp
    --> The snakemake profile used =  /scratch/tgeorgom/mamba/snakePipes/lib/python3.12/site-packages/snakePipes/shared/profiles/local

In shared/profiles/local there is no cluster configuration file.

WardDeb commented 2 days ago

Cluster configurations (cluster.yaml) have been deprecated by snakemake for quite some time, which is why they have now been dropped from snakePipes as well. There is some overlap with profiles, though.

If your cluster uses slurm, you can try to adapt the pre-shipped 'snakepipes_genericprofile' to your needs. You can change the snakemakeProfile entry in the global configuration file (/scratch/tgeorgom/mamba/snakePipes/lib/python3.12/site-packages/snakePipes/shared/defaults.yaml) to: 'shared/profiles/snakepipes_genericprofile'

After doing this, snakePipes info will print the full path to the profile directory. Inside it there are two files of interest:
config.yaml -> submit / logging / conda settings
ccancel.sh -> would probably need 'module load slurm' removed from it.
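
A minimal sketch of that edit, in case it helps: the helper below rewrites the snakemakeProfile line in the text of a defaults.yaml. Only the snakemakeProfile key and the profile value come from this thread; the function name and the sample content are hypothetical. Since the real file lives inside site-packages, it is worth keeping a backup before writing anything back.

```python
import re

# Hypothetical excerpt of a defaults.yaml, for illustration only.
SAMPLE_DEFAULTS = """\
tempDir: /scratch/tgeorgom/temp
snakemakeProfile: shared/profiles/local
"""


def set_snakemake_profile(defaults_text, profile):
    """Return the text with the top-level snakemakeProfile entry replaced."""
    pattern = re.compile(r"^snakemakeProfile:.*$", flags=re.MULTILINE)
    new_line = f"snakemakeProfile: {profile}"
    # A callable replacement avoids backslash handling in the profile path.
    updated, count = pattern.subn(lambda match: new_line, defaults_text, count=1)
    if count == 0:
        raise KeyError("no snakemakeProfile entry found")
    return updated


updated_text = set_snakemake_profile(
    SAMPLE_DEFAULTS, "shared/profiles/snakepipes_genericprofile"
)
```

All other lines, such as tempDir, are left untouched, and a missing entry raises an error instead of silently appending a new one.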

Alternatively, you can set up a profile that works for your computation infrastructure yourself. Some links to get you started:
https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
https://github.com/Snakemake-Profiles

It could also be worth double checking with your HPC sysadmins; most likely, profiles specific to your infrastructure are already set up / being used.

Hope this helps ! If you have specific questions or suggestions to make the docs more clear I'd be happy to help and/or implement them.

sunta3iouxos commented 2 days ago

Hi Ward, yes, this is helpful. It is a huge change from the previous version. You will probably need to add that information to the readme files. How you explained things here makes more sense; or rather, it is information that needs to be on the readme page, before the explanation of each variable.

Thank you for the pointer to 'snakepipes_genericprofile'. I had found these files, but how to direct snakePipes to use them was not intuitive. This also needs to be added to the readme.

I have already checked with my HPC sysadmins previously. But I have a question about the partition. I see that there is an entry in the profile:

default-resources:
    mem: 10G
    time: 1440
    partition:

Is this the same as the one that we set in the sbatch command using "-p, --partition"? If so, is it set twice?

I will come back when I start testing, probably with another issue, but for now I am stuck on #1062.