jolespin / veba

A modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes
GNU Affero General Public License v3.0

[Feature Request] bioconda recipe #95

Open cmkobel opened 6 months ago

cmkobel commented 6 months ago

Installing VEBA is quite involved and not easy for non-bioinformaticians. Is there a bioconda recipe/package for VEBA, or are there plans for one?

jolespin commented 6 months ago

I would really like to do this, except I've never made a bioconda package. Do you have any experience with this, and would you be able to help out by any chance?

In the meantime, I made this YouTube channel to help people out with usage https://www.youtube.com/@VEBA-Multiomics

Depending on your system, you can use Docker as well (pushing the newest images for 2.1.0 to DockerHub at this moment).

I have a meeting w/ JGI's KBase next week to hopefully get this up there as well.

cmkobel commented 6 months ago

I might be able to help draft a recipe, but having many independent conda environments is not straightforward to implement in a single conda package. So either the pipeline should create these environments automatically when the package is installed, or the independent environments should be merged somehow. Or you could make one package for each computational module.

jolespin commented 6 months ago

The way the code is structured would require one environment per module-ish. The package is too massive to fit in one environment and you can't have a conda package that installs environments.

The installation script I made does all of this for you btw.

https://github.com/jolespin/veba/tree/main/install

It's just 3 commands:

bash install.sh

conda activate VEBA-database_env

bash download_databases.sh /path/to/database_directory/

All the other documentation is just around managing resources.

For conda, it would have to be something like this:

# Database download
conda create -n VEBA-database_env -c bioconda veba-database 
conda activate VEBA-database_env
download_databases.sh /path/to/database_directory/

# Usage
conda create -n VEBA-binning-prokaryotic_env -c bioconda veba-binning-prokaryotic
conda activate VEBA-binning-prokaryotic_env
binning-prokaryotic.py [params] --veba_database /path/to/database_directory/

# or
conda create -n VEBA-binning-prokaryotic_env -c bioconda veba-binning-prokaryotic
conda activate VEBA-binning-prokaryotic_env
update_environment_variables.sh /path/to/database_directory/

I'll look into this but will probably push it to my personal anaconda channel at first. I'll have to build custom conda recipes from the conda yaml files, so it's not very straightforward. I understand that a tool not being conda installable is a huge barrier to entry (for example, I only included packages that were available on conda or pip in VEBA, and sometimes I won't even try a tool out if it's not installable w/ conda). The problem w/ VEBA is that it's a modular software suite, so having a single conda install -c bioconda veba is virtually impossible.
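To make the one-package-per-module idea concrete, here is a minimal dry-run sketch that only prints the conda commands each module would need, without running them. The veba-<module> package names are hypothetical (no such bioconda packages are confirmed to exist), and the helper name veba_env_commands is made up for illustration; only the VEBA-<module>_env naming pattern and the two module names come from the examples above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the conda commands for one module.
# ASSUMPTION: the veba-<module> package names are hypothetical placeholders.

veba_env_commands() {
    local module="$1"
    # Environment names follow the VEBA-<module>_env pattern used above.
    local env_name="VEBA-${module}_env"
    echo "conda create -n ${env_name} -c bioconda veba-${module}"
    echo "conda activate ${env_name}"
}

# Only the two modules named in this thread are listed here.
for module in database binning-prokaryotic; do
    veba_env_commands "${module}"
done
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere, and it shows why a single veba package does not fit: every module maps to its own environment and its own package.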

If you need help installing, either watch this video tutorial or reach out. I'd be glad to help you out. In the video tutorial I describe how to install/download, but I only do a custom installation in the example b/c I'm running it locally for the Docker tutorial next. I also have an end-to-end tutorial on there w/ a conda installation.