jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

Running the VirSorter2 setup in an environment with no internet #40

Open BikramDroid opened 3 years ago

BikramDroid commented 3 years ago

Background: I can run VirSorter2 without problems on a local computer with internet access.

I now need the same setup on a computer that has no internet connection. For privacy reasons, the .fasta files that VirSorter2 would process are not available outside that computer.

So the setup needs to be moved to that offline computer.


Steps performed: I compressed the local VirSorter2 setup (with all databases set up and working) into a single file, copied it to the other computer, and uncompressed it there. The complete setup is now available on that machine, but what are the steps to run it? Conda is installed on this computer; how do I activate the environment there and start using the virsorter command?

I ask because on the local computer we first created the environment with the commands below:

conda create -n vs2 -c bioconda virsorter=2
conda activate vs2

Activating it made the virsorter command available.

jiarong commented 3 years ago

Hi, if there is no internet connection, conda is not a good option for installing dependencies. You could look into Docker or Singularity for building portable "containers".

BikramDroid commented 3 years ago

My question is: even if I use Docker to bundle everything in a container, how do I activate the conda environment and start using the virsorter command?

Another point: do you know about conda pack/unpack? Would it be helpful here?

jiarong commented 3 years ago

When you use Docker, you can install into the root environment (conda install -c bioconda virsorter=2 instead of conda create), so no activation is necessary.

With conda (and without internet), you would need to pack and unpack tens of dependencies manually...
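
A minimal sketch of the Docker route described above, assuming the image is built on a machine with internet access and then carried to the offline machine as a tarball. The base image `continuumio/miniconda3`, the image name `vs2-offline`, the DB path `/db`, and the `/data` mount point are illustrative assumptions, not taken from the VirSorter2 repository. The `virsorter setup` step is included on the assumption that the database should be baked into the image too. This is untested.

```shell
# Write a Dockerfile that installs VirSorter2 into the base (root) conda
# environment, so no `conda activate` is needed inside the container.
cat > Dockerfile <<'EOF'
# Assumed base image; any image that ships conda should work similarly.
FROM continuumio/miniconda3

# Install into the base environment (no `conda create`).
RUN conda install -y -c conda-forge -c bioconda virsorter=2

# Download the DB and set up dependencies at build time (needs internet).
# /db is an assumed location inside the image.
RUN virsorter setup -d /db -j 4

ENTRYPOINT ["virsorter"]
EOF

# On the machine WITH internet (left as comments; requires Docker):
#   docker build -t vs2-offline .
#   docker save -o vs2-offline.tar vs2-offline
# On the offline machine, after copying vs2-offline.tar over:
#   docker load -i vs2-offline.tar
#   docker run --rm -v "$PWD":/data -w /data vs2-offline run -w test.out -i test.fa all
```

`docker save` and `docker load` move the whole image, including anything created during the build, so nothing has to be downloaded on the offline side.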

BikramDroid commented 3 years ago

Hi. Sorry for the late reply, but I'm not having much luck with Singularity.

Can you tell me how to proceed with conda packing of the dependencies, and which packages are involved?

jiarong commented 3 years ago

Again, this is a bad idea, and I have no idea what problems you might encounter.

The dependencies are listed here: https://github.com/jiarong/VirSorter2/blob/master/virsorter/envs/vs2.yaml

BikramDroid commented 3 years ago

Hi, I was able to move my setup to the computer with no internet. Everything is set up, including the DB, but when I try to run it I get a CreateCondaEnvironmentException.

I think the reason is that, during execution, it reads a YAML file under the DB folder and tries to create a conda environment, which requires internet access and is not possible here.

[Screenshot: 2021-03-14 at 18:22:44]

Is that the reason? Does this mean VirSorter2 cannot be used in this case? Is there any workaround?

jiarong commented 3 years ago

Did you copy the DB that was set up on your own computer to the server? If so, conda installations cannot be copied between computers. It simply does not work this way.

BikramDroid commented 3 years ago

Yes, I copied the DB setup, so it seems Singularity is the only option now. I will look into it and let you know, since I had some issues with it earlier. Wouldn't conda-pack/unpack be a solution here, though? That is what I use to move conda installations between computers: pack on one and unpack on the other. But I can see that the DB installation may go through scripts internal to VirSorter2, so it is better not to change that. Anyway, I will try the Singularity approach and confirm.

jiarong commented 3 years ago

I actually did NOT know about conda pack/unpack (I thought it was simply tar and untar). Right, VirSorter2 installs its dependencies with the setup subcommand, along with downloading the DB.
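
For reference, a minimal sketch of the conda-pack workflow mentioned above, assuming conda-pack is installed on the source machine and that both machines run the same OS and CPU architecture. The function names, archive name, and target directory are hypothetical. It relocates only the named environment (here `vs2`); going by this thread, the dependencies that `virsorter setup` installs alongside the DB are separate and would not be covered by it.

```shell
# Run on the machine WITH internet: archive an existing conda env.
# $1 = env name, $2 = output archive (e.g. vs2.tar.gz)
pack_env() {
    conda pack -n "$1" -o "$2"
}

# Run on the offline machine after copying the archive over.
# $1 = archive, $2 = directory to unpack the env into
unpack_env() {
    mkdir -p "$2"
    tar -xzf "$1" -C "$2"
    # Activate the unpacked env, then rewrite its hard-coded path prefixes.
    . "$2/bin/activate"
    conda-unpack
}

# Example calls (illustrative names):
#   pack_env vs2 vs2.tar.gz
#   unpack_env vs2.tar.gz "$HOME/envs/vs2"
```

Unlike plain tar/untar, `conda-unpack` fixes the absolute paths embedded in the environment after it lands in its new location.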

BikramDroid commented 3 years ago

Hi again,

It seems the conda-unpack way of moving the DB did not work. Strangely, a .yaml file with a random number as its name is generated every time. I unpacked, set the DB path, and tried running, but got the same error.

Anyway, I was able to create a Singularity definition file from the recipe at https://github.com/jiarong/VirSorter2/blob/master/Singularity

After building an image from this recipe I can access the virsorter command, but the problem is the DB again. On the server I have the complete setup, but without the DB; locally it runs with no issues. After the image is built from the above recipe, how can I copy the DB into the image and use it? Any help is appreciated.

I used the command below to create the image. How can I add the DB folder after it is created?

sudo singularity build virsorter.sif virsort.def

I just need a complete setup that takes a FASTA file and runs the whole pipeline.

jiarong commented 3 years ago

That recipe is not for your situation (no internet). You need to add the DB setup within the recipe, since this step needs an internet connection.

BikramDroid commented 3 years ago

Hi,

I tweaked the recipe a bit and added a %files section between the %runscript and %post sections.

%runscript
    exec virsorter "$@"

%files
    /home/bikram/Desktop/VirSorter2/db  /home/bikram/database  

%post

I can see that the folder was copied when the image was built from this recipe file. I also added the DB configuration in the %post section:

virsorter config --init-source --db-dir=/home/bikram/database/db

But when I run the image with the command below:

sudo singularity exec virlatest.sif virsorter run -w test.out -i test.fa --provirus-off --max-orf-per-seq 20 al

I get the error below:

FileNotFoundError: [Errno 2] No such file or directory: '/home/bikram/database/db/group'

I tried to cd into this folder from a shell inside the Singularity image, and both db and group exist, yet the error still occurs. Do you know what the issue could be?
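
A small sanity check for the situation above, assuming only what the error message shows: the run expects a `group` subdirectory under the configured DB directory. The helper name `check_vs2_db` and the script name `check_db.sh` are hypothetical. This does not diagnose the cause; it only reports what is visible at that path from wherever it runs.

```shell
# Hypothetical helper: report whether <db-dir>/group is visible and readable.
# $1 = DB directory (e.g. /home/bikram/database/db)
check_vs2_db() {
    if [ ! -d "$1" ]; then
        echo "missing DB directory: $1" >&2
        return 1
    fi
    if [ ! -d "$1/group" ] || [ ! -r "$1/group" ]; then
        echo "missing or unreadable: $1/group" >&2
        return 1
    fi
    echo "ok: $1/group"
}

# To run it inside the container, save this function plus a final line
#   check_vs2_db "$1"
# as check_db.sh, then invoke it the same way as the failing command:
#   sudo singularity exec virlatest.sif sh check_db.sh /home/bikram/database/db
```

Running the check through `singularity exec` with the same user and options as the failing command matters, because what is visible at a path inside a container can depend on how it was started (for example, on which host directories are bind-mounted).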


One more thing to bring to your attention: installing the development version of VirSorter2 takes forever.

conda install -y -c conda-forge -c bioconda "python>=3.6" scikit-learn=0.22.1 imbalanced-learn pandas seaborn hmmer==3.3 prodigal screed ruamel.yaml "snakemake>=5.18,<=5.26" click mamba
    git clone https://github.com/jiarong/VirSorter2.git
    cd VirSorter2
    pip install .

Issue

miniconda3/bin/conda install -y -c conda-forge -c bioconda python>=3.6 scikit-learn=0.22.1 imbalanced-learn pandas seaborn hmmer==3.3 prodigal screed ruamel.yaml snakemake>=5.18,<=5.26 click mamba
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

The "Solving environment: failed with initial frozen solve. Retrying with flexible solve." step above takes a lot of time!

jiarong commented 3 years ago

Hi, I am not experienced with containers, so I could be wrong. I do not know why the %files approach does not work. I was suggesting running virsorter setup -j 4 -d /home/bikram/database/db in the recipe right after installing VirSorter2. This way you do not need to copy anything between the outside and the inside of the container.
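
A minimal, untested sketch of that suggestion: a definition file that installs VirSorter2 and runs `virsorter setup` in `%post`, so the DB is created inside the image at build time. The base image `continuumio/miniconda3` (which ships conda under `/opt/conda`), the file names, and the DB path `/opt/vs2-db` are assumptions and differ from the repository's own recipe. The DB path is placed outside `/home` because Singularity usually bind-mounts the host home directory at run time, which can hide files stored under `/home` in the image; whether that is related to the error reported above is not known.

```shell
# Write a Singularity definition file that bakes the DB into the image.
cat > vs2-offline.def <<'EOF'
Bootstrap: docker
From: continuumio/miniconda3

%environment
    export PATH=/opt/conda/bin:$PATH

%post
    export PATH=/opt/conda/bin:$PATH
    # Install into the base conda environment, so no activation is needed.
    conda install -y -c conda-forge -c bioconda virsorter=2
    # Set up the DB inside the image at build time (this step needs internet).
    # /opt/vs2-db is an assumed path, chosen to sit outside /home.
    virsorter setup -j 4 -d /opt/vs2-db

%runscript
    exec virsorter "$@"
EOF

# Build on a machine WITH internet, then copy the .sif file to the offline one:
#   sudo singularity build vs2-offline.sif vs2-offline.def
#   singularity run vs2-offline.sif run -w test.out -i test.fa all
```

With this layout the offline machine only needs Singularity and the single `.sif` file; no `%files` copy of a DB prepared elsewhere is involved.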