researchapps / sherlock

repository for collaborating with sherlock users to create containers
MIT License

VEP not exposed #1

Closed by vsoch 6 years ago

vsoch commented 6 years ago

From discussion in an email thread, moved here for documentation and preservation:

So it seems that pvacseq may work using:

singularity exec --bind $PWD:/scif/data/pvacseq --app pvacseq pvacseq.simg pvacseq -h

I will test that with real data ASAP.

I thought that similarly VEP should work with

singularity exec --bind $PWD:/scif/data/VEP --app VEP pvacseq.simg VEP
/.singularity.d/actions/exec: 9: exec: VEP: not found

or

singularity run --bind $PWD:/scif/data/VEP --app VEP pvacseq.simg
No Singularity runscript for contained app: VEP
vsoch commented 6 years ago

You should inspect the VEP app to look at the details. There is no runscript (other than environment) because I didn't write one.

 ./pvacseq.simg inspect VEP
{
    "VEP": {
        "appenv": [
            "PERL_MM_USE_DEFAULT=1",
            "export PERL_MM_USE_DEFAULT"
        ],
        "appinstall": [
            "    /bin/bash -c \"source activate /scif/apps/pvacseq\"",
            "    apt-get install -y cpanminus",
            "    export PERL_MM_USE_DEFAULT=1",
            "    cpan App::cpanminus",
            "    cpanm Archive::Zip",
            "    cpanm BioPerl",
            "    cpanm DBI",
            "    cpanm DBD::mysql",
            "    git clone https://github.com/Ensembl/ensembl-vep.git",
            "    cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)",
            "    cd ensembl-vep",
            "    perl INSTALL.pl -a a --NO_TEST",
            "    cpanm Bio::Root::Version",
            "    perl INSTALL.pl -a p -g Downstream,miRNA",
            "    /bin/bash -c \"source deactivate\""
        ],
        "apprun": [
            "PERL_MM_USE_DEFAULT=1",
            "export PERL_MM_USE_DEFAULT",
            "PERL_MM_USE_DEFAULT=1",
            "export PERL_MM_USE_DEFAULT"
        ]
    }
}

What you should do is shell into the container, find the executable that you want to run (and test it) and give a go at writing the %apprun VEP section for it. Open a pull request to this repository and we will discuss!
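A minimal sketch of that hunt for the executable, meant to be run inside the container (for example after `singularity shell pvacseq.simg`). It assumes the app root is /scif/apps/VEP, which matches the runscript path mentioned in this thread; the helper name `list_executables` is made up for illustration.

```shell
# List executable files under a directory, at most three levels deep.
# Default root is the assumed VEP app location; pass another path to override.
list_executables() {
    find "${1:-/scif/apps/VEP}" -maxdepth 3 -type f -perm -u+x 2>/dev/null | sort
}

# Print the candidates. Outside the container the default path will
# probably not exist, in which case nothing is printed.
list_executables "$@"
```

Once a candidate turns up, call it with its help flag to confirm it runs. The command that works is what would go into the %apprun VEP section of the recipe.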

but it does not (there is a runscript - /scif/apps/VEP/scif/runscript - but I guess the message is meant to say that there is no apprun section).

Yes, see the above.

Any reason why you did VEP and VEP_plugins separately?

It's a design choice. The VEP_plugins app is just an install script.

./pvacseq.simg inspect VEP_plugins
{
    "VEP_plugins": {
        "appinstall": [
            "    /bin/bash -c \"source activate /scif/apps/pvacseq\"",
            "    cd ..",
            "    git clone https://github.com/Ensembl/VEP_plugins.git && cd VEP_plugins",
            "    pvacseq install_vep_plugin ${SCIF_APPROOT}  ",
            "    /bin/bash -c \"source deactivate\""
        ]
    }
}

If a user wanted VEP without the plugins, now they are done separately.

I think that the VEP installation may require modifications.

Agreed! Go for it. I don't know any of this software or the domain, so you are the lead on this. If you've never collaborated on GitHub, you need to fork the repo, clone it to your computer, check out a new branch, make the change, commit and push to your remote, and then PR (pull request) the new feature to the master branch. Here you can read about the GitHub Flow if you need detail.

https://guides.github.com/introduction/flow/
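The steps above can be sketched roughly as follows, assuming you have already forked the repository in the GitHub web interface. The function name `start_feature_branch`, the branch name `add-vep-apprun`, and the `<your-username>` placeholder are all hypothetical.

```shell
# Clone a fork and start a new branch in it.
# Arguments: fork URL, branch name, optional target directory (default: sherlock).
start_feature_branch() {
    fork_url=$1
    branch=$2
    workdir=${3:-sherlock}
    git clone "$fork_url" "$workdir" &&
        git -C "$workdir" checkout -b "$branch"
}

# Example session (placeholders, adjust to your fork and the file you edit):
#   start_feature_branch https://github.com/<your-username>/sherlock.git add-vep-apprun
#   cd sherlock
#   ...edit the recipe...
#   git add -A
#   git commit -m "Add apprun section for VEP"
#   git push origin add-vep-apprun
```

After the push, GitHub offers to open a pull request from the new branch against master.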

So let's say I want to edit things. I see now: a Dockerfile, a scif file on GitHub, and a Singularity file on Singularity Hub. How should I go about changing this? Should I copy your git and modify it? Then what? Should I add the Singularity file to it? Then should I connect the repository to git?

You just read my mind, or I read yours? Haha. Yes!! This is exactly what we can do. At this point you can just make changes to the Dockerfile and scif recipe, and don't worry about the Singularity file. Please read about the GitHub flow above to understand the "connection to git": by way of being a repository (with a .git folder) it is already connected.

I read the documentation carefully. I haven't seen any example of using scif as the entry point to a Docker container; I thought that it was built into Singularity. I understand that you did this for speed of development, but I need to understand how to modify that now.

You modify the scif via its recipe. The SCIF in the container that we built comes entirely from the Docker layers. You can ignore the Singularity (singularity.lbl.gov) docs about SCIF and apps; we are using the client described here: https://sci-f.github.io/ The entire installation happens for both containers in the Dockerfile, like this:

# This installs the client
RUN pip install scif

# This adds the recipe file to the container
ADD pvacseq.scif /pvacseq.scif

# This installs everything, that's it! So edit this file.
RUN /opt/conda/bin/scif install /pvacseq.scif

It seems that singularity/scif is in its very early days, so I want to make a suggestion for a tutorial that will make it very easy to learn and appealing to people in comp. bio.

That would be great! I'm actively working on these things around the clock, at least for domains I'm familiar with. If you might be able to contribute a comp bio tutorial, I will be glad to help and publish on the examples site here --> https://sci-f.github.io/apps/blog/

Say my goal is this: I have a file with Illumina reads. I want to map the reads to the human genome using STAR, and then I want to filter the reads that map to chromosome 1 using a Python 2 script and output this into a file (I imagine that both are done in the same Python 2 conda environment). Then, I want to run a script in Python 3 that goes over the file and outputs a table in a file that indicates how many reads mapped to 0-1Mb, 1-2Mb, etc. (using a different conda environment).

This is very possible, and requires someone with domain experience such as yourself!
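As a rough illustration of the last step only (the per-megabase table), here is a sketch in shell with awk, standing in for the Python 3 script described above. It assumes the filtered reads have already been reduced to one 1-based mapping position per line; the function name `count_reads_per_mb` is hypothetical.

```shell
# Read 1-based mapping positions (one per line) on stdin and print one
# line per non-empty 1 Mb bin: "<start>-<end>Mb<TAB><read count>".
count_reads_per_mb() {
    awk '{ bin = int(($1 - 1) / 1000000); n[bin]++ }
         END { for (b in n) printf "%d-%dMb\t%d\n", b, b + 1, n[b] }' |
        sort -n
}

# Example (the file names are placeholders):
#   count_reads_per_mb < chr1_positions.txt > chr1_bins.tsv
```

In a SCIF recipe, a step like this would live in its own app, so that it can use a different environment from the mapping and filtering steps.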

and take that from the start to the end -

Of course this is just my made-up example, but it would be very helpful to have a full development flow tutorial (i.e. instructions on how to write the Singularity file, how to upload, how to pull, how to run, how to look at the files, etc.), which would be much easier (for me) than connecting dots from Singularity Hub, the Singularity tutorial, Docker tutorials and such. Anyway, just an idea that I think will be helpful; feel free to ignore it if you don't think this is required :)

I will continue to do my best. If you find more hours in the day, please send them over with snacks.