ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Feature request: support Singularity #14

Closed TheBigFatTony closed 4 years ago

TheBigFatTony commented 5 years ago

Description:

Because Docker requires root privileges, it cannot be used on clusters without compromising user management. For this reason, I suggest you add the option to run the pipeline using Singularity, which can run with regular privileges. I suspect it wouldn't be too hard...

How I got it to work:

I already tried it and it seems to work, although not yet on our cluster. In case anyone's interested, I'm posting my solution below:

# set a working directory with lots of available space
CWD=/xxx/xxx/xxx/PGAP

# set current version
DATE=2019-05-13
BUILD=build3740

# set the singularity cache location, meaning the location where singularity images are stored.
# this step is convenient if you don't want the image to be stored in your home folder.
export SINGULARITY_CACHEDIR=$CWD/singularity_cache
mkdir -p "$SINGULARITY_CACHEDIR"

# download supplemental files into the working directory,
# so that the $CWD-based bind paths further down resolve
cd "$CWD"
wget https://s3.amazonaws.com/pgap/input-$DATE.$BUILD.tgz
wget https://s3.amazonaws.com/pgap-data/test_genomes-$DATE.$BUILD.tgz

# extract supplemental files, delete tars
tar xzvf input-$DATE.$BUILD.tgz && rm input-$DATE.$BUILD.tgz
tar xzvf test_genomes-$DATE.$BUILD.tgz && rm test_genomes-$DATE.$BUILD.tgz

# get the pgap_input.yaml for the test genome
wget https://campuscloud.unibe.ch:443/ssf/s/readFile/share/29641/5440512207672247602/publicLink/pgap_input.yaml
mv pgap_input.yaml test_genomes/MG37/

# save the PGAP version to file (don't know if this is necessary...)
echo "$DATE.$BUILD" > VERSION

# download the docker container, convert it to singularity, and see if it works
# when the container is first run, it takes some time to download the container and convert it to singularity. if you run the same command a second time, it will be very quick.
mkdir -p "$CWD/mg37_results"
singularity run --pwd /pgap \
--bind $CWD/input-$DATE.$BUILD:/pgap/input:ro \
--bind $CWD/test_genomes/MG37:/pgap/user_input \
--bind $CWD/test_genomes/MG37/pgap_input.yaml:/pgap/user_input/pgap_input.yaml:ro \
--bind $CWD/mg37_results:/pgap/output:rw \
docker://ncbi/pgap:$DATE.$BUILD

# if this works, you should be inside the container and the command line should look like this:
# "sh-4.2$"
# you can play around, see if the folders were successfully mounted and finally exit the container.
exit

# run PGAP on the test genome
# -p: the directory may already exist from the interactive test above
mkdir -p "$CWD/mg37_results"
singularity exec \
--bind $CWD/input-$DATE.$BUILD:/pgap/input:ro \
--bind $CWD/test_genomes/MG37:/pgap/user_input \
--bind $CWD/test_genomes/MG37/pgap_input.yaml:/pgap/user_input/pgap_input.yaml:ro \
--bind $CWD/mg37_results:/pgap/output:rw \
--pwd /pgap \
docker://ncbi/pgap:$DATE.$BUILD \
cwltool --timestamps --outdir /pgap/output pgap.cwl /pgap/user_input/pgap_input.yaml
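
A bind mount whose host path is missing is an easy way for the commands above to fail, so a small pre-flight check can help. This is a minimal sketch and not part of PGAP or Singularity: `check_bind_sources` is a hypothetical helper name, and it only tests that each host path exists.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report every host path that does not
# exist before it is handed to `singularity --bind`.
check_bind_sources() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing bind source: $path" >&2
            missing=1
        fi
    done
    # 0 if every path exists, 1 if at least one is missing
    return "$missing"
}

# Example call with the paths used above (commented out so that
# defining the helper has no side effects):
# check_bind_sources \
#     "$CWD/input-$DATE.$BUILD" \
#     "$CWD/test_genomes/MG37" \
#     "$CWD/test_genomes/MG37/pgap_input.yaml" \
#     "$CWD/mg37_results" || exit 1
```

Running the check right before `singularity exec` keeps the failure message close to its cause.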

Additional info:

These are the systems I tried it on:

Hardware: Laptop

Hardware: Cluster

nclaesnacw commented 4 years ago

Second that, thanks Alan_Ward

tseemann commented 4 years ago

If you use CentOS or RHEL, the current version 8 supports Podman natively, which can run Docker images rootless and daemonless. See https://podman.io/

P.S. we still want Singularity too :)

TheBigFatTony commented 4 years ago

> If you use CentOS or RHEL, the current version 8 supports Podman natively, which can run Docker images rootless and daemonless. See https://podman.io/
>
> P.S. we still want Singularity too :)

So basically any system with cgroups v2 enabled, correct?
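
For anyone wanting to check which cgroup version a host uses, here is a minimal sketch. It assumes a Linux host with GNU `stat`, where the filesystem type of `/sys/fs/cgroup` is `cgroup2fs` under cgroups v2 and `tmpfs` under v1. `cgroup_version` is a hypothetical helper name, and the check does not settle whether rootless Podman needs v2.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print v2, v1, or unknown depending on
# the filesystem type mounted at the cgroup root (GNU stat assumed).
cgroup_version() {
    local fstype
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null) || {
        echo unknown
        return 1
    }
    case "$fstype" in
        cgroup2fs) echo v2 ;;       # unified hierarchy
        tmpfs)     echo v1 ;;       # legacy or hybrid hierarchy
        *)         echo unknown ;;
    esac
}

# Example call (commented out so that defining the helper prints nothing):
# cgroup_version
```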

cjfields commented 4 years ago

@TheBigFatTony Just a quick note: I used your example above and got it to complete the MG annotation on our local cluster using SLURM. Our worker nodes are disconnected from the outside network, so I used a slight modification:

# set a working directory with lots of available space
CWD=$PWD

# set current version
DATE=2019-05-13
BUILD=build3740

# set the singularity cache location, meaning the location where singularity images are stored.
# this step is convenient if you don't want the image to be stored in your home folder.

# export SINGULARITY_CACHEDIR=$CWD/singularity_cache
# mkdir $SINGULARITY_CACHEDIR

# download supplemental files
wget https://s3.amazonaws.com/pgap/input-$DATE.$BUILD.tgz
wget https://s3.amazonaws.com/pgap-data/test_genomes-$DATE.$BUILD.tgz

# extract supplemental files (tarballs are kept here; the rm steps are commented out)
tar xzvf input-$DATE.$BUILD.tgz # && rm input-$DATE.$BUILD.tgz
tar xzvf test_genomes-$DATE.$BUILD.tgz # && rm test_genomes-$DATE.$BUILD.tgz

# get the pgap_input.yaml for the test genome
wget https://campuscloud.unibe.ch:443/ssf/s/readFile/share/29641/5440512207672247602/publicLink/pgap_input.yaml
mv pgap_input.yaml test_genomes/MG37/

# save the PGAP version to file (don't know if this is necessary...)
echo "$DATE.$BUILD" > VERSION

# download the docker container, convert it to singularity, and see if it works
# when the container is first run, it takes some time to download the container and convert it to singularity. if you run the same command a second time, it will be very quick.
mkdir -p "$CWD/mg37_results"

# This caches the image and sets it up locally
singularity pull docker://ncbi/pgap:$DATE.$BUILD

### BELOW HERE RUNS ON WORKER NODE

# Note here we use the image directly when on the worker node
singularity run --pwd /pgap \
--bind $CWD/input-$DATE.$BUILD:/pgap/input:ro \
--bind $CWD/test_genomes/MG37:/pgap/user_input \
--bind $CWD/test_genomes/MG37/pgap_input.yaml:/pgap/user_input/pgap_input.yaml:ro \
--bind $CWD/mg37_results:/pgap/output:rw \
pgap_$DATE.$BUILD.sif

# if this works, you should be inside the container and the command line should look like this:
# "sh-4.2$"
# you can play around, see if the folders were successfully mounted and finally exit the container.
exit

# run PGAP on the test genome
# Note here we use the image directly
# -p: the directory already exists from the step above
mkdir -p "$CWD/mg37_results"
singularity exec \
--bind $CWD/input-$DATE.$BUILD:/pgap/input:ro \
--bind $CWD/test_genomes/MG37:/pgap/user_input \
--bind $CWD/test_genomes/MG37/pgap_input.yaml:/pgap/user_input/pgap_input.yaml:ro \
--bind $CWD/mg37_results:/pgap/output:rw \
--pwd /pgap \
pgap_$DATE.$BUILD.sif \
cwltool --timestamps --outdir /pgap/output pgap.cwl /pgap/user_input/pgap_input.yaml
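
The worker-node half of the script above could be wrapped in a function so that a batch job only needs a single call. This is a rough sketch under stated assumptions: `run_pgap` and the `PGAP_RUNTIME` override are hypothetical names (not part of PGAP, Singularity, or SLURM), the image is assumed to have been pulled to a local `.sif` file already, and the bind layout is copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run the test annotation from a local
# image file, so no network access is needed on the worker node.
#   $1 = working directory holding input-<version>/ and test_genomes/
#   $2 = version string, e.g. 2019-05-13.build3740
#   $3 = path to the pulled image, e.g. pgap_2019-05-13.build3740.sif
run_pgap() {
    local workdir=$1 version=$2 image=$3
    mkdir -p "$workdir/mg37_results"
    # PGAP_RUNTIME is a hypothetical override, handy for dry runs
    # (for example PGAP_RUNTIME=echo prints the command instead).
    "${PGAP_RUNTIME:-singularity}" exec \
        --bind "$workdir/input-$version:/pgap/input:ro" \
        --bind "$workdir/test_genomes/MG37:/pgap/user_input" \
        --bind "$workdir/test_genomes/MG37/pgap_input.yaml:/pgap/user_input/pgap_input.yaml:ro" \
        --bind "$workdir/mg37_results:/pgap/output:rw" \
        --pwd /pgap \
        "$image" \
        cwltool --timestamps --outdir /pgap/output pgap.cwl /pgap/user_input/pgap_input.yaml
}

# Example call from a batch script (commented out here):
# run_pgap "$PWD" "$DATE.$BUILD" "pgap_$DATE.$BUILD.sif"
```

Quoting every expansion keeps the call safe if the working directory contains spaces.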

Hardware:

I haven't extensively tested memory usage or time as of yet, run time was less than one. I also noticed this does not work with the latest release.

ponomarevsy commented 4 years ago

That was super useful, @TheBigFatTony and @cjfields. Thank you!

slottad commented 4 years ago

The latest release on Friday also supports Singularity and Podman as container platforms without modification. You merely need to supply the executable path using the --docker or -D option. There are other cool features too; please check them out.

max1c commented 4 years ago

@slottad Could you please elaborate on what you mean by the --docker or -D option? As far as I can tell singularity does not support such an option and neither does cwltool. Are you talking about the cwltool --singularity option? What am I missing?

slottad commented 4 years ago

To use the option, supply the full path to the binary:

-D /usr/bin/singularity

pgap.py will auto-detect the program and do the right thing.

azat-badretdin commented 4 years ago

pgap.py --docker /usr/bin/singularity
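
Since the option expects a full path, the path can be looked up instead of hard-coded. This is a minimal sketch: `find_runtime` is a hypothetical helper name, and it relies only on the shell's `command -v`. The only pgap.py option used is the `-D` one described above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the full path of the first
# candidate program found on PATH; return 1 if none is found.
find_runtime() {
    local name
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    return 1
}

# Example use (commented out here):
# runtime=$(find_runtime singularity podman) || exit 1
# ./pgap.py -D "$runtime"
```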

vappiah commented 3 years ago

Hi All,

I tried pgap with the -D option:

./pgap.py -D /opt/apps/singularity/3.6.4/bin/singularity -v -o pgapdir pgap-2019-02-11.build3477/MG37/input.yaml

and got the error message below. Please advise.

pgap.py: error: unrecognized arguments: pgap-2019-02-11.build3477/MG37/input.yaml

azat-badretdin commented 3 years ago

You mangled together two separate things here:

pgap-2019-02-11.build3477/MG37/input.yaml

That string combines the version (pgap-2019-02-11.build3477) with the path to the input YAML file.