ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Difficulty running PGAP with singularity #206

Closed stitam closed 2 years ago

stitam commented 2 years ago

After consulting the manual I still have difficulty running the pipeline. I would like to run PGAP on a system where I do not have sudo access, but I can run a singularity container and have full access to a number of directories. I pulled the build6021 singularity image from Docker Hub and downloaded pgap.py from the link provided in the manual. After putting the two files in the same directory, I ran

./pgap.py --container-name pgap_2022-04-14.build6021.sif --container-path ./pgap_2022-04-14.build6021.sif --docker /usr/local/bin/singularity --version

PGAP reports "The latest version of PGAP is 2022-04-14.build6021, you have nothing installed locally", so I am unsure whether I managed to tell PGAP that the image is already downloaded. What am I doing wrong?

Also when I run the pipeline on the Mycoplasma genitalium genome provided with the installation it starts downloading the databases: is it possible to tell PGAP where to download the databases? I didn't find such option for pgap.py. Many thanks.

azat-badretdin commented 2 years ago

Thank you, Tamás, for your report!

I am a little bit confused about the "consulting the manual" part. The most straightforward way to install PGAP is to go through the Quick Start.

Granted, it lists the series of commands only for the default Docker container runner, but it seems that adding the -D option to the installation line should pull the singularity image. Have you tried running ./pgap.py --update --docker /usr/local/bin/singularity?
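
Before running the suggested update, it can help to confirm that the path given to --docker really is an executable. The following is only a sketch: the function name pgap_update_cmd is hypothetical, and it merely prints the command from the comment above instead of running it, so nothing is downloaded.

```shell
# Hypothetical helper: print the update command for a given container runner,
# but only if that runner can be found and executed. The command is echoed,
# never run, so this is safe to try on any machine.
pgap_update_cmd() {
    runner="$1"    # e.g. /usr/local/bin/singularity
    if ! command -v "$runner" >/dev/null 2>&1; then
        echo "runner not found: $runner" >&2
        return 1
    fi
    echo "./pgap.py --update --docker $runner"
}

# Usage (uncomment on a system where singularity is installed):
# pgap_update_cmd /usr/local/bin/singularity
```

If the function reports that the runner is missing, the path passed to --docker is the first thing to fix.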

The message "you have nothing installed locally" refers to the reference data, whose presence is signaled by a VERSION file in the installation directory. The message is informational and precedes the actual installation of the reference data. You will notice it by the rapid progress indicator of the download.
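
The VERSION-file check described above can be approximated by hand. This is a minimal sketch and not the logic pgap.py itself uses: the function name pgap_data_status is hypothetical, and the fallback directory $HOME/.pgap is an assumption that is not stated anywhere in this thread, so pass the installation directory explicitly if yours differs.

```shell
# Hypothetical helper: report whether reference data looks installed,
# judged only by the presence of a VERSION file in the given directory.
# ASSUMPTION: "$HOME/.pgap" as the fallback location is a guess.
pgap_data_status() {
    install_dir="${1:-$HOME/.pgap}"
    if [ -f "$install_dir/VERSION" ]; then
        echo "installed: $(cat "$install_dir/VERSION")"
    else
        echo "nothing installed locally"
    fi
}

# Usage:
# pgap_data_status /path/to/installation/directory
```

Note that this only looks at the reference data marker, not at the container image, which is why the message can appear even when the .sif file has already been pulled.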

> Also when I run the pipeline on the Mycoplasma genitalium genome provided with the installation it starts downloading the databases: is it possible to tell PGAP where to download the databases?

I am not sure I understand why you would need to download the databases from somewhere else. Currently we support only one source of data distribution.

> I didn't find such option for pgap.py

./pgap.py --update

Please let me know if this helps, Tamás

tbazilegith commented 2 years ago

Hello Azat, could you help me figure out what's wrong with this? I converted the Docker image to singularity, as my system only supports singularity. Also, pgap is available on the system as a module. I ran this in a Slurm script:

#SBATCH --cpus-per-task=16
#SBATCH --mem=160gb

module load pgap/20220414
pgap.py --cpu $SLURM_CPUS_ON_NODE -n -o pgap_testDir/test_output pgap_testDir/MG37_input/input.yaml --container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif --docker singularity

My cwltool.log file looks like this:

Original command: /apps/pgap/20220414/pgap/scripts/pgap.py --docker /apps/singularity/latest/singularity --container-path /apps/pgap/20220414/pgap/scripts/pgap_2022-04-14.build6021.sif --no-self-update --report-usage-false --cpu 16 -n -o pgap_testDir/test_output pgap_testDir/MG37_input/input.yaml --container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif --docker singularity
-bash: Docker: command not found

Thanks!

azat-badretdin commented 2 years ago

Hello, Tassy. I noticed that the replayed command lists some command-line arguments twice, with different values. Could you please put the positional parameters at the end of the command line and try again, like this?

./pgap.py --cpu $SLURM_CPUS_ON_NODE \
-n -o pgap_testDir/test_output \
--container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif \
--docker singularity  \
pgap_testDir/MG37_input/input.yaml 
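
The duplication mentioned above can be spotted mechanically by pasting the arguments from the "Original command" line of cwltool.log into a small helper. This is only a sketch: the function name find_duplicate_options is hypothetical, and it treats every argument that starts with "-" as an option, which is a simplification.

```shell
# Hypothetical helper: print each option (any argument starting with "-")
# that appears more than once in the argument list, one per line.
find_duplicate_options() {
    printf '%s\n' "$@" | grep -e '^-' | sort | uniq -d
}

# Usage with the arguments replayed in cwltool.log:
# find_duplicate_options --docker /apps/singularity/latest/singularity \
#     --container-path /apps/pgap/20220414/pgap/scripts/pgap_2022-04-14.build6021.sif \
#     --cpu 16 -n -o pgap_testDir/test_output \
#     --container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif \
#     --docker singularity pgap_testDir/MG37_input/input.yaml
```

For the replayed command in this thread, the repeated options are --container-path and --docker: one copy appears at the start of the replayed command and one comes from the user's own command line.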

tbazilegith commented 2 years ago

Hello Azat, after I made the change, the cwltool.log file looks like this:

Original command: /apps/pgap/20220414/pgap/scripts/pgap.py --docker /apps/singularity/latest/singularity --container-path /apps/pgap/20220414/pgap/scripts/pgap_2022-04-14.build6021.sif --no-self-update --report-usage-false --cpu 16 -n -o pgap_testDir/test_output --container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif --docker singularity pgap_testDir/MG37_input/input.yaml

[2022-07-19 08:21:31] WARNING Final process status is permanentFail

Thanks! Tassy

azat-badretdin commented 2 years ago

Tassy, this sounds different from the problem described in this ticket. Would you mind opening a new issue?

Thanks!