usegalaxy-eu / infrastructure-playbook

Ansible playbook for managing UseGalaxy.eu infrastructure.
MIT License

Add conda scheduling tags for the tool psortb #1231

Closed sanjaysrikakulam closed 2 months ago

sanjaysrikakulam commented 2 months ago

@jennaj reported that the tool is not working; the same problem is also discussed on Galaxy Help.

This PR adds the scheduling tags conda and singularity as the tool is available in a conda env.

After activating the environment, I tested the version command, which "works" only in the sense that it runs: the tool prints its usage instead of the version. Running psort directly gives the same result: only the usage is printed, not the version.

galaxy@sn06:~$ . '/usr/local/tools/_conda/bin/activate' '/usr/local/tools/_conda/envs/__psortb@_uv_'/

(__psortb@_uv_) galaxy@sn06:~$ python /opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/e1996f0f4e85/tmhmm_and_signalp/tools/protein_analysis/psortb.py --version
Usage: _psort [-p|-n] [OPTIONS] [SEQFILE]
Runs _psort on the sequence file SEQFILE .  If SEQFILE isn't provided
then sequences will be read from STDIN.
  --help, -h        Displays usage information
  --positive, -p    Gram positive bacteria
  --negative, -n    Gram negative bacteria
  --archaea, -a     Archaea
  --cutoff, -c      Sets a cutoff value for reported results
  --divergent, -d   Sets a cutoff value for the multiple
                    localization flag
  --matrix, -m      Specifies the path to the pftools instalation.  If
                    not set, defaults to the value of the PSORT_PFTOOLS
                    environment variable.
  --format, -f      Specifies sequence format (default is FASTA)
  --exact, -e       Skip SCLBLASTe (useful for batch runs of data
                    against itself in SCLBLAST)
  --output, -o      Specifies the format for the output (default is
                    'normal'  Value can be one of: terse, long or normal
  --root, -r        Specify PSORT_ROOT for running local copies.  If
                    not set, defaults to the value of the PSORT_ROOT
                    environment variable.
  --server, -s      Specifies the PSort server to use
  --verbose, -v     Be verbose while running
  --x-skip-localization
  --version         Print the version of PSortb
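
Because the `--version` call above returned the usage text, it may help to check whether captured output actually looks like a version string before trusting it. The following is only a sketch: the helper name `looks_like_version` is hypothetical, and it assumes a real version line contains the word "version" followed by a number, which the usage text does not.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the text contains "version" followed
# by whitespace and a digit (assumed shape of a real version line).
looks_like_version() {
  printf '%s\n' "$1" | grep -Eiq 'version[[:space:]]+[0-9]'
}

# A version-like line is accepted; a usage line is rejected.
looks_like_version "PSORTb version 3.0" && echo "version detected"
looks_like_version "Usage: _psort [-p|-n] [OPTIONS] [SEQFILE]" || echo "not a version string"
```

In practice the argument would be the captured output of the version command, for example the contents of the COMMAND_VERSION file.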

mvdbeek commented 2 months ago

Is there a reason the container can't be fixed?

sanjaysrikakulam commented 2 months ago

> Is there a reason the container can't be fixed?

I assumed the problem would disappear once the tool was routed to the destination that uses the Singularity container: either the conda env would be mounted into the container, or tool_script.sh would activate the correct environment and run the tool. That is not the case. Below is what I see in tool_script.sh; the destination provides a base Singularity container for the tools to use.

# Check if container was created by installing conda packages,
# and if so, source scripts to populate environment variables
# that would be set by activating the conda environment.
if [ -d /usr/local/etc/conda/activate.d ]; then
  export CONDA_PREFIX=/usr/local
  for f in /usr/local/etc/conda/activate.d/*.sh; do
    case "$f" in
      "/usr/local/etc/conda/activate.d/activate-"*) :;;
      *) . "$f" ;;
    esac;
  done
fi
python /opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/e1996f0f4e85/tmhmm_and_signalp/tools/protein_analysis/psortb.py --version > /data/jwd02f/main/070/748/70748467/outputs/COMMAND_VERSION 2>&1;
python /opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/e1996f0f4e85/tmhmm_and_signalp/tools/protein_analysis/psortb.py "$GALAXY_SLOTS" '-p' 'terse' '' '' '/data/dnb10/galaxy_db/files/c/5/5/dataset_c550a5aa-ccb9-4859-a9b1-8e56c873ddf3.dat' '/data/jwd02f/main/070/748/70748467/outputs/dataset_cad8a303-fc05-413a-aa7d-9856af9b5574.dat'

I must be missing something. Galaxy should resolve the dependency automatically, right?

mvdbeek commented 2 months ago

Where did you find psort? The only conda-related trace I found is https://github.com/bioconda/bioconda-recipes/pull/5901, which was never merged.

sanjaysrikakulam commented 2 months ago

I found the Conda environment in /usr/local/tools/_conda/envs/__psortb@_uv_. I thought it was created while installing the tool. All other Conda environments for various other tools are created here: /usr/local/tools/_conda/envs/.
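
To see whether such an environment directory really provides the tool, one option is to test for the executable in its bin/ directory. This is a minimal sketch: the helper name `env_has_binary` is hypothetical, and it assumes the executable is named psort, as in the commands run earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the env directory ($1) has an executable
# entry named $2 in its bin/ directory.
env_has_binary() {
  env_dir=$1
  name=$2
  [ -x "$env_dir/bin/$name" ]
}

# Example using the env path mentioned in this thread (adjust as needed).
env_dir='/usr/local/tools/_conda/envs/__psortb@_uv_'
if env_has_binary "$env_dir" psort; then
  echo "psort found in $env_dir"
else
  echo "psort not found in $env_dir"
fi
```

This only shows that a file exists and is executable; it says nothing about which package or recipe installed it.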

sanjaysrikakulam commented 2 months ago

I think I will first test this psortb container: https://hub.docker.com/r/brinkmanlab/psortb_commandline/. If psortb works, I will update the TPV and the tool to pull and use this container.

sanjaysrikakulam commented 2 months ago

The container works. However, using it requires changes to the tool wrapper, among other things. The container comes with its own wrapper for running the tool, as mentioned in the README (we don't need to use that, I guess).

I did not use the wrapper to test the tool. I did the following:

root@vgcnbwc-worker-c64m2-0000:~$ docker pull brinkmanlab/psortb_commandline:1.0.2

root@vgcnbwc-worker-c64m2-0000:~$ docker run --rm -it 779f /usr/local/psortb/bin/psort --help
Usage: psort [-a|-n|-p] [OPTIONS] <sequence file>
Runs psort on the sequence file provided with the -i option.
  --help, -h        Displays usage information
  --positive, -p    Gram positive bacteria
  --negative, -n    Gram negative bacteria
  --archaea, -a     Archaea
  --cutoff, -c      Sets a cutoff value for reported results
  --divergent, -d   Sets a cutoff value for the multiple
                    localization flag
  --format, -f      Specifies sequence format (default is FASTA)
  --exact, -e       Skip SCLBLASTe (useful for batch runs of data
                    against itself in SCLBLAST)
  --output, -o      Specifies the format for the output (default is
                    'normal')  Value can be one of: normal, terse or long
  --x-skip-localization
  --verbose, -v     Be verbose while running
  --version         Print the version of PSortb

root@vgcnbwc-worker-c64m2-0000:~$ docker run --rm -it 779f /usr/local/psortb/bin/psort --version
PSORTb version 3.0

root@vgcnbwc-worker-c64m2-0000:~$ docker run --rm -it -v /tmp/psortb_results:/tmp/psortb_results:rw -v /data/dnb10/galaxy_db/files:/data/dnb10/galaxy_db/files:ro -e MOUNT=/tmp/psortb_results 779f /usr/local/psortb/bin/psort -p -o terse -i /data/dnb10/galaxy_db/files/8/4/8/dataset_84805dd6-ab96-469d-bbc2-c2dcbf495022.dat
Saving results to /tmp/psortb_results/20240612111110_psortb_grampos.txt

root@vgcnbwc-worker-c64m2-0000:~$ ll /tmp/psortb_results/
total 0

However, the tool does not use the MOUNT environment variable, as given in the wrapper, to decide where to store the output files. Although the log line reports /tmp/psortb_results/, the output is written to the tool's default location, /tmp/results/, inside the container. To verify this, I did the following:

root@vgcnbwc-worker-c64m2-0000:~$ docker run --rm -it --entrypoint="/bin/bash" -v /tmp/psortb_results:/tmp/psortb_results:rw -v /data/dnb10/galaxy_db/files:/data/dnb10/galaxy_db/files:ro -e MOUNT=/tmp/psortb_results 779f
root@2c9538bcf546:/usr/local/src# /usr/local/psortb/bin/psort -p -o terse -i /data/dnb10/galaxy_db/files/8/4/8/dataset_84805dd6-ab96-469d-bbc2-c2dcbf495022.dat
Saving results to /tmp/psortb_results/20240612111614_psortb_grampos.txt

root@2c9538bcf546:/usr/local/src# ls -l /tmp/psortb_results/
total 0

root@2c9538bcf546:/usr/local/src# ls -l /tmp/results/
total 4
-rw-r--r--. 1 root root 1292 Jun 12 11:16 20240612111614_psortb_grampos.txt
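
Given that the output lands in /tmp/results/ inside the container, one possible workaround is to bind-mount the host output directory onto that path instead of relying on MOUNT. The sketch below has not been tested here. The helper name `psortb_docker_cmd` is hypothetical, and it assumes the image ID 779f used above corresponds to brinkmanlab/psortb_commandline:1.0.2 and that the paths contain no spaces. It only prints the command (dry run).

```shell
#!/bin/sh
# Hypothetical helper: print a docker command that bind-mounts a host
# directory onto /tmp/results, where psort was observed to write its output.
# Nothing is executed; the command is only printed.
psortb_docker_cmd() {
  host_out=$1   # host directory that should receive the results
  data_dir=$2   # host directory holding the input (mounted read-only)
  input=$3      # path of the input file under data_dir
  printf 'docker run --rm -v %s:/tmp/results:rw -v %s:%s:ro brinkmanlab/psortb_commandline:1.0.2 /usr/local/psortb/bin/psort -p -o terse -i %s\n' \
    "$host_out" "$data_dir" "$data_dir" "$input"
}

# Same directories and dataset as in the test run above.
psortb_docker_cmd /tmp/psortb_results /data/dnb10/galaxy_db/files \
  /data/dnb10/galaxy_db/files/8/4/8/dataset_84805dd6-ab96-469d-bbc2-c2dcbf495022.dat
```

If the printed command looks right, it can be run by hand; whether Galaxy's own container handling can express the same mount is a separate question.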

I will put all of this in an issue. I guess someone should update the tool XML file and release a new version, but I am not sure whether the Galaxy wrappers are still maintained by the tool authors.

sanjaysrikakulam commented 2 months ago

Here is the issue where I put all the above: https://github.com/usegalaxy-eu/issues/issues/565