nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Error finding Python in HPC environment #5164

Closed: jreimertz closed this issue 3 months ago

jreimertz commented 3 months ago

Bug report

Nextflow does not appear to be able to find the correct Python 3 in my HPC environment. I first discovered this when running a Python script in my workflow via a shebang line, which gives the error:

Command error: /usr/bin/env: python3: No such file or directory

Looking into this further, I determined that Nextflow only seems able to find Python 2.7 (which is not even installed in my HPC environment). This doesn't appear to be a PATH issue: running which python3 prints all of the locations on the PATH, and two of those locations include python3.

Steps to reproduce this error

I've made a simple nf script to test this issue:

#!/usr/bin/env nextflow

params.test_py = "$projectDir/test_python.py"

println "test python script: $params.test_py"

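process testPython {
    input:
    // `path` replaces the deprecated `file` input qualifier
    path testScript

    output:
    path 'py_out.txt'

    // plain `script:` block; `$testScript` is interpolated by Nextflow
    script:
    """
    ./$testScript >> 'py_out.txt'
    """
}

workflow {
    testPython(file(params.test_py))
}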

Where test_python.py is:

#!/usr/bin/env python3

import sys

print('This is a Python script')
print(sys.version)

This outputs the following error:

ERROR ~ Error executing process > 'testPython'

Caused by: Process testPython terminated with an error exit status (126)

Command executed:

./test_python.py >> 'py_out.txt'

Command exit status: 126

Command output: (empty)

Command error: .command.sh: ./test_python.py: /usr/bin/env: python3: No such file or directory

Alternatively, if I change the shebang in test_python.py to /usr/bin/env python the script runs and the output looks like:

This is a Python script
2.7.5 (default, Jun 28 2022, 15:30:04)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

When I change the process command to which python3 to locate python3, I get the following error:

ERROR ~ Error executing process > 'testPython'

Caused by: Process testPython terminated with an error exit status (1)

Command executed:

which python3

Command exit status: 1

Command output: (empty)

Command error: which: no python3 in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/projectDir/bin)

However, if I check each of these locations on the PATH, I can find python3 in both /usr/bin and /projectDir/bin.

By comparison, which python returns:

/usr/bin/python

More confusing still, if I use whereis python3 within the nf script, it finds python3 in /projectDir/bin.
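A note on the which/whereis discrepancy: which, as commonly implemented, walks the PATH directories in order and only reports a match that resolves to an executable regular file, while whereis matches file names in a list of standard locations. A python3 entry that whereis lists can therefore still fail which's check, for example a dangling symlink or a file without execute permission. The sketch below (which_like is a hypothetical helper, not the source of GNU which) shows the which-style rule:

```python
import os


def which_like(cmd, path=None):
    """Return the first executable named `cmd` found on PATH, or None.

    Sketch of a which-style lookup: directories are searched in PATH
    order, and an entry only counts if it resolves to a regular file
    with execute permission (a dangling symlink does not).
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH component conventionally means the current directory.
        candidate = os.path.join(directory or ".", cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Report where python3 resolves in *this* process's environment.
    print(which_like("python3"))
```

Run inside the failing task, a None result together with a whereis hit would point at an entry that exists by name but is not runnable from where the task executes.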

Environment

Additional context

I am using an HPC environment where I don't have full sudo permissions, so Nextflow is installed in a folder I have write access to rather than in the HPC core bin directory. It's also worth noting that I can run this script fine from the command line with python3, so I believe this is a Nextflow error.

bentsherman commented 3 months ago

Which executor are you using? If the jobs are running on a different node than where you launch Nextflow, they might have a different environment and not have access to all of the commands you can access from the submitter node. I would try to run which python3 in the same environment where the jobs are executed.
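One way to follow this suggestion is to make the diagnostic the process itself, so it reports from wherever the task actually runs. The sketch below is hypothetical (path_report is an illustrative name). It avoids shutil.which and f-strings so that it also runs under the Python 2.7 interpreter reached by the /usr/bin/env python shebang, and for every PATH directory it reports whether a python3 entry exists and whether it is an executable file:

```python
from __future__ import print_function  # keep it runnable under Python 2.7 too

import os
import socket
import sys


def path_report(cmd, path=None):
    """List each PATH directory and whether `cmd` exists/is executable there."""
    if path is None:
        path = os.environ.get("PATH", "")
    rows = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", cmd)
        exists = os.path.lexists(candidate)  # True even for a dangling symlink
        runnable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        rows.append((directory, exists, runnable))
    return rows


if __name__ == "__main__":
    print("host:", socket.gethostname())
    print("interpreter:", sys.executable, sys.version.split()[0])
    for directory, exists, runnable in path_report("python3"):
        print("{0}: exists={1} executable={2}".format(directory, exists, runnable))
```

Comparing its output from the submitter node, an srun shell, and a Nextflow task would show where the environments diverge.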

jreimertz commented 3 months ago

I'm using Slurm. Currently I'm testing the workflow inside an srun job, but for the full pipeline I would use sbatch to submit a shell script containing the nextflow command. If I run which python3 from the command line in the srun job, I get:

/programs/x86_64-linux/system/biogrids_bin/python3

bentsherman commented 3 months ago

I would look at the .command.run script that is generated for the process in the work directory. It is the job script that Nextflow submits, so you can submit it manually to see whether it still fails, and inspect it for anything that conflicts with the job environment.

jreimertz commented 3 months ago

Submitting .command.run also fails with the output in .command.log:

which: no python3 in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/projectDir/bin)

Looking through .command.run, the only thing that seems like it could affect the environment is the apptainer definition in my config file, but since I've been running Nextflow without the -with-apptainer flag, my understanding was that it shouldn't affect the environment.

For reference this is the chunk I'm referring to:

nxf_launch() {
    set +u; env - PATH="$PATH" ${TMP:+APPTAINERENV_TMP="$TMP"} ${TMPDIR:+APPTAINERENV_TMPDIR="$TMPDIR"} ${NXF_TASK_WORKDIR:+APPTAINERENV_NXF_TASK_WORKDIR="$NXF_TASK_WORKDIR"} apptainer exec --no-home --pid -B /projectDir -B /temp_work/work/ef/dcd7a7bc04724cdaebeabacbe7ac1b /projectDir/cellranger_bcl2fastq_singularity_container_latest.sif /bin/bash -c "cd $NXF_TASK_WORKDIR; eval $(nxf_container_env); /bin/bash -ue /temp_work/work/ef/dcd7a7bc04724cdaebeabacbe7ac1b/.command.sh"
}
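For context on this chunk: env - starts apptainer with an empty environment apart from PATH and the explicitly listed APPTAINERENV_* variables, and apptainer exec then runs .command.sh inside the .sif image, so the same PATH string is presumably resolved against the container's filesystem (plus bind-mounted paths such as /projectDir) rather than the host's. The Python below only illustrates the env - pattern; it is not code Nextflow runs, and run_with_clean_env is a hypothetical name:

```python
import os
import subprocess


def run_with_clean_env(argv, keep=("PATH",), extra=None):
    """Run argv with an environment built from scratch, like `env - PATH="$PATH" ...`.

    Only the variables named in `keep` are copied from the current
    environment; `extra` adds explicit NAME=value pairs (compare the
    APPTAINERENV_* assignments in nxf_launch). Everything else is dropped.
    """
    env = {name: os.environ[name] for name in keep if name in os.environ}
    env.update(extra or {})
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # `env` with no arguments prints the environment it received.
    print(run_with_clean_env(["env"], extra={"APPTAINERENV_TMP": "/tmp"}))
```

Anything set in the launching shell (modules, conda activation, custom variables) other than PATH does not reach the task under this pattern.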
stevekm commented 3 months ago

Are you sure this isn't a version of Python that exists inside your container /projectDir/cellranger_bcl2fastq_singularity_container_latest.sif?
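To test this suggestion directly, the process script can report whether it is executing inside a container. A minimal sketch, written to run under Python 2.7 or 3; it assumes that Apptainer exports APPTAINER_CONTAINER (and older Singularity releases SINGULARITY_CONTAINER) inside a running container, which is worth confirming against the documentation for the installed Apptainer version. container_image is a hypothetical helper name:

```python
from __future__ import print_function

import os

# Assumption: Apptainer sets APPTAINER_CONTAINER inside a container, and
# older Singularity releases set SINGULARITY_CONTAINER; both name the image.
CONTAINER_ENV_VARS = ("APPTAINER_CONTAINER", "SINGULARITY_CONTAINER")


def container_image(environ=None):
    """Return the image path if this looks like an Apptainer/Singularity job, else None."""
    if environ is None:
        environ = os.environ
    for name in CONTAINER_ENV_VARS:
        value = environ.get(name)
        if value:
            return value
    return None


if __name__ == "__main__":
    image = container_image()
    if image:
        print("running inside container:", image)
    else:
        print("no Apptainer/Singularity container variables found")
```

Used as the process script with the #!/usr/bin/env python shebang that worked above, it would indicate whether the Python 2.7.5 interpreter is the one shipped in the .sif image.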