ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

PGAP fails to run on Singularity #326

Closed. letseatebil closed this issue 2 weeks ago

letseatebil commented 3 weeks ago

I am trying to do a test run on the test genomes provided with PGAP, which I installed on my HPC cluster following the quick start, but it gave this error:

Traceback (most recent call last):
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 1447, in connect
    super().connect()
  File "/opt/python-3.9/lib/python3.9/http/client.py", line 946, in connect
    self.sock = self._create_connection(
  File "/opt/python-3.9/lib/python3.9/socket.py", line 844, in create_connection
    raise err
  File "/opt/python-3.9/lib/python3.9/socket.py", line 832, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/hpcfs/users/a1809437/PGAP/pgap.py", line 1123, in main
    params = Setup(args)
  File "/hpcfs/users/a1809437/PGAP/pgap.py", line 604, in __init__
    self.remote_versions = self.get_remote_versions()
  File "/hpcfs/users/a1809437/PGAP/pgap.py", line 674, in get_remote_versions
    response = urlopen('https://api.github.com/repos/ncbi/pgap/releases/latest')
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 1389, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/opt/python-3.9/lib/python3.9/urllib/request.py", line 1349, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
Output will be placed in: /hpcfs/users/a1809437/PGAP/mg37_results

This is due to the lack of internet access when executing as a job script, so I tried adding the --no-internet flag, but it gave another error:

Output will be placed in: /hpcfs/users/a1809437/PGAP/mg37_results
--no-internet flag enabled, not checking remote versions.
Docker not found.
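The first traceback is what urlopen produces when a compute node has no outbound route and no timeout is passed: the call blocks until the kernel gives up. A minimal sketch of a pre-flight check a job script could run to decide whether to add --no-internet is shown here. The function name can_reach, the 5-second timeout, and the injectable opener are illustrative assumptions and are not part of pgap.py; only the URL is taken from the traceback above.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# URL copied from the traceback; everything else here is hypothetical.
RELEASES_URL = "https://api.github.com/repos/ncbi/pgap/releases/latest"


def can_reach(url=RELEASES_URL, timeout=5.0, opener=urlopen):
    """Return True if the host answers within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout):
            return True
    except HTTPError:
        # The server replied (even with an error status), so the network is up.
        return True
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return False
```

If can_reach() returns False on the compute node, passing --no-internet avoids the long wait for the connection to time out.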

I then tried to include the flag --docker singularity to force it to run on singularity but received the same error indicating Docker not found.

PGAP version is 2024-07-18.build7555, Singularity version is 3.10.5, and Python version is 3.11.3.

The command that is used to execute pgap within a job script is:

singularity exec /hpcfs/users/a1809437/PGAP/pgap_2024-07-18.build7555.sif \
./pgap.py -r -o mg37_results --container-path /hpcfs/users/a1809437/PGAP/pgap_2024-07-18.build7555.sif \
--no-internet --docker singularity \
test_genomes/MG37/input.yaml

Is there a way to force execution on Singularity, since my cluster does not provide a Docker module, or am I missing something? Thanks!

azat-badretdin commented 3 weeks ago

Thank you for your report, user @letseatebil!

This is incorrect:

The command that is used to execute pgap within a job script is:

singularity exec /hpcfs/users/a1809437/PGAP/pgap_2024-07-18.build7555.sif \
./pgap.py -r -o mg37_results --container-path /hpcfs/users/a1809437/PGAP/pgap_2024-07-18.build7555.sif \
--no-internet --docker singularity \
test_genomes/MG37/input.yaml

Do not call it from inside Singularity; call it directly, like this:

./pgap.py -r -o mg37_results --container-path /hpcfs/users/a1809437/PGAP/pgap_2024-07-18.build7555.sif \
--no-internet --docker singularity \
test_genomes/MG37/input.yaml
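The reason this matters is that pgap.py acts as a launcher that starts the container itself, so it needs the singularity binary on the host PATH. A rough sketch of that kind of check follows, assuming (this is not confirmed by the pgap.py source) that the runtime is located with a PATH lookup and that Singularity or Apptainer sets SINGULARITY_CONTAINER or APPTAINER_CONTAINER inside a running container. The helper names are hypothetical.

```python
import os
import shutil


def inside_container(env=os.environ):
    """Guess whether we are already running inside Singularity/Apptainer."""
    # Assumption: these variables are set by the runtime inside the container.
    return any(env.get(k) for k in ("SINGULARITY_CONTAINER", "APPTAINER_CONTAINER"))


def find_runtime(name="singularity", which=shutil.which, env=os.environ):
    """Return the path of the container runtime, or raise with a hint."""
    if inside_container(env):
        raise RuntimeError(
            f"already inside a container; run the launcher on the host so it can start {name}"
        )
    path = which(name)
    if path is None:
        raise RuntimeError(f"{name} not found on PATH")
    return path
```

Inside `singularity exec ...` the image typically ships neither docker nor singularity, which would be consistent with the "Docker not found." message seen above.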

letseatebil commented 2 weeks ago

Thanks for the prompt response!

I made the necessary amendments, but I am getting this error instead:

/scratchdata1/users/a1809437/PGAP/./pgap.py:133: SyntaxWarning: invalid escape sequence '\['
  r = "^\[(?P<time>[^\]]+)\] (?P<level>[^ ]+) \[(?P<source>[^ ]*) (?P<name>[^\]]*)\] (?P<status>.*)"
/scratchdata1/users/a1809437/PGAP/./pgap.py:978: SyntaxWarning: invalid escape sequence '\-'
  if not re.match("^[a-zA-Z0-9_\-]+$", prefix):
/scratchdata1/users/a1809437/PGAP/./pgap.py:1109: SyntaxWarning: invalid escape sequence '\-'
  parser.error("Invalid Command Line Argument Error: Both arguments -s\--organism and -g\--genome must be provided if no YAML file is provided.")
Output will be placed in: /scratchdata1/users/a1809437/PGAP/mg37_results
--no-internet flag enabled, not checking remote versions.
<urlopen error [Errno 110] Connection timed out>
Failed to update ./pgap.py, ignoring
Something has gone wrong, please manually download: https://github.com/ncbi/pgap/raw/prod/scripts/pgap.py

I checked the mg37_results folder, but it's empty. I then ran ./pgap.py --update again to make sure I am up to date (outside of the job script, because running it within the job script leads to a TimeoutError):

PGAP version 2024-07-18.build7555 is up to date.
Docker not found.

I am not sure if redownloading pgap.py would solve the issue since I have done so 3 times and am absolutely stumped.

azat-badretdin commented 2 weeks ago

Could you please post the output of

python3 --version

letseatebil commented 2 weeks ago

This is the latest Python module that is available to me on my cluster:

[a1809437@p2-log-1 PGAP]$ python3 --version
Python 3.11.3

azat-badretdin commented 2 weeks ago

OK. We can ignore the SyntaxWarning lines for now. They are just warnings, albeit annoying ones (that's why we have already fixed them, and they will be gone in the next release).
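For context, these warnings come from backslash escapes such as `\-` and `\[` inside ordinary (non-raw) string literals. One common way to silence them without changing the pattern is a raw-string prefix. The sketch below applies that to the pattern quoted at pgap.py:978; it is an illustration and not necessarily the fix that ships in the next release, and the function name is made up.

```python
import re

# Same pattern as in the warning for pgap.py:978, written as a raw string
# so that "\-" reaches the regex engine instead of the string-literal parser.
PREFIX_RE = re.compile(r"^[a-zA-Z0-9_\-]+$")


def is_valid_prefix(prefix: str) -> bool:
    """True if prefix contains only letters, digits, underscore or hyphen."""
    return PREFIX_RE.match(prefix) is not None
```

The regex behaves the same as before; only the way the literal is written changes.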

"PGAP version 2024-07-18.build7555 is up to date."

You can manually verify whether the script is telling the truth. If the PGAP_INPUT_DIR environment variable is not set, your PGAP installation should be under $HOME/.pgap; if it is set, the installation is under the directory that variable points to.
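A small sketch of that lookup, useful for checking what is actually installed. It mirrors only the rule described in this comment (PGAP_INPUT_DIR if set, otherwise $HOME/.pgap); the function names are hypothetical and this is not code from pgap.py.

```python
import os
from pathlib import Path


def pgap_install_dir(env=os.environ):
    """Resolve the install directory: PGAP_INPUT_DIR if set, else ~/.pgap."""
    override = env.get("PGAP_INPUT_DIR")
    if override:
        return Path(override)
    return Path.home() / ".pgap"


def list_installed(root):
    """Return the sorted entry names under root, or [] if it does not exist."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    root = pgap_install_dir()
    print(root, list_installed(root))
```

An empty list, or a directory without the expected build, suggests the "up to date" message does not reflect a complete installation.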

One possible "reset" is to clean out that directory and install again, this time with the singularity parameter specified:

./pgap.py --update --docker singularity

letseatebil commented 2 weeks ago

Thanks for your help! I removed and cleaned up the directory, reinstalled with the singularity parameter included, and the issue is fixed now :). Thanks so much for your help once again!

azat-badretdin commented 2 weeks ago

Glad your problems were resolved! You are welcome, user @letseatebil !