Closed kapsakcj closed 6 months ago
theiacov_fasta being run via cromwell is failing both in the GHActions CI as well as in my local devVM.
Failing on the pangolin task for sars-cov-2, which is unexpected since this has worked previously.
tests on terra were successful, not sure why it's failing in the CI but we need to investigate prior to merging
Will follow up after I've done more digging ๐
I traced the issue down to an issue with pangolin
when it is running scorpio
as part of it's own snakemake workflow. Here's the relevant error from the scorpio.log
, which is a temporary file that I could only view after re-running pangolin with the --notemp
option to keep all intermediate files and logs. After digging I found this, which basically says that the OS is having an issue with really long paths/filenames. I think it's the multiprocessing
python package that is complaining, specifically?
INFO: Found file /opt/conda/envs/pangolin/lib/python3.8/site-packages/constellations/definitions/cXE.json for constellation Omicron (XE-like) containing 6 defining mutations
INFO: Rules {'default': {'min_alt': 3, 'max_ref': 0}}
Process SyncManager-1:
Traceback (most recent call last):
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/managers.py", line 608, in _run_server
server = cls._Server(registry, address, authkey, serializer)
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/managers.py", line 154, in __init__
self.listener = Listener(address=address, backlog=16)
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 448, in __init__
self._listener = SocketListener(address, family, backlog)
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 591, in __init__
self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
File "/opt/conda/envs/pangolin/bin/scorpio", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/scorpio/__main__.py", line 272, in main
args.func(args)
File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/scorpio/subcommands/classify.py", line 7, in run
classify_constellations(options.input,
File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/scorpio/scripts/type_constellations.py", line 763, in classify_constellations
manager = mp.Manager()
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/context.py", line 57, in Manager
m.start()
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/managers.py", line 583, in start
self._address = reader.recv()
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/opt/conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Although this is not the ideal solution, we can set up the CI to skip scorpio which should resolve this error, but the downside is that it doesn't completely reproduce the theiacov_fasta workflow defaults (really all theiacov workflows that call pangolin). It will allow the tests to pass which is good, but we need to ensure that pangolin
runs correctly/without error in Terra on sars-cov-2 samples where scorpio
is run by default.
Another option is to use edit the WDL task for pangolin
and always use the option --tempdir .
so that all temporary files are written to the PWD .
, instead of $TMP which often has long paths given by cromwell. For example, here's an automatically chosen directory for pangolin when --tempdir
option from pangolin
is NOT used:
resources: tmpdir=/cromwell-executions/theiacov_fasta/6060a451-4e8d-4ad1-8169-b76f5a3abbee/call-pangolin4/tmp.f21e9e7a`
I implemented my proposed solution 1 to get the CI to pass in my last commit: 59bdce8
EDIT: also needed the 2 subsequent commits to get CI to pass completely:
Can un-do this change if necessary.
But the longer-term solution 2 may be the better option, but I would like more thorough testing before bringing a change like that into main
Seems like others using cromwell
have experienced this issue in the past. Specifically with python tools that use the mutliprocessing
python package https://github.com/broadinstitute/cromwell/issues/3647
Option 2 could be implemented with these changes (see below). I was able to circumvent the "pangolin/scorpio/python multiprocessing PATH too long" error with these changes.
It allowed my local cromwell
testing on theiacov_fasta workflow to run successfully, without having to use the pangolin --skip-scorpio
option.
## NEW LINE BELOW ##
export TMPDIR=/tmp
pangolin "~{fasta}" \
~{'--analysis-mode ' + analysis_mode} \
~{'--min-length ' + min_length} \
~{'--max-ambig ' + max_ambig} \
~{true='--expanded-lineage' false='' expanded_lineage} \
~{true='--skip-scorpio' false='' skip_scorpio} \
~{true='--skip-designation-cache' false='' skip_designation_cache} \
--outfile "~{samplename}.pangolin_report.csv" \
--verbose \
--tempdir /tmp \ ## THIS LINE IS ALSO NEW
~{pangolin_arguments}
Given our discussions outside of GitHub, I'm going to approve the PR as is, with the changes reflected in the CI and not the pangolin task itself ๐๐ป
This PR closes #304
๐๏ธ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
Came across this bug during 1.3.0 release validation testing. This impacts the TheiaCoV_FASTA_PHB workflow only.
Mpox samples were accidentally run through the
abricate
task which is an Influenza-specific task. Workflow succeeded, but running this task is inappropriate for non-Flu samples.The workflow code change is so that abricate is only run if user sets
organism
toflu
ANDflu_subtype
is not set by user or unknown:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users donโt change any workflow inputs relative to the last version: Yes
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
:clipboard: Workflow/Task Step Changes
๐ Data Processing
Docker/software or software versions changed: N/A
Databases or database versions changed: N/A
Data processing/commands changed: N/A
File processing changed: N/A
Compute resources changed: N/A
โก๏ธ Inputs
N/A
โฌ ๏ธ Outputs
abricate task outputs will NOT be outputted for non-Flu samples (as desired)
:test_tube: Testing
Test Dataset
Using the PHB validation dataset for testing. Includes sars-cov-2, Flu, Mpox,
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Test TheiaCov_FASTA on Flu and non-Flu samples (sars-cov-2, Mpox, etc.)
Theiagen Version Release Testing (optional)
-Will changes require functional or validation testing (checking outputs etc) during the release? functional
:microscope: Final Developer Checklist
๐ฏ Reviewer Checklist
๐๏ธ Associated Documentation (to be completed by Theiagen developer)