Will fill this out later, just want to get the CI running to test these changes
This PR closes #324
ποΈ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR is spun out of the CI failures we were seeing with cromwell test workflows over in PR #305 just prior to the v1.3.0 release.
TL;DR is that Pangolin, and in particular scorpio which runs as part of pangolin was being fed super-long paths from cromwell. The python multiprocessing package which scorpio uses struggles with these long paths, leading to failures that we had not seen before.
So this PR serves 2 purposes:
Update the Pangolin task to use TMPDIR=/tmp which leads to shorter paths, and force the usage of pangolin --tempdir /tmp option so that temporary file paths are shorter.
Update our CI to account for the above changes AND also upgrade other CI components for general improvement (better alignment with Cromwell version that runs on Terra on GCP, squash warnings about node.js, update cromwell CI output file checks)
FOR THE FUTURE: we may want to switch the CI, specifically the CI that tests workflows via running cromwell, to run inside of a docker container instead of using a conda environment. This way we can stay in closer alignment with the version of Cromwell that is used on Terra on GCP. The published github releases lag behind the versions of cromwell used in Terra on GCP. So we could in theory run those workflows inside of this docker image: broadinstitute/cromwell:latesthttps://hub.docker.com/r/broadinstitute/cromwell/tags as it will mirror the version currently being used in Terra on GCP.
Not ideal to have a key dependency to the CI environment like this change unexpectedly, but if we want to reproduce/replicate cromwell behavior on Terra on GCP, it would be good for us to switch to using docker for running cromwell
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
Changes being made
CI stuff
pin to Cromwell v86. While this does not exactly mirror the version used on Terra on GCP, it is closer than the previous version (83) that was used previously in the CI env.
updated from actions/checkout@v3 to actions/checkout@v4 (node.js warning)
updated from dorny/paths-filter@v2 to dorny/paths-filter@v3 (node.js warning)
updated from actions/setup-python@v4 to actions/setup-python@v5 (node.js warning)
updated from actions/upload-artifact@v4 to actions/upload-artifact@v5`
revert TheiaCoV_FASTA CI tests to no longer skip scorpio. We added this prior to PHB release v1.3.0 but it was a stopgap solution. This brings the CI workflow behavior in better alignment with typical TheiaCoV_FASTA wf usage (and pangolin usage)
change cromwell checks so that the presence of file log.err is checked and no longer its contents (because it's now empty with new version of cromwell)
updated md5sums where required for pangolin and task_versioning changes
Impacted workflows
all workflows that call the pangolin task:
pangolin_update
theiacov_fasta
theiacov_ont
theiacov_illumina_pe
theiacov_illumina_se
theiacov_clearlabs
all workflows that call the versioning task
There are many, so testing one or more of the above workflows will be sufficient for testing the update PHB_version output string.
This will affect the behavior of the workflow(s) even if users donβt change any workflow inputs relative to the last version: No
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No
:clipboard: Workflow/Task Step Changes
π Data Processing
Docker/software or software versions changed: No
Databases or database versions changed: No
Data processing/commands changed: Yes - temorary directory used by pangolin is now hardcoded to /tmp. Do not expect results to change, just the destination for reading/writing intermediate files used by pangolin.
File processing changed: No
Compute resources changed: No
β‘οΈ Inputs
N/A
β¬ οΈ Outputs
phb_version (output string of versioning task) now states PHB v1.3.0-main so that user on the main branch know that they are using main but the version of main downstream of v1.3.0 release. This change was also made on purpose to trigger TheiaProk CI workflows.
:test_tube: Testing
Test Dataset
Will test with SARS-CoV-2 samples on various TheiaCov workflows.
Commandline Testing with MiniWDL or Cromwell (optional)
did not test locally, mainly want to see how these workflow changes perform in Terra and in the CI workflows
Terra Testing
Will update with tests soon.
Suggested Scenarios for Reviewer to Test
Test any TheiaCov workflows with sars-cov-2 samples as it will run the pangolin task.
Theiagen Version Release Testing (optional)
Will changes require functional or validation testing (checking outputs etc) during the release? functional for TheiaCov wfs. Also functional for all workflows that use the versioning task (so almost all of them)
Do new samples need to be added to validation datasets? If so, upload these to the appropriate validation workspace Google bucket (). Please describe the new samples here and why these have been chosen. No
Are there any output files that should be checked after running the version release testing? No
:microscope: Final Developer Checklist
[x] The workflow/task has been tested locally and results, including file contents, are as anticipated
[x] The workflow/task has been tested on Terra and results, including file contents, are as anticipated
[x] The CI/CD has been adjusted and tests are passing (to be completed by Theiagen developer)
Will fill this out later, just want to get the CI running to test these changesThis PR closes #324
ποΈ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR is spun out of the CI failures we were seeing with
cromwell
test workflows over in PR #305 just prior to the v1.3.0 release.TL;DR is that Pangolin, and in particular
scorpio
which runs as part ofpangolin
was being fed super-long paths fromcromwell
. The pythonmultiprocessing
package whichscorpio
uses struggles with these long paths, leading to failures that we had not seen before.So this PR serves 2 purposes:
TMPDIR=/tmp
which leads to shorter paths, and force the usage ofpangolin --tempdir /tmp
option so that temporary file paths are shorter.FOR THE FUTURE: we may want to switch the CI, specifically the CI that tests workflows via running
cromwell
, to run inside of a docker container instead of using a conda environment. This way we can stay in closer alignment with the version of Cromwell that is used on Terra on GCP. The published github releases lag behind the versions of cromwell used in Terra on GCP. So we could in theory run those workflows inside of this docker image:broadinstitute/cromwell:latest
https://hub.docker.com/r/broadinstitute/cromwell/tags as it will mirror the version currently being used in Terra on GCP.Not ideal to have a key dependency to the CI environment like this change unexpectedly, but if we want to reproduce/replicate cromwell behavior on Terra on GCP, it would be good for us to switch to using docker for running
cromwell
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
Changes being made
actions/checkout@v3
toactions/checkout@v4
(node.js warning)dorny/paths-filter@v2
todorny/paths-filter@v3
(node.js warning)actions/setup-python@v4
toactions/setup-python@v5
(node.js warning)actions/upload-artifact@v4
to actions/upload-artifact@v5`pangolin
usage)log.err
is checked and no longer its contents (because it's now empty with new version of cromwell)Impacted workflows
This will affect the behavior of the workflow(s) even if users donβt change any workflow inputs relative to the last version: No
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No
:clipboard: Workflow/Task Step Changes
π Data Processing
Docker/software or software versions changed: No
Databases or database versions changed: No
Data processing/commands changed: Yes - temorary directory used by
pangolin
is now hardcoded to/tmp
. Do not expect results to change, just the destination for reading/writing intermediate files used by pangolin.File processing changed: No
Compute resources changed: No
β‘οΈ Inputs
N/A
β¬ οΈ Outputs
phb_version
(output string of versioning task) now statesPHB v1.3.0-main
so that user on themain
branch know that they are usingmain
but the version of main downstream of v1.3.0 release. This change was also made on purpose to trigger TheiaProk CI workflows.:test_tube: Testing
Test Dataset
Will test with SARS-CoV-2 samples on various TheiaCov workflows.
Commandline Testing with MiniWDL or Cromwell (optional)
did not test locally, mainly want to see how these workflow changes perform in Terra and in the CI workflows
Terra Testing
Will update with tests soon.
Suggested Scenarios for Reviewer to Test
Test any TheiaCov workflows with sars-cov-2 samples as it will run the pangolin task.
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
π― Reviewer Checklist
ποΈ Associated Documentation (to be completed by Theiagen developer)