theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

update default pangolin docker image to image with pangolin-data v1.26 #394

Closed kapsakcj closed 6 months ago

kapsakcj commented 6 months ago

This PR closes #390

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

Update the default docker image used to run pangolin across all TheiaCov workflows and the standalone Pangolin_update workflow.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

Updated the default docker image in the organism_parameters subworkflow. Tested successfully with miniwdl.

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Nothing has changed other than the docker image

Docker/software or software versions changed: us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.25.1 ➡️ us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.26. This is a copy of StaPH-B's docker image for pangolin.

Databases or database versions changed: pangolin-data (database) upgraded to v1.26

Data processing/commands changed: No

File processing changed: No

Compute resources changed: No
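The two image tags listed above appear to encode both the pangolin version and the pangolin-data version. As a minimal sketch, assuming the tag convention `<pangolin version>-pdata-<pangolin-data version>` (inferred only from these two tags) and using a hypothetical helper name `parse_pangolin_image`, the change can be checked programmatically:

```python
import re

# Assumed tag convention, inferred from the two tags in this PR:
#   <registry path>/pangolin:<pangolin version>-pdata-<pangolin-data version>
TAG_PATTERN = re.compile(r":(?P<pangolin>[^:/-]+)-pdata-(?P<pdata>[^:/-]+)$")


def parse_pangolin_image(image: str) -> dict:
    """Hypothetical helper: split an image reference into its two versions."""
    match = TAG_PATTERN.search(image)
    if match is None:
        raise ValueError(f"unrecognized pangolin image tag: {image}")
    return {
        "pangolin": match.group("pangolin"),
        "pangolin_data": match.group("pdata"),
    }


old = parse_pangolin_image(
    "us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.25.1"
)
new = parse_pangolin_image(
    "us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.26"
)

# Only the pangolin-data version differs between the two images.
print(old)
print(new)
```

Comparing `old` and `new` shows the pangolin version unchanged at 4.3.1 while pangolin-data moves from 1.25.1 to 1.26, which matches the "Nothing has changed other than the docker image" statement.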

➡️ Inputs

N/A

⬅️ Outputs

N/A

:test_tube: Testing

Test Dataset

Tested on a random SARS-CoV-2 genome pulled from GISAID

Commandline Testing with MiniWDL or Cromwell (optional)

theiacov_fasta local test:

$ miniwdl run ~/github/public_health_bioinformatics/workflows/theiacov/wf_theiacov_fasta.wdl samplename=EPI_ISL_18606234 assembly_fasta=EPI_ISL_18606234.fasta seq_method="unknown" input_assembly_method="unknown"

[most output redacted for brevity]

2024-03-26 15:45:12.570 wdl.w:theiacov_fasta done
{
  "dir": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta",
  "outputs": {
    "theiacov_fasta.abricate_flu_database": null,
    "theiacov_fasta.abricate_flu_results": null,
    "theiacov_fasta.abricate_flu_subtype": null,
    "theiacov_fasta.abricate_flu_type": null,
    "theiacov_fasta.abricate_flu_version": null,
    "theiacov_fasta.assembly_length_unambiguous": 29737,
    "theiacov_fasta.assembly_method": "unknown",
    "theiacov_fasta.auspice_json": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/auspice_json/EPI_ISL_18606234.nextclade.auspice.json",
    "theiacov_fasta.nextclade_aa_dels": "S:N211-",
    "theiacov_fasta.nextclade_aa_subs": "E:T9I,M:D3H,M:Q19E,M:A63T,M:A104V,N:P13L,N:R203K,N:G204R,N:Q229K,N:S413R,ORF1a:S135R,ORF1a:A211D,ORF1a:T842I,ORF1a:V1056L,ORF1a:G1307S,ORF1a:T1542I,ORF1a:N2526S,ORF1a:A2710T,ORF1a:L3027F,ORF1a:T3090I,ORF1a:T3255I,ORF1a:P3395H,ORF1a:V3593F,ORF1a:T4175I,ORF1b:P314L,ORF1b:R1315C,ORF1b:I1566V,ORF1b:T2163I,ORF3a:T223I,ORF6:D61L,ORF9b:P10S,S:T19I,S:R21T,S:S50L,S:V127F,S:G142D,S:F157S,S:R158G,S:I197V,S:L212I,S:V213G,S:L216F,S:H245N,S:A264D,S:I332V,S:G339H,S:K356T,S:S371F,S:S373P,S:S375F,S:T376A,S:D405N,S:R408S,S:K417N,S:N440K,S:V445H,S:G446S,S:N450D,S:L452W,S:N460K,S:S477N,S:T478K,S:N481K,S:E484K,S:F486P,S:Q498R,S:N501Y,S:Y505H,S:E554K,S:A570V,S:D614G,S:P621S,S:H655Y,S:N679K,S:P681R,S:N764K,S:D796Y,S:S939F,S:Q954H,S:N969K,S:P1143L",
    "theiacov_fasta.nextclade_clade": "23I (Omicron)",
    "theiacov_fasta.nextclade_docker": "us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:2.14.0",
    "theiacov_fasta.nextclade_ds_tag": "2023-12-03T12:00:00Z",
    "theiacov_fasta.nextclade_json": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/nextclade_json/EPI_ISL_18606234.nextclade.json",
    "theiacov_fasta.nextclade_lineage": "BA.2.86.5",
    "theiacov_fasta.nextclade_qc": "good",
    "theiacov_fasta.nextclade_tsv": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/nextclade_tsv/EPI_ISL_18606234.nextclade.tsv",
    "theiacov_fasta.nextclade_version": "nextclade 2.14.0",
    "theiacov_fasta.number_Degenerate": 3,
    "theiacov_fasta.number_N": 66,
    "theiacov_fasta.number_Total": 29806,
    "theiacov_fasta.pango_lineage": "BA.2.86.5",
    "theiacov_fasta.pango_lineage_expanded": "B.1.1.529.2.86.5",
    "theiacov_fasta.pango_lineage_report": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/pango_lineage_report/EPI_ISL_18606234.pangolin_report.csv",
    "theiacov_fasta.pangolin_assignment_version": "pangolin 4.3.1; PUSHER-v1.26",
    "theiacov_fasta.pangolin_conflicts": "0.0",
    "theiacov_fasta.pangolin_docker": "us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.26",
    "theiacov_fasta.pangolin_notes": "Usher placements: BA.2.86.5(1/1)",
    "theiacov_fasta.pangolin_versions": "pangolin: 4.3.1;pangolin-data: 1.26;constellations: v0.1.12;scorpio: 0.3.19;pangolin-assignment: 1.26;usher 0.6.3",
    "theiacov_fasta.percent_reference_coverage": 99.44,
    "theiacov_fasta.qc_check": null,
    "theiacov_fasta.qc_standard": null,
    "theiacov_fasta.seq_platform": "unknown",
    "theiacov_fasta.theiacov_fasta_analysis_date": "2024-02-23",
    "theiacov_fasta.theiacov_fasta_version": "PHB v1.3.0-main",
    "theiacov_fasta.vadr_alerts_list": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/vadr_alerts_list/EPI_ISL_18606234.vadr.alt.list",
    "theiacov_fasta.vadr_docker": "us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1",
    "theiacov_fasta.vadr_fastas_zip_archive": "/home/curtis_kapsak/tests-galore/testdata/sarscov2/20240326_154444_theiacov_fasta/out/vadr_fastas_zip_archive/EPI_ISL_18606234_vadr-fasta-files.zip",
    "theiacov_fasta.vadr_num_alerts": "0"
  }
}
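A reviewer who wants to confirm the pangolin-data version without reading the JSON by eye could script the check. The sketch below is illustrative only: the function names `parse_versions` and `check_pangolin_outputs` are hypothetical, and the `example` dictionary copies just two of the output keys shown above rather than reading a real miniwdl run directory.

```python
import json


def parse_versions(versions: str) -> dict:
    """Split a pangolin_versions string into {component: version}."""
    parsed = {}
    for field in versions.split(";"):
        field = field.strip()
        if not field:
            continue
        # Most fields look like "name: version"; "usher 0.6.3" uses a space.
        if ":" in field:
            name, _, version = field.partition(":")
        else:
            name, _, version = field.rpartition(" ")
        parsed[name.strip()] = version.strip()
    return parsed


def check_pangolin_outputs(result: dict, expected_pdata: str) -> list:
    """Return a list of mismatches; an empty list means the check passed."""
    outputs = result["outputs"]
    problems = []

    docker = outputs["theiacov_fasta.pangolin_docker"]
    if not docker.endswith(f"-pdata-{expected_pdata}"):
        problems.append(f"unexpected docker image: {docker}")

    versions = parse_versions(outputs["theiacov_fasta.pangolin_versions"])
    if versions.get("pangolin-data") != expected_pdata:
        problems.append(f"unexpected pangolin-data: {versions.get('pangolin-data')}")

    return problems


# Two keys copied from the test output above; all other keys are omitted.
example = json.loads("""
{
  "outputs": {
    "theiacov_fasta.pangolin_docker": "us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.26",
    "theiacov_fasta.pangolin_versions": "pangolin: 4.3.1;pangolin-data: 1.26;constellations: v0.1.12;scorpio: 0.3.19;pangolin-assignment: 1.26;usher 0.6.3"
  }
}
""")

print(check_pangolin_outputs(example, "1.26"))
```

For the output shown above, the check returns an empty list when the expected version is `"1.26"`. The same function could be pointed at the JSON that miniwdl prints at the end of a run, provided it has the same `outputs` structure.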

Terra Testing

Tested with 33 SARS-CoV-2 genomes from GISAID: ✅ https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/027b31a6-2fb6-4e13-8b40-3dcfc53a0a87

Suggested Scenarios for Reviewer to Test

Test with any SARS-CoV-2 sample (an assembly, reads, or anything else will be fine)

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

kapsakcj commented 6 months ago

FYI: this PR has a lot of the same CI-related changes as PR #375.

I tried my best to make them identical, but there may be some merge conflicts on either this PR or PR #375, depending on which one gets merged first. I'm happy to help resolve them if they occur.

kevinlibuit commented 6 months ago

Nice! Straightforward and clean. Will merge pending a successful functional check.