wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

Upgrade to version 5.1.0 #135

Closed. susannasiebert closed this 11 months ago.

susannasiebert commented 1 year ago

This version introduces a few bugfixes and changes:

The Docker container is now based on the Python 3.11 base image (instead of Ubuntu). This may introduce some subtle differences, although I don't believe any changes are needed in how we call the tools.

Because these changes are fairly extensive, I suggest doing a test run of immuno.wdl before merging.

malachig commented 11 months ago

I performed a full test of immuno.wdl with this PR applied. It turned up the following example of the impact of this fix.

A variant was called in the gene LILRB2, and the transcript selected in pVACview (aggregate report) was ENST00000391749.4 (ENSG00000131042).

The kallisto gene_abundance.tsv shows:

gene_name   gene    abundance   counts  length
LILRB2  ENSG00000131042 1.22981410950827    37.4431712278154    1519.92061351928
LILRB2  ENSG00000274513 0.523648071417863   3.67733451548483    350.57535471207
LILRB2  ENSG00000275463 0.229592362639494   4.14209850196773    900.638976403266
LILRB2  ENSG00000276146 2.85862722432629    94.9987124328477    1659.00595957922
LILRB2  ENSG00000277751 0.170873793229455   1.42026247930074    414.935799292377

Before the VAtools fix, the gene expression value annotated into the VCF was 5.013: the sum of the abundances of all five genes sharing the ambiguous name "LILRB2", rather than the value for the one correct gene. After the fix, the annotated value was 1.230, the abundance of the correct gene (ENSG00000131042) for the selected transcript.
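The difference between the two behaviours can be sketched in Python. This is not VAtools's actual implementation: the functions `expression_by_name` and `expression_by_id` are hypothetical, and the sketch only reproduces the arithmetic from the kallisto table above, summing every row named LILRB2 versus looking up the single row for ENSG00000131042.

```python
import csv
import io

# Rows copied from the kallisto gene_abundance.tsv excerpt above (tab-separated).
GENE_ABUNDANCE_TSV = """gene_name\tgene\tabundance\tcounts\tlength
LILRB2\tENSG00000131042\t1.22981410950827\t37.4431712278154\t1519.92061351928
LILRB2\tENSG00000274513\t0.523648071417863\t3.67733451548483\t350.57535471207
LILRB2\tENSG00000275463\t0.229592362639494\t4.14209850196773\t900.638976403266
LILRB2\tENSG00000276146\t2.85862722432629\t94.9987124328477\t1659.00595957922
LILRB2\tENSG00000277751\t0.170873793229455\t1.42026247930074\t414.935799292377
"""


def expression_by_name(tsv_text, gene_name):
    """Hypothetical 'before' behaviour: sum abundance over every row sharing a gene name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(float(row["abundance"]) for row in reader if row["gene_name"] == gene_name)


def expression_by_id(tsv_text, gene_id):
    """Hypothetical 'after' behaviour: return the abundance of the one row with this gene ID."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["gene"] == gene_id:
            return float(row["abundance"])
    return None  # gene ID not present in the abundance file


print(f"{expression_by_name(GENE_ABUNDANCE_TSV, 'LILRB2'):.3f}")         # 5.013
print(f"{expression_by_id(GENE_ABUNDANCE_TSV, 'ENSG00000131042'):.3f}")  # 1.230
```

Keying the lookup on the Ensembl gene ID rather than the gene symbol is what makes the result unambiguous here, since all five rows share the symbol LILRB2.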

malachig commented 11 months ago

Additional examples followed the same pattern, and we should now also see gene expression values for genes that did not have an ID at all.

malachig commented 11 months ago

I also reviewed the parsing of the ADF and ADR (DNA) and RADF and RADR (RNA) annotations for forward- and reverse-strand depth.

VCF record before and after applying this PR:

chr22   49885855    .   G   A   GT:AD:AF:DP:F1R2:F2R1:FAD:SB:MQ0:MQ0FRAC:RDP:RAF:RAD:GX:TX  0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:.:.:.:.:.:.:.
chr22   49885855    .   G   A   GT:AD:AF:DP:F1R2:F2R1:FAD:SB:ADF:ADR:MQ0:MQ0FRAC:RDP:RAF:RAD:RADF:RADR:GX:TX    0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:57,0:46,0:.:.:.:.:.:.:.:.:.

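The new annotations are present: ADF and ADR are populated (57,0 and 46,0), while RADF and RADR are empty (.) for this record, as are the other RNA fields.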
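One way to repeat this check is to pair the FORMAT keys with the sample's values. The Python sketch below uses hypothetical helpers (`parse_sample`, `strand_depth_total`) that are not part of VAtools or this pipeline; it assumes a single-sample record whose FORMAT and sample columns are the last two whitespace-separated fields, as in the trimmed records above.

```python
BEFORE = ("chr22 49885855 . G A "
          "GT:AD:AF:DP:F1R2:F2R1:FAD:SB:MQ0:MQ0FRAC:RDP:RAF:RAD:GX:TX "
          "0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:.:.:.:.:.:.:.")
AFTER = ("chr22 49885855 . G A "
         "GT:AD:AF:DP:F1R2:F2R1:FAD:SB:ADF:ADR:MQ0:MQ0FRAC:RDP:RAF:RAD:RADF:RADR:GX:TX "
         "0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:57,0:46,0:.:.:.:.:.:.:.:.:.")


def parse_sample(record_line):
    """Map FORMAT keys to sample values; assumes FORMAT and one sample are the last two columns."""
    cols = record_line.split()
    keys, values = cols[-2].split(":"), cols[-1].split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample columns have different field counts")
    return dict(zip(keys, values))


def strand_depth_total(sample, fwd_key, rev_key):
    """Sum per-allele forward and reverse depths; None if either field is absent or '.'."""
    fwd, rev = sample.get(fwd_key, "."), sample.get(rev_key, ".")
    if fwd == "." or rev == ".":
        return None
    return sum(int(x) for x in fwd.split(",")) + sum(int(x) for x in rev.split(","))


before = parse_sample(BEFORE)
after = parse_sample(AFTER)
new_keys = [k for k in after if k not in before]
print(new_keys)                                           # ['ADF', 'ADR', 'RADF', 'RADR']
print(strand_depth_total(after, "ADF", "ADR"), after["DP"])  # 103 103
print(strand_depth_total(after, "RADF", "RADR"))          # None
```

For this record ADF + ADR happens to equal DP (57 + 46 = 103). Treat that as a convenient sanity check rather than an invariant, since callers can count depth differently across fields.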

malachig commented 11 months ago

This PR appears to be behaving as expected.