susannasiebert closed 11 months ago
I performed a full test of immuno.wdl with this PR applied. The run produced the following example of the fix's impact.
A variant was called in the gene LILRB2, and the transcript selected in pVACview (aggregate report) was ENST00000391749.4 (ENSG00000131042).
The kallisto gene_abundance.tsv shows:
gene_name  gene             abundance          counts            length
LILRB2     ENSG00000131042  1.22981410950827   37.4431712278154  1519.92061351928
LILRB2     ENSG00000274513  0.523648071417863  3.67733451548483  350.57535471207
LILRB2     ENSG00000275463  0.229592362639494  4.14209850196773  900.638976403266
LILRB2     ENSG00000276146  2.85862722432629   94.9987124328477  1659.00595957922
LILRB2     ENSG00000277751  0.170873793229455  1.42026247930074  414.935799292377
Before the VAtools fix, the gene expression value annotated into the VCF was 5.013: the summed abundance of all five genes sharing the ambiguous name "LILRB2", rather than the abundance of the one correct gene. After the fix, the gene expression value was 1.230, which matches the correct gene (ENSG00000131042) for the selected transcript.
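To make the arithmetic concrete, here is a minimal sketch that reproduces both numbers from the gene_abundance.tsv excerpt. It is not the VAtools implementation; the function names `expression_by_name` and `expression_by_id` are hypothetical and only illustrate the difference between matching on gene name and matching on gene ID.

```python
# Rows copied from the gene_abundance.tsv excerpt: (gene_name, gene ID, abundance).
ROWS = [
    ("LILRB2", "ENSG00000131042", 1.22981410950827),
    ("LILRB2", "ENSG00000274513", 0.523648071417863),
    ("LILRB2", "ENSG00000275463", 0.229592362639494),
    ("LILRB2", "ENSG00000276146", 2.85862722432629),
    ("LILRB2", "ENSG00000277751", 0.170873793229455),
]


def expression_by_name(rows, gene_name):
    """Hypothetical helper: sum abundance over every row sharing the gene name."""
    return sum(abundance for name, _, abundance in rows if name == gene_name)


def expression_by_id(rows, gene_id):
    """Hypothetical helper: return the abundance of the single row with this gene ID."""
    matches = [abundance for _, gid, abundance in rows if gid == gene_id]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one row for {gene_id}, found {len(matches)}")
    return matches[0]


by_name = expression_by_name(ROWS, "LILRB2")
by_id = expression_by_id(ROWS, "ENSG00000131042")

print(f"matched by name: {by_name:.3f}")  # 5.013, the value seen before the fix
print(f"matched by ID:   {by_id:.3f}")    # 1.230, the value seen after the fix
```

Keying on the Ensembl gene ID avoids the ambiguity because the ID is unique per row, while the gene name is shared by all five rows.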
Additional examples followed the same pattern, and we should now also see gene expression values for genes that did not have an ID at all.
I also reviewed the parsing of the ADF and ADR (for DNA) and RADF and RADR (for RNA) annotations for forward- and reverse-strand depth.
VCF record before and after applying this PR:
chr22 49885855 . G A GT:AD:AF:DP:F1R2:F2R1:FAD:SB:MQ0:MQ0FRAC:RDP:RAF:RAD:GX:TX 0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:.:.:.:.:.:.:.
chr22 49885855 . G A GT:AD:AF:DP:F1R2:F2R1:FAD:SB:ADF:ADR:MQ0:MQ0FRAC:RDP:RAF:RAD:RADF:RADR:GX:TX 0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:57,0:46,0:.:.:.:.:.:.:.:.:.
The new annotations do indeed appear to be present.
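For anyone who wants to repeat this check, the sketch below pairs the FORMAT keys with the sample values from the two records and lists the keys that only appear after the PR. The helper `parse_sample` is hypothetical and assumes the FORMAT and sample columns have the same number of colon-separated fields, which holds for these two records but is stricter than the VCF specification requires.

```python
def parse_sample(format_field, sample_field):
    """Hypothetical helper: map FORMAT keys to the sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Simplifying assumption: no trailing fields were dropped from the sample column.
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} FORMAT keys but {len(values)} sample values")
    return dict(zip(keys, values))


# FORMAT and sample columns copied from the two records above.
before = parse_sample(
    "GT:AD:AF:DP:F1R2:F2R1:FAD:SB:MQ0:MQ0FRAC:RDP:RAF:RAD:GX:TX",
    "0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:.:.:.:.:.:.:.",
)
after = parse_sample(
    "GT:AD:AF:DP:F1R2:F2R1:FAD:SB:ADF:ADR:MQ0:MQ0FRAC:RDP:RAF:RAD:RADF:RADR:GX:TX",
    "0/0:103,0:0.0:103:41,0:40,0:83,0:58,45,0,0:57,0:46,0:.:.:.:.:.:.:.:.:.",
)

# Keys present only in the record produced with this PR applied.
new_keys = [key for key in after if key not in before]
print(new_keys)  # ['ADF', 'ADR', 'RADF', 'RADR']

# Sanity check: forward plus reverse reference depth should equal the reference AD.
ref_forward = int(after["ADF"].split(",")[0])
ref_reverse = int(after["ADR"].split(",")[0])
ref_total = int(after["AD"].split(",")[0])
print(ref_forward + ref_reverse == ref_total)  # True (57 + 46 == 103)
```

In this record the RNA fields (RADF, RADR) are present as keys but hold missing values ("."), so only the DNA strand depths can be cross-checked against AD here.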
This PR appears to be behaving as expected.
This version introduces a few bugfixes and changes:
The Docker container is now based on the Python 3.11 base image (instead of Ubuntu), which might introduce some subtle differences, although I don't believe any changes are necessary in how we call the tools.
Because these changes are somewhat extensive, I suggest doing a test immuno.wdl run before merging.