Closed ManavalanG closed 1 year ago
In GitLab by @tkmamidi on Feb 1, 2021, 11:06
Commented on annotation_parsing/README.md line 26
Looks like we need to load the anaconda module before running the script on cheaha or it doesn't work (at least for me :p).
module load Anaconda3/2020.02
In GitLab by @tkmamidi on Feb 1, 2021, 11:08
Commented on annotation_parsing/README.md line 28
Can we please have all the files in the same place instead of different directories? Sorry, I didn't think about this in the previous MR.
In GitLab by @tkmamidi on Feb 1, 2021, 15:19
Commented on annotation_parsing/parse_annotated_vars.py line 113
Looks like there is an error when running clinvar variants.
In GitLab by @wilkb777 on Feb 1, 2021, 15:54
Commented on annotation_parsing/README.md line 28
We already have issue #3 open for this exact thing :grin: For now it's best to get things in and consolidate later, once we have a clear picture of everything that will be in the project.
In GitLab by @tkmamidi on Feb 1, 2021, 15:55
Commented on annotation_parsing/README.md line 28
Gotcha!
In GitLab by @wilkb777 on Feb 1, 2021, 16:09
Commented on annotation_parsing/README.md line 26
This is a Cheaha-specific issue. By default, the version of Python that gets loaded when you start an interactive shell on Cheaha is Python 2.7.5. When you load and init Anaconda3, its base is a Python 3.7.5 version, which is why you had this issue.
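One way to make this failure mode obvious is a version guard at the top of the script. The following is only a minimal sketch: the helper name `is_supported` and the minimum version `(3, 0)` are assumptions for illustration, not something taken from `parse_annotated_vars.py`.

```python
import sys


def is_supported(version_info=sys.version_info, minimum=(3, 0)):
    """Return True when the interpreter version is at least `minimum`.

    `minimum` is an assumed threshold; the thread only establishes that
    Python 2.7.5 fails and Python 3.7.5 works.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if not is_supported():
    # Exit with a hint instead of failing later with a confusing error.
    sys.exit("Python 3 is required; on Cheaha, run "
             "'module load Anaconda3/2020.02' first.")
```

With a guard like this, running the script under the default Cheaha interpreter would stop immediately with a pointer to the module that needs loading.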
In GitLab by @wilkb777 on Feb 1, 2021, 16:24
Commented on annotation_parsing/parse_annotated_vars.py line 113
changed this line in version 2 of the diff
In GitLab by @wilkb777 on Feb 1, 2021, 16:24
added 1 commit
In GitLab by @wilkb777 on Feb 1, 2021, 16:24
Commented on annotation_parsing/parse_annotated_vars.py line 113
ok I've pushed up the fix, give it a go and let me know if it works out now.
In GitLab by @tkmamidi on Feb 1, 2021, 17:03
Commented on annotation_parsing/parse_annotated_vars.py line 113
It's working now!
In GitLab by @tkmamidi on Feb 1, 2021, 17:03
resolved all threads
In GitLab by @tkmamidi on Feb 1, 2021, 17:05
marked the checklist item README provided with the parser as completed
In GitLab by @tkmamidi on Feb 1, 2021, 17:05
marked the checklist item Review of the parser code as completed
In GitLab by @tkmamidi on Feb 1, 2021, 17:05
marked the checklist item Review of the test VEP annotated VCF and the corresponding output format as completed
In GitLab by @wilkb777 on Feb 1, 2021, 20:42
As a note @tkmamidi asked for some clarification on one of the output columns from the parsing:
Question: Alternate Allele & VEP_Allele_Identifier; how are these different?
Brandon Wilk 10:47 AM :smile:
VEP's output format for multi-allelic lines in the case of insertions, deletions, and indels is quite dumb IMO.
For each set of annotations VEP lists the allele the annotations are associated with but it does not always have the same format as the Alt allele listed in the VCF. So to be transparent (and also help check my work lol) I have the alt allele listed by VEP as a column to allow for back-tracking from the parsed TSV to the crap in the VEP annotated VCF
for example consider this variant annotated by VEP:
1 19631483 . CTT C 18.74 PASS FS=0;MQ=238.5;QD=9.37;SOR=1.609;FractionInformativeReads=0.5;DP=2;AF=1;AN=2;AC=2;CSQ=-|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_001320979.1|protein_coding||5/5|NM_001320979.1:c.814-605_814-604del|||||||||-1||EntrezGene||rseq_mrna_match||TT|TT||||1.643|-0.076211||||||||||||||,-|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_003689.3|protein_coding||6/6|NM_003689.3:c.919-605_919-604del|||||||||-1||EntrezGene||rseq_mrna_match||TT|TT||||1.643|-0.076211||||||||||||||
the CSQ key in the INFO column holds the VEP annotation info, with fields separated by pipes
each transcript's worth of information is separated from the next by a comma
in this example there are two transcripts
-|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_001320979.1|protein_coding||5/5|NM_001320979.1:c.814-605_814-604del|||||||||-1||EntrezGene||rseq_mrna_match||TT|TT||||1.643|-0.076211||||||||||||||
and
-|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_003689.3|protein_coding||6/6|NM_003689.3:c.919-605_919-604del|||||||||-1||EntrezGene||rseq_mrna_match||TT|TT||||1.643|-0.076211||||||||||||||
the first column of each of those specifies the alt allele that the annotation info belongs to
as you can see it's a - here because this variant is a deletion
no big deal since there's only one variant listed here, but still annoying
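The splitting described above (commas between transcripts, pipes between fields) can be sketched as follows. This is an illustration under assumptions, not the code in `parse_annotated_vars.py`: the function name `split_csq` is hypothetical, the INFO string is a shortened version of the example line, and it assumes no literal commas or pipes appear inside a field value.

```python
def split_csq(info):
    """Split the CSQ entry of a VCF INFO string into per-transcript field lists.

    Returns an empty list when the INFO string has no CSQ key.
    """
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            value = entry[len("CSQ="):]
            # Commas separate transcripts; pipes separate fields within one.
            return [record.split("|") for record in value.split(",")]
    return []


# Shortened, illustrative INFO string (most CSQ fields omitted for brevity).
info = ("DP=2;AF=1;"
        "CSQ=-|intron_variant|MODIFIER|AKR7A2,-|intron_variant|MODIFIER|AKR7A2")

for fields in split_csq(info):
    # The first field is the allele identifier VEP attaches to the annotation.
    print(fields[0], fields[3])
```

A real parser would also read the field names from the CSQ description in the VCF header rather than relying on positions.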
Tarun Mamidi 10:58 AM: Gotcha! Thanks for the explanation :slightly_smiling_face:
Brandon Wilk 10:59 AM: well, it gets worse :joy: when you get to lines like this:
1 19633106 rs72255348 AT ATT,ATTT,A,ATTTT 83.31 DRAGENHardQUAL FS=0;MQ=240.9;QD=3.23;SOR=2.303;FractionInformativeReads=0.667;DB;MQRankSum=-0.691;ReadPosRankSum=1.678;R2_5P_bias=0;DP=1317;AF=1,0.5,0.5,0.5;AN=384;AC=338,3,1,1;CSQ=TT|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_001320979.1|protein_coding||4/5|NM_001320979.1:c.683+389dup|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||1.026|-0.146843|-0.95|rs3835240|25636|27700|9.25487e-01|||||||||,TTT|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_001320979.1|protein_coding||4/5|NM_001320979.1:c.683+389_683+390insAA|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||0.999|-0.150858|-0.95|rs3835240|31|27700|1.11913e-03|||||||||,-|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_001320979.1|protein_coding||4/5|NM_001320979.1:c.683+389del|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||0.967|-0.155699|-0.95|||||||||||||,TTTT|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_001320979.1|protein_coding||4/5|NM_001320979.1:c.683+389_683+390insAAA|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||||-0.95|||||||||||||,TT|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_003689.3|protein_coding||5/6|NM_003689.3:c.788+389dup|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||1.026|-0.146843|-0.95|rs3835240|25636|27700|9.25487e-01|||||||||,TTT|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_003689.3|protein_coding||5/6|NM_003689.3:c.788+389_788+390insAA|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||0.999|-0.150858|-0.95|rs3835240|31|27700|1.11913e-03|||||||||,-|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_003689.3|protein_coding||5/6|NM_003689.3:c.788+389del|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||0.967|-0.155699|-0.95|||||||||||||,TTTT|intron_variant|MODIFIER|AKR7A2|8574|Transcript|NM_003689.3|protein_coding||5/6|NM_003689.3:c.788+389_788+390insAAA|||||||||-1||EntrezGene||rseq_mrna_match||T|T||||||-0.95|||||||||||||
which is just no fun :sob:
so I left the VEP_Allele_Identifier column in to reduce ambiguity
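To make the back-tracking concrete, here is a rough sketch of how the ALT alleles on the two example lines line up with the allele identifiers in their CSQ entries. It assumes the rule suggested by those examples: for indels where REF and every ALT share the same first base, that base is dropped and an empty remainder is written as `-`. The function name `vep_allele_ids` is hypothetical, and symbolic alleles such as `<DEL>` or `*` are not handled.

```python
def vep_allele_ids(ref, alts):
    """Map each VCF ALT allele to the identifier VEP is assumed to report.

    Assumption: the shared leading base is trimmed only when at least one
    ALT differs in length from REF and all alleles start with the same base.
    """
    alleles = [ref] + list(alts)
    is_indel = any(len(alt) != len(ref) for alt in alts)
    shares_first_base = len({allele[0] for allele in alleles}) == 1

    mapping = {}
    for alt in alts:
        if is_indel and shares_first_base:
            trimmed = alt[1:]
            # A pure deletion leaves nothing after trimming, shown as "-".
            mapping[alt] = trimmed if trimmed else "-"
        else:
            mapping[alt] = alt
    return mapping


# REF and ALT values taken from the multi-allelic example line above.
print(vep_allele_ids("AT", ["ATT", "ATTT", "A", "ATTTT"]))
```

Keeping both the VCF ALT and the VEP identifier in the TSV means a mismatch between this kind of mapping and the actual CSQ entries stays visible instead of being silently resolved.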
In GitLab by @tkmamidi on Feb 2, 2021, 13:06
approved this merge request
In GitLab by @tkmamidi on Feb 2, 2021, 13:06
marked this merge request as ready
In GitLab by @wilkb777 on Feb 2, 2021, 13:22
mentioned in commit dd06512484968654d59c31ec3f83a44f29b9c43f
In GitLab by @wilkb777 on Jan 31, 2021, 15:38
_Merges vep_outputparsing -> master_
A simple, no frills, parser for taking VEP annotated VCFs and parsing them into a TSV format for easier downstream use. This includes review of the following: