xiamaz / PEDIA-workflow

This is the global workflow for data analysis: 1. Quality check; 2. Phenomization; 3. Simulation; 4. Classification

HGVS Transcript validation #5

Open xiamaz opened 6 years ago

xiamaz commented 6 years ago

Currently, hgvs objects are generated whenever the hgvs string can be parsed. The variants themselves are not validated against a reference.

Expected result: All hgvs variants should be consistent with a reference. This needs to be ensured for all variants present in the case object.

Current result: Variants in the case object are not validated.

Proposed fix: Transcript validation can be done via Mutalyzer or UTA (present in the biocommons/hgvs library). We will need to implement the necessary API bindings.

HGVS strings failing validation should be added to the errorfixer for manual resolution.
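
A rough sketch of how the UTA route could look, assuming the biocommons/hgvs package is installed and the public UTA database is reachable. The helper names `make_uta_validator` and `split_by_validity` are made up for illustration and are not part of this repository. The hgvs import is kept inside the factory so the splitting logic can be used with any other backend (for example a Mutalyzer binding) that raises `ValueError` on invalid input.

```python
from typing import Callable, Iterable, List, Tuple


def make_uta_validator() -> Callable[[str], None]:
    """Build a validator backed by biocommons/hgvs and UTA.

    Assumption: the hgvs package is installed and UTA is reachable.
    """
    import hgvs.dataproviders.uta
    import hgvs.exceptions
    import hgvs.parser
    import hgvs.validator

    parser = hgvs.parser.Parser()
    hdp = hgvs.dataproviders.uta.connect()
    validator = hgvs.validator.Validator(hdp=hdp)

    def validate(hgvs_string: str) -> None:
        try:
            # Step 1: syntax only (this is all the pipeline checks today).
            variant = parser.parse_hgvs_variant(hgvs_string)
            # Step 2: compare the variant against the reference sequence.
            validator.validate(variant)
        except hgvs.exceptions.HGVSError as exc:
            # Normalise library errors so callers need no hgvs import.
            raise ValueError(str(exc)) from exc

    return validate


def split_by_validity(
    hgvs_strings: Iterable[str],
    validate: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (valid strings, [(failed string, reason), ...])."""
    valid: List[str] = []
    failed: List[Tuple[str, str]] = []
    for hgvs_string in hgvs_strings:
        try:
            validate(hgvs_string)
        except ValueError as exc:
            # Candidates for manual resolution in the errorfixer.
            failed.append((hgvs_string, str(exc)))
        else:
            valid.append(hgvs_string)
    return valid, failed
```

The `failed` list carries the reason alongside each string, so whatever is handed to the errorfixer already tells the curator why the variant was rejected.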

Since additional external API calls can reduce the reliability of the entire pipeline, some form of storage of externally validated hgvs strings should be implemented. This could also be done via the errorfixer. The generated dictionary of genomic_entry_id to hgvs strings can thereafter be used to quickly translate the raw data into correct hgvs variants.
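
The storage idea could be sketched as a small JSON-backed dictionary keyed by genomic_entry_id. This is only an illustration: the class name `ValidatedHgvsCache`, the file layout, and the `validate_all` callback are assumptions and do not reflect how the errorfixer actually stores data. Note that this sketch does not notice when the raw strings for an entry change, so stale entries would need to be removed by hand.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Sequence


class ValidatedHgvsCache:
    """Persist externally validated hgvs strings per genomic_entry_id."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.entries: Dict[str, List[str]] = {}
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))

    def get_or_validate(
        self,
        genomic_entry_id,
        raw_strings: Sequence[str],
        validate_all: Callable[[Sequence[str]], List[str]],
    ) -> List[str]:
        # JSON object keys are strings, so normalise the id up front.
        key = str(genomic_entry_id)
        if key not in self.entries:
            # Only reach out to the external service on a cache miss.
            self.entries[key] = validate_all(raw_strings)
            self.path.write_text(
                json.dumps(self.entries, indent=2, sort_keys=True),
                encoding="utf-8",
            )
        return self.entries[key]
```

With something like this in place, a rerun of the pipeline only calls the external API for entries it has not seen before, which limits the impact of service outages.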

xiamaz commented 6 years ago

b6dbaa446449b11db756fbf0e487b66bea39e582 and 76999e2d85681b16667a8095cb2818ec94545f43 have established Mutalyzer-based reference transcript validation. Incomplete or wrong reference transcripts are automatically corrected against GRCh37 using information provided by Mutalyzer.