opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

eQTL database data comes with the wrong feature #2002

Closed mkarmona closed 2 years ago

mkarmona commented 2 years ago

In the previous release, the GTEX data ingested from the eQTL database came through like this:

│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0000992                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0000995                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0000996                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001134                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001157                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001159                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001264                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001323                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001621                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001830                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001870                 │
│ eqtl     │ eqtl          │ GTEX_v7-UBERON_0001873                 │
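
For reference, each value in the third column joins a study identifier and a feature code with a hyphen. A minimal sketch of pulling the two apart, assuming the `<study>-<feature>` pattern seen in the rows above holds for every row; `split_study_feature` is a hypothetical helper, not part of the pipeline:

```python
from __future__ import annotations


def split_study_feature(value: str) -> tuple[str, str]:
    """Split a combined value such as 'GTEX_v7-UBERON_0000992'.

    Assumes the first hyphen separates the study from the feature code.
    """
    study, sep, feature = value.partition("-")
    if not sep or not study or not feature:
        raise ValueError(f"expected '<study>-<feature>', got {value!r}")
    return study, feature


study, feature = split_study_feature("GTEX_v7-UBERON_0000992")
```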

In fixing a minor hack introduced in previous releases (keeping the study and the feature code together), we removed the study and left just the feature. However, the feature no longer comes with the proper code; it comes with an encoded label instead:

│ eqtl     │ eqtl          │ OVARY                           │
│ eqtl     │ eqtl          │ PANCREAS                        │
│ eqtl     │ eqtl          │ PANCREATIC_ISLET                │
│ eqtl     │ eqtl          │ PITUITARY                       │
│ eqtl     │ eqtl          │ PLACENTA_NAIVE                  │
│ eqtl     │ eqtl          │ PLATELET                        │
│ eqtl     │ eqtl          │ PROSTATE                        │
│ eqtl     │ eqtl          │ PUTAMEN                         │
│ eqtl     │ eqtl          │ RECTUM                          │
│ eqtl     │ eqtl          │ SENSORY_NEURON                  │
│ eqtl     │ eqtl          │ SKIN                            │
│ eqtl     │ eqtl          │ SKIN_NOT_SUN_EXPOSED            │
│ eqtl     │ eqtl          │ SKIN_SUN_EXPOSED                │
│ eqtl     │ eqtl          │ SMALL_INTESTINE                 │
│ eqtl     │ eqtl          │ SPLEEN                          │
│ eqtl     │ eqtl          │ STOMACH                         │

This is not what we need. We need the corresponding curated code ID instead (UBERON, or whatever best describes the label).
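
To illustrate the kind of check that would catch this, here is a rough sketch that maps encoded labels to curated codes and reports any label without a mapping. The function name `map_labels_to_codes` and the shape of the lookup table are assumptions for illustration, and the codes in the example are placeholders, not real UBERON IDs:

```python
from __future__ import annotations


def map_labels_to_codes(
    labels: list[str], label_to_code: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Return (label -> code for mapped labels, list of unmapped labels)."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for label in labels:
        # Normalise case and whitespace before the lookup.
        code = label_to_code.get(label.strip().upper())
        if code is None:
            unmapped.append(label)
        else:
            mapped[label] = code
    return mapped, unmapped


# Placeholder codes only; a real table would hold curated ontology IDs.
example_mapping = {
    "OVARY": "CODE_PLACEHOLDER_A",
    "SPLEEN": "CODE_PLACEHOLDER_B",
}

mapped, unmapped = map_labels_to_codes(
    ["OVARY", "SPLEEN", "PLATELET"], example_mapping
)
```

Any label that ends up in `unmapped` would need a curated entry before the feature column could be considered clean.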

Jeremy37 commented 2 years ago

I think we've fixed this? I added code to create the biofeature mappings file here: https://github.com/opentargets/genetics-v2g-data/tree/master/mapping