AsierGonzalez closed this issue 4 years ago
TEPs checked on May 20th and there are no updates
EuropePMC evidence file received on May 23. It was checked and it looks good:
MONDO_0100096). However, the Hoffmann et al. Cell paper is not among them; checking with Shyama why this might be.
PheWAS catalog evidence updated on 26th May:
Abnormal chest sounds - see updates
phewas_string replaced by phewas_string_and_code, e.g. "Colorectal cancer" vs "Colorectal cancer [153]"
variant_id)
ChEMBL file received on 21st May:
evidence.target2drug.provenance_type.database.version) changed from "26" to "27" as it was generated based on ChEMBL 27. New file called cttv008-27-05-2020.json.gz and uploaded to the ChEMBL Google bucket.
MONDO_0100096)
Baseline and differential expression files received on 12th May.
Chemical Probes updated on 1st June:
NVS-MALT1) in SGC, there are 77 in total.
ABBV-744, AT1, BAY-885, CM11, Eleutherobin, Glyburide, ICI-199441, NI-57, PFI-3, RO2468, T-26c, UCSF7447. There are 135 probes in total.
Eleutherobin is a particular case as it targets tubulin, which is a structural protein formed by the alpha- and beta-tubulins. Given that protein complexes are not yet considered in the platform, it has been decided to assign this chemical probe to all the alpha- and beta-tubulins in Open Targets, 17 in total. A better solution will be worked out in the future.
The first 20.06 pipeline run failed because some ChEMBL evidence strings had evidence.drug2clinic.date_asserted in the yyyy-dd-mm format instead of yyyy-mm-dd. ChEMBL have generated a new evidence file with the correct format and OT have changed the JSON schema to capture this issue (see #1090 and PR #87). The new ChEMBL evidence file looks good:
1.6.8)
2016-25-11T00:00:00.000Z is now 2016-11-25T00:00:00.000Z
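The date problem described above can be caught before a pipeline run with a small check. The following is a minimal sketch, not the JSON schema change referenced in #1090: it assumes each evidence string has been parsed into a Python dict and that the timestamp always has the shape shown above (`yyyy-mm-ddTHH:MM:SS.fffZ`). The function names are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed timestamp layout, taken from the example value in this issue.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def is_valid_date_asserted(value: str) -> bool:
    """Return True if value parses as a real calendar date in yyyy-mm-dd order."""
    try:
        datetime.strptime(value, DATE_FORMAT)
    except ValueError:
        return False
    return True


def date_asserted_of(evidence_string: dict) -> Optional[str]:
    """Read evidence.drug2clinic.date_asserted from a parsed evidence string.

    Returns None when any part of the path is missing.
    """
    return (
        evidence_string.get("evidence", {})
        .get("drug2clinic", {})
        .get("date_asserted")
    )
```

Note that this only flags dates whose swapped "month" is greater than 12. A swapped date such as the 5th of November written as `2016-05-11` still parses, so a check of this kind cannot replace confirmation from the data provider.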
EVA file received on 22nd May:
Alhzeimer disease (AD) and they are mapped to EFO_0000249. These were manually mapped to Early-onset autosomal dominant Alzheimer disease (Orphanet_1020) - see #104
Python script to generate metrics revamped and bash script included so that now all the information needed to fill the release metrics spreadsheets is created in one go and it requires very little editing. See PR #2
Observations from looking into invalid evidence strings in the first 20.06 run:
MONDO_002144 instead of MONDO_0021440 for benign neoplasm of skin. @smnorthen is helping with the assessment of the issue.
ENSG00000285395 invalid because it's on a non-standard chromosome. ENSG00000103489 could be used instead for target XYLT1.
P43627, P13762, P79483) that are not included in OT. They seem to be valid targets (KIR2DL2, HLA-DRB4, HLA-DRB3) that are only annotated in non-standard chromosomes, so there is nothing that can be done.
/MONDO_0002254 instead of http://purl.obolibrary.org/obo/MONDO_0002254 and /MONDO_0024317 instead of http://purl.obolibrary.org/obo/MONDO_0024317), 12 HP terms not in EFO ("Hypogonadism" - HP_0000135, "Growth hormone deficiency" - HP_0000824, "Hypoparathyroidism" - HP_0000829, "Adrenal insufficiency" - HP_0000846, "Hyperuricemia" - HP_0002149, "Gastric ulcer" - HP_0002592, "Renal angiomyolipoma" - HP_0006772, "Ocular hypertension" - HP_0007906, "Primary adrenal insufficiency" - HP_0008207, "Heparin-induced thrombocytopenia" - HP_0011874, "Myocarditis" - HP_0012819, "Macular edema" - HP_0040049), 1 HP term that has been replaced (HP_0001587 by "Premature ovarian insufficiency" - HP_0008209, which is in EFO), 4 EFO terms not part of the slim-EFO used by OT ("digestive system disease" - EFO_0000405, "mental health" - EFO_0003935, "chronic disease" - EFO_0009714, "inflammatory disease" - EFO_0009903) and one obsolete EFO term ("influenza infection" - EFO_0001669). Some of those HP terms have equivalent terms in EFO that could be used instead (e.g. MONDO_0002146 for "hypogonadism", EFO_0009451 for "hypoparathyroidism", EFO_0009104 for "hyperuricemia", EFO_0009454 for "gastric ulcer", EFO_1001069 for "ocular hypertension", MONDO_0015128 for "primary adrenal insufficiency", EFO_0009609 for "myocarditis"). Some others have matching synonyms: "adrenal insufficiency" is a synonym of "adrenocortical insufficiency" - EFO_0009491, a synonym of the HP term "Renal angiomyolipoma" is "Kidney Angiomyolipoma", which exists in EFO as [EFO_1000312](http://www.ebi.ac.uk/efo/EFO_1000312), and "macular edema" is a synonym of "macular retinal edema" - MONDO_0003005. The others would need to be imported.
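The malformed identifiers listed above (a MONDO id with a missing digit, and URIs cut down to `/MONDO_...`) are purely syntactic problems, so they can be flagged without consulting the ontology. The sketch below is an illustration under stated assumptions: it assumes MONDO, EFO and HP identifiers always carry exactly seven digits, and the function name and return labels are hypothetical. It does not check whether a well-formed term actually exists in EFO or in the slim-EFO used by OT.

```python
import re

# Assumption: MONDO, EFO and HP short-form ids have a prefix and seven digits.
SEVEN_DIGIT_ID = re.compile(r"^(MONDO|EFO|HP)_\d{7}$")


def check_disease_id(disease_id: str) -> str:
    """Classify a disease id as 'ok', 'truncated_uri' or 'malformed'.

    Accepts either a short form (MONDO_0021440) or a full URI ending in one.
    """
    # A value starting with "/" has lost its scheme and host.
    if disease_id.startswith("/"):
        return "truncated_uri"
    # Keep only the last path segment so full URIs and short forms are
    # validated in the same way.
    short_form = disease_id.rsplit("/", 1)[-1]
    return "ok" if SEVEN_DIGIT_ID.match(short_form) else "malformed"
```

For example, `check_disease_id("MONDO_002144")` is classified as malformed because only six digits follow the prefix, while the corrected `MONDO_0021440` passes. Orphanet ids are deliberately left out of the pattern since their length varies.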
I will discuss these issues with ChEMBL.
snp multiple (see #809)
A6NLF2, P04908, P0C0S8, P62805, P62807, P68431, Q6FI13, Q71DI3, Q8NG57) mapping to multiple genes. It should be solved by running Miguel's UniProt fix, as is already done with UniProt, ChEMBL and EuropePMC evidence.
ENSG00000130489, ENSG00000150526).
http://identifiers.org/ensembl/).
These will be reviewed if the data is updated.
EFO_0000000. This will be reviewed if the data is updated.
ENSG00000274847, ENSG00000277796), whereas the other one is a deprecated id (ENSG00000285258).
ENSG00000277796, see above) and an obsolete EFO term (EFO_0003775, see above)
These will be reviewed the next time PhenoDigm is updated.
Great job @AsierGonzalez. All the follow-ups make sense to me
A number of checks have to be performed on the evidence files submitted by data providers as they arrive and before the pipeline is run to see whether the agreed changes have been implemented.
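One way to confirm that an agreed change has landed in a submitted file is to tally the values of the affected field across all evidence strings. The sketch below assumes the evidence file is gzipped with one JSON object per line and that the field holds a scalar value (a string or number). The helper names are hypothetical and this is not the script used for this release.

```python
import gzip
import json
from collections import Counter


def get_nested(record: dict, dotted_path: str):
    """Follow a dotted path such as 'a.b.c' through nested dicts.

    Returns None if any key along the path is missing.
    """
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def tally_field(file_path: str, dotted_path: str) -> Counter:
    """Count the values found at dotted_path in a gzipped JSON-lines file."""
    counts = Counter()
    with gzip.open(file_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                # Assumes the leaf value is hashable (string, number, None).
                counts[get_nested(json.loads(line), dotted_path)] += 1
    return counts
```

Calling `tally_field("cttv008-27-05-2020.json.gz", "evidence.target2drug.provenance_type.database.version")` would then show at a glance whether any evidence string still carries the old version, with missing values counted under `None`.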