ElaineMcA closed this issue 5 years ago
Is this an accurate summary of the current status of this issue?
We expect to see https://www.ncbi.nlm.nih.gov/clinvar/variation/183034/ and https://www.ncbi.nlm.nih.gov/clinvar/variation/31151/ at https://www.targetvalidation.org/evidence/ENSG00000147894/EFO_0000253 but they aren't present. The action is to discuss with the EVA curators to work out where they are being dropped in the pipelines between ClinVar and OpenTargets.
Feedback from Cristina @ EVA received on where this term is being lost from pipeline: I have searched all the files from submissions during the past 2 years and haven't found those records in any of them.
The only good result returned by OLS (one of the tools we use for automated mapping) belongs to OMIM, not to EFO/ORDO/HPO. See https://www.ebi.ac.uk/ols/search?q=Amyotrophic+lateral+sclerosis+and%2For+frontotemporal+dementia+1
That is most probably the stage at which the term was filtered out.
Does this answer explain the problem? I'm not sure. It may require an in-depth look.
Looks like ClinVar uses OMIM terms, but in this case OMIM:105550 can be mapped to ORDO e.g. via OXO https://www.ebi.ac.uk/spot/oxo/terms/OMIM:105550 - but not by direct text matching.
It does also exist in Mondo (https://www.ebi.ac.uk/ols/ontologies/MONDO/terms?iri=http://purl.obolibrary.org/obo/MONDO_0007105) so maybe this is something to revisit once we're using EFO3 fully. It is probably good to revisit the mapping pipelines the data providers are running at that point too, to check they are consistent and up-to-date.
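To make the difference between text matching and cross-reference mapping concrete, here is a minimal Python sketch. It is not the EVA or OpenTargets mapping code: `map_term`, `TARGET_LABELS` and `XREFS` are hypothetical names, and the two small lookup tables are hand-written stand-ins for what OLS and OxO would return, using only the identifiers mentioned in this thread.

```python
def normalise(label):
    """Lower-case a label and collapse whitespace for exact comparison."""
    return " ".join(label.lower().split())


def map_term(label, source_id, target_labels, xrefs):
    """Try an exact label match first, then fall back to a cross-reference.

    Returns (target_id, method), where method is 'label', 'xref' or 'unmapped'.
    """
    hit = target_labels.get(normalise(label))
    if hit is not None:
        return hit, "label"
    hit = xrefs.get(source_id)
    if hit is not None:
        return hit, "xref"
    return None, "unmapped"


# Illustrative stand-ins only (assumption): a real pipeline would query
# OLS for labels and OxO for cross-references instead of using dicts.
TARGET_LABELS = {"amyotrophic lateral sclerosis": "EFO_0000253"}
XREFS = {"OMIM:105550": "MONDO_0007105"}

CLINVAR_TRAIT = "Amyotrophic lateral sclerosis and/or frontotemporal dementia 1"

# Label matching alone finds nothing for the ClinVar trait name.
label_only = map_term(CLINVAR_TRAIT, "OMIM:105550", TARGET_LABELS, {})
# With the cross-reference fallback the OMIM ID resolves to a target term.
with_xref = map_term(CLINVAR_TRAIT, "OMIM:105550", TARGET_LABELS, XREFS)
print(label_only, with_xref)
```

The point of the fallback order is that an exact label hit is kept when available, and the source identifier is only consulted when the text does not match.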
This issue is actually a combination of two separate problems, missing variants on the one hand and ontology mapping on the other:
RCV000192065 and RCV000024147 missing: These two variants appear in the file containing the ClinVar variants that we receive from EVA. However, they are annotated as NT expansion, so they are ignored by the snp2gene pipeline and, therefore, they are not in the file we send back to EVA for evidence generation and cannot appear in the evidence file.
9 27573529 27573534 GGCCCC - + INS rs143561967 RCV000192065 203228 -1 NT expansion
9 27573529 27573534 GGCCCC - + INS rs143561967 RCV000024147 203228 -1 NT expansion
Orphanet_803: This term is not part of EFO, so any evidence strings annotated with it would be invalid. However, I have not found any evidence strings that are annotated with this term.
Closing this ticket as there is nothing to be done for now. If those two variants are considered by the pipeline in the future we will need to monitor the disease term they are annotated with.
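For anyone checking this later, the effect of the filter on the two rows quoted above can be reproduced with a short Python sketch. This is not the snp2gene code: the column positions (rs ID in the 8th field, RCV accession in the 9th, annotation as the trailing free text) are read off the two example rows and are an assumption, and `SKIPPED_ANNOTATIONS`, `parse_row` and `split_by_annotation` are hypothetical names.

```python
# Annotations assumed to be skipped, based on the description in this thread.
SKIPPED_ANNOTATIONS = {"NT expansion"}


def parse_row(line):
    """Split one whitespace-delimited row; the last field keeps its spaces."""
    fields = line.split(None, 11)  # 11 splits -> 12 fields, annotation last
    return {
        "rs_id": fields[7],
        "rcv": fields[8],
        "annotation": fields[11].strip(),
    }


def split_by_annotation(lines):
    """Separate rows that would be processed from rows that would be skipped."""
    kept, skipped = [], []
    for line in lines:
        row = parse_row(line)
        if row["annotation"] in SKIPPED_ANNOTATIONS:
            skipped.append(row)
        else:
            kept.append(row)
    return kept, skipped


ROWS = [
    "9 27573529 27573534 GGCCCC - + INS rs143561967 RCV000192065 203228 -1 NT expansion",
    "9 27573529 27573534 GGCCCC - + INS rs143561967 RCV000024147 203228 -1 NT expansion",
]

kept, skipped = split_by_annotation(ROWS)
print("kept:", [r["rcv"] for r in kept])
print("skipped:", [r["rcv"] for r in skipped])
```

Listing the skipped accessions explicitly, rather than dropping them silently, would make it easier to spot cases like this one.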
@gkos-bio commented on Wed Feb 08 2017
None of the C9orf72 ClinVar reports (https://www.ncbi.nlm.nih.gov/clinvar/?term=C9orf72%5Bgene%5D) show up in our data.
On our platform, https://www.targetvalidation.org/disease/Orphanet_803 gives an empty page, but we actually have evidence data for Orphanet_803 (in the 16.12 and 17.02 releases).
On Zooma, ALS was given the EFO accession EFO_0000253, although there is also an Orphanet ID for it.
In our JSON string, we have evidence associated with Orphanet_803, but it is not appearing on our platform, whereas evidence mapping to EFO_0000253 is on the platform. Looking at the graph below, should Orphanet_803 and EFO_0000253 be replaced by their parent term?
http://www.ebi.ac.uk/ols/ontologies/efo/terms/graph?iri=http://www.ebi.ac.uk/efo/EFO_0001357
x-ref to the ORDO IRI in the future (just for bioinformatics convenience) and ask EVA to map using the 2 terms in EFO.
@gkos-bio commented on Mon Feb 13 2017
Most of the variants in the C9orf72 gene are not pathogenic, so they are not integrated in our system. Only two of the short mutations in ClinVar (rather than large deletions) are annotated as pathogenic: https://www.ncbi.nlm.nih.gov/clinvar/variation/183034/ and https://www.ncbi.nlm.nih.gov/clinvar/variation/31151/. We should be receiving these two.
@ckongEbi commented on Tue Feb 14 2017
@gkos-bio added/moved data-related issues to "data-providers-docs" repo https://github.com/opentargets/data-providers-docs/issues/10
@ckongEbi commented on Tue Jul 10 2018
@gkos-bio have we highlighted this to SPOT, or does it need following up?
@ElaineMcA commented on Tue Sep 11 2018
Need to follow up with EVA.