Closed: d0choa closed this issue 3 years ago
A shortlist of potentially interesting ones: MONDO, MeSH, NCIt, MedDRA, UMLS, Orphanet
The GraphQL API now provides various database IDs for diseases/phenotypes, and we can use these IDs to construct new cross-reference links. This will benefit users who are familiar with other identifiers, and it will also benefit our search engine optimisation and our domain and page authority rankings.
For example, the GraphQL API returns the following data based on a query for rheumatoid arthritis:
```json
{
  "data": {
    "disease": {
      "dbXRefs": [
        "SNOMEDCT:69896004",
        "NCIT:C2884",
        "ICD10:M05",
        "KEGG:05323",
        "ICD10:M06.9",
        "MSH:D001172",
        "MONDO:0008383",
        "COHD:80809",
        "ICD9:714.0",
        "NCIt:C2884",
        "UMLS:C0003873",
        "ICD10:M06",
        "OMIM:180300",
        "SCTID:69896004",
        "OMIM:604302",
        "MESH:D001172",
        "DOID:7148"
      ]
    }
  }
}
```
As mentioned by @d0choa, we will display cross-reference links to MONDO, MeSH, NCIt, MedDRA, UMLS, Orphanet, ICD10, and OMIM.
The layout will be the same as the drug profile page cross-references noted in #1356, with the name of the database followed by the ID that acts as a link to the database. The name for each database is before the colon (`:`) and the ID for the database is after the colon.
Using the table below, please implement cross-reference links on the disease/phenotype profile page.
Note: for the purposes of the spec, the ID is noted as `xRefId`.
Source (from API response) | URL structure | Example |
---|---|---|
MONDO | http://purl.obolibrary.org/obo/MONDO_ + xRefId | MONDO: 0008383 |
MeSH | https://identifiers.org/mesh: + xRefId | MeSH: D001172 |
NCIt | https://identifiers.org/ncit: + xRefId | NCIt: C2884 |
MedDRA | https://identifiers.org/meddra: + xRefId | MedDRA: 10002026 |
UMLS | https://identifiers.org/umls: + xRefId | UMLS: C0021390 |
Orphanet | https://identifiers.org/orphanet: + xRefId | Orphanet: 85163 |
ICD10 | https://identifiers.org/icd: + xRefId | ICD10: I42.1 |
Please note that not every disease will have every cross-reference (e.g. rheumatoid arthritis does not have an Orphanet entry).
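As a rough illustration of the table above, the following Python sketch splits one `dbXRefs` entry at the first colon and appends the ID to the matching URL prefix. The names `URL_PREFIXES` and `build_xref_link` are hypothetical, the lookup deliberately matches the source name exactly as written in the table, and this is not the actual front-end implementation.

```python
from typing import Optional, Tuple

# URL prefixes copied from the table above, keyed by the source name as written there.
URL_PREFIXES = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "MeSH": "https://identifiers.org/mesh:",
    "NCIt": "https://identifiers.org/ncit:",
    "MedDRA": "https://identifiers.org/meddra:",
    "UMLS": "https://identifiers.org/umls:",
    "Orphanet": "https://identifiers.org/orphanet:",
    "ICD10": "https://identifiers.org/icd:",
}


def build_xref_link(xref: str) -> Optional[Tuple[str, str]]:
    """Return (label, url) for one dbXRefs entry, or None if the source is not in the table."""
    source, sep, xref_id = xref.partition(":")  # split at the first colon only
    if not sep or source not in URL_PREFIXES:
        return None
    return f"{source}: {xref_id}", URL_PREFIXES[source] + xref_id


print(build_xref_link("MONDO:0008383"))
print(build_xref_link("KEGG:05323"))
```

An exact-match lookup like this skips `MESH:D001172` from the example response, because the API spells that prefix in upper case while the table uses `MeSH`.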
Most frequent normalised resources, counted by the number of distinct diseases in which each one appears:
```
>>> from pyspark.sql import functions as F
>>> (
...     disease.select("id", F.explode("dbXRefs").alias("dbXRefs"))
...     # keep only the lower-cased prefix before the first colon
...     .withColumn("dbXRefs", F.lower(F.split("dbXRefs", ":").getItem(0)))
...     .distinct()
...     .groupBy("dbXRefs")
...     .count()
...     .sort(F.col("count").desc())
...     .show(50)
... )
+-----------+-----+
|    dbXRefs|count|
+-----------+-----+
|       umls|10282|
|      mondo| 8145|
|      icd10| 7429|
|      sctid| 6463|
|       mesh| 5997|
|       ncit| 5486|
|       doid| 5435|
|       omim| 5225|
|       gard| 3817|
|       icd9| 2841|
|     meddra| 2611|
|   orphanet| 1606|
|        efo| 1482|
|       pmid| 1181|
|   snomedct| 1059|
|       cohd|  985|
|snomedct_us|  724|
|  wikipedia|  702|
|        fma|  514|
|      emapa|  502|
|       icdo|  482|
|     omimps|  474|
|        msh|  473|
|        zfa|  467|
|         ma|  434|
|         hp|  427|
|   oncotree|  425|
|        bto|  395|
|        tao|  339|
|       vhog|  309|
|       gaid|  280|
|     caloha|  276|
|     ehdaa2|  239|
|        aao|  230|
|    opencyc|  225|
|        xao|  216|
|        mat|  205|
|         ev|  197|
|       fbbt|  181|
|       miaa|  171|
|      ehdaa|  170|
|      galen|  168|
|         dc|  138|
|       http|  118|
|      https|   87|
|    birnlex|   82|
|       bams|   77|
|       dhba|   67|
|       ordo|   54|
|        hba|   45|
+-----------+-----+
only showing top 50 rows
```
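To make explicit what the query above counts, here is a small plain-Python equivalent run on a hypothetical in-memory sample rather than the real disease dataset. The `diseases` mapping and its disease IDs are made up for illustration. Each normalised prefix is counted at most once per disease, which mirrors the `distinct()` step.

```python
from collections import Counter

# Hypothetical sample: disease ID -> dbXRefs (an invented pairing, not platform data).
diseases = {
    "disease_1": ["NCIT:C2884", "NCIt:C2884", "MESH:D001172", "MSH:D001172"],
    "disease_2": ["MeSH:D001172", "UMLS:C0021390"],
}

counts = Counter()
for disease_id, xrefs in diseases.items():
    # Lower-case the prefix before the first colon, then de-duplicate per disease.
    prefixes = {xref.split(":")[0].lower() for xref in xrefs}
    counts.update(prefixes)

# Most frequent prefixes first, like the sorted Spark output.
for prefix, n in counts.most_common():
    print(prefix, n)
```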
As noted by @d0choa, there is data duplication caused by differences in spelling and capitalisation. For example, the following response contains two NCIt entries that have the same ID — one "NCIT", the other "NCIt".
```json
{
  "data": {
    "disease": {
      "dbXRefs": [
        "SNOMEDCT:69896004",
        "NCIT:C2884",
        "ICD10:M05",
        "KEGG:05323",
        "ICD10:M06.9",
        "MSH:D001172",
        "MONDO:0008383",
        "COHD:80809",
        "ICD9:714.0",
        "NCIt:C2884",
        "UMLS:C0003873",
        "ICD10:M06",
        "OMIM:180300",
        "SCTID:69896004",
        "OMIM:604302",
        "MESH:D001172",
        "DOID:7148"
      ]
    }
  }
}
```
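One way to surface this kind of duplication is to group entries by lower-cased source and ID, then report any group with more than one spelling. The sketch below does this for the response above; `find_case_duplicates` is a hypothetical helper name, not something that exists in the codebase.

```python
from collections import defaultdict

db_xrefs = [
    "SNOMEDCT:69896004", "NCIT:C2884", "ICD10:M05", "KEGG:05323",
    "ICD10:M06.9", "MSH:D001172", "MONDO:0008383", "COHD:80809",
    "ICD9:714.0", "NCIt:C2884", "UMLS:C0003873", "ICD10:M06",
    "OMIM:180300", "SCTID:69896004", "OMIM:604302", "MESH:D001172",
    "DOID:7148",
]


def find_case_duplicates(xrefs):
    """Map (lower-cased source, ID) to the spellings seen, keeping only groups with more than one."""
    groups = defaultdict(list)
    for xref in xrefs:
        source, _, xref_id = xref.partition(":")
        groups[(source.lower(), xref_id)].append(source)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


print(find_case_duplicates(db_xrefs))
```

Lower-casing alone does not catch alias prefixes such as `MSH`/`MESH` or `SNOMEDCT`/`SCTID`, which share an ID in this response but differ in spelling.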
Before constructing the cross-reference links, can we please take the source value (the content before the colon `:`) and normalise it by converting it to lowercase? Then, we can take the first instance where the normalised source string is one of "mondo", "mesh", "ncit", "meddra", "umls", "orphanet", "icd10", or "omim", and use its ID both for display in the web interface and to construct the relevant link.
Source | Normalised source string | URL structure | Example |
---|---|---|---|
MONDO | mondo | http://purl.obolibrary.org/obo/MONDO_ + xRefId | MONDO: 0008383 |
MeSH | mesh | https://identifiers.org/mesh: + xRefId | MeSH: D001172 |
NCIt | ncit | https://identifiers.org/ncit: + xRefId | NCIt: C2884 |
MedDRA | meddra | https://identifiers.org/meddra: + xRefId | MedDRA: 10002026 |
UMLS | umls | https://identifiers.org/umls: + xRefId | UMLS: C0021390 |
Orphanet | orphanet | https://identifiers.org/orphanet: + xRefId | Orphanet: 85163 |
ICD10 | icd10 | https://identifiers.org/icd: + xRefId | ICD10: I42.1 |
OMIM | omim | https://www.omim.org/entry/ + xRefId | OMIM: 180300 |
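A minimal sketch of the normalisation described above, assuming the lookup table is keyed by the normalised source string. `XREF_SOURCES` and `select_xref_links` are hypothetical names, and keeping only the first entry per source is just one reading of "take the first instance"; a disease with several ICD10 or OMIM codes would show only the first under this interpretation.

```python
# Normalised source string -> (display name, URL prefix), copied from the table above.
XREF_SOURCES = {
    "mondo": ("MONDO", "http://purl.obolibrary.org/obo/MONDO_"),
    "mesh": ("MeSH", "https://identifiers.org/mesh:"),
    "ncit": ("NCIt", "https://identifiers.org/ncit:"),
    "meddra": ("MedDRA", "https://identifiers.org/meddra:"),
    "umls": ("UMLS", "https://identifiers.org/umls:"),
    "orphanet": ("Orphanet", "https://identifiers.org/orphanet:"),
    "icd10": ("ICD10", "https://identifiers.org/icd:"),
    "omim": ("OMIM", "https://www.omim.org/entry/"),
}


def select_xref_links(db_xrefs):
    """Return {display name: (xRefId, url)}, keeping the first entry seen per normalised source."""
    links = {}
    for xref in db_xrefs:
        source, sep, xref_id = xref.partition(":")
        key = source.lower()
        if not sep or key not in XREF_SOURCES:
            continue  # malformed entry, or a source that is not displayed
        name, url_prefix = XREF_SOURCES[key]
        if name not in links:  # later duplicates such as "NCIt" after "NCIT" are ignored
            links[name] = (xref_id, url_prefix + xref_id)
    return links


links = select_xref_links(
    ["NCIT:C2884", "ICD10:M05", "NCIt:C2884", "MESH:D001172", "OMIM:180300", "OMIM:604302"]
)
for name, (xref_id, url) in links.items():
    print(f"{name}: {xref_id} -> {url}")
```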
Following on from the great work on drug references, we have also included disease references in the API (example query).
All the possible prefixes that the API can return are the following:
Examples with many different prefixes:
Happy to help prioritise prefixes.