helxplatform / dug


redundant (disease) (disease) #238

Closed: cbizon closed this issue 2 years ago

cbizon commented 2 years ago

When a disease label already ends in "(disease)", the type that gets appended to the end looks kind of funny: the title reads "... (disease) (disease)".

frostyfan109 commented 2 years ago

@cbizon

This is a problem with the dug data that the UI receives.

For some reason, a fair number of disease results have this structure:

{
  "name": "<whatever disease> (disease)",
  "type": "disease",
  "description": "..."
}

Example: the "pulmonary embolism" data from the API:

{
    "id": "MONDO:0005279",
    "name": "pulmonary embolism (disease)",
    "description": "The obstruction of the pulmonary artery or one of its branches by an embolus, sometimes associated with infarction of the lung.",
    "type": "disease",
    "search_terms": [
        "pulmonary artery embolism",
        "pulmonary embolism",
        "PULMONARY EMBOLISM",
        "pulmonary embolus",
        "embolism, pulmonary",
        "PULMONARY EMBOLUS",
        "PULMONARY EMBOLUS:",
        "pulmonary embolism (disease)"
    ],
    "optional_terms": [
        "Phenotypic abnormality",
        "Abnormality of the respiratory system",
        "Pulmonary air embolism",
        "Abnormality of the vasculature",
        "Abnormal cardiovascular system physiology",
        "Abnormal vascular physiology",
        "Abnormality of pulmonary circulation",
        "structure with developmental contribution from neural crest",
        "Pulmonary fat embolism",
        "Abnormal respiratory system physiology",
        "pulmonary embolism (disease)"
    ],
    "concept_action": "",
    "identifiers": [
        {
            "id": "MONDO:0005279",
            "label": "pulmonary embolism (disease)",
            "equivalent_identifiers": [
                "MONDO:0005279",
                "DOID:9477",
                "UMLS:C0034065",
                "UMLS:C0524702",
                "MESH:D011655",
                "MEDDRA:10014521",
                "MEDDRA:10014537",
                "MEDDRA:10037377",
                "MEDDRA:10037380",
                "MEDDRA:10037436",
                "MEDDRA:10050071",
                "MEDDRA:10082134",
                "NCIT:C50713",
                "SNOMEDCT:233935004",
                "SNOMEDCT:59282003",
                "ICD10:I26",
                "HP:0002204"
            ],
            "type": [
                "biolink:Disease",
                "biolink:DiseaseOrPhenotypicFeature",
                "biolink:BiologicalEntity",
                "biolink:NamedThing",
                "biolink:Entity",
                "biolink:ThingWithTaxon"
            ],
            "synonyms": [
                "embolism, pulmonary",
                "pulmonary artery embolism",
                "pulmonary embolism",
                "pulmonary embolism (disease)",
                "pulmonary embolus"
            ]
        }
    ]
}

Since the UI formats the title of result cards as {result.name} ({result.type}), "(disease)" is repeated for results with these names.

I'm not sure why disease concepts specifically are given this naming format in dug, but I've never seen it happen with non-disease concepts.

cbizon commented 2 years ago

Yes, it's a function of how things are named in the ontologies. There is a lot of disagreement about the distinction between, e.g., diseases and phenotypes, and a lot of things are represented as basically both. The ontologists usually say, no, no, those are different things: one is the disease, and the other is the phenotype, meaning the collection of symptoms that define the disease. Then, to make that clear to everybody ;), they put (disease) on the name of the disease.

So it's really crummy-looking, and the only question is whether we want to clean it up after the fact to make the UI look better.

frostyfan109 commented 2 years ago

@cbizon

Maybe we add a basic check: if the name doesn't already end with ({result.type}), add ({result.type}) to the title. Since you say this isn't a bug in the backend and the "(disease)" suffix is going to stay there for the foreseeable future, I think this would be fine to add on the UI side, even though it's a bit annoying to have to deal with there.
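
A rough sketch of what that check could look like, for illustration only. The `ConceptResult` interface and the `formatTitle` helper are hypothetical names, not the actual UI code; only the `name` and `type` fields and the `{result.name} ({result.type})` title format come from this thread. Making the comparison case-insensitive is also an assumption.

```typescript
// Hypothetical shape of a search result; only `name` and `type`
// are taken from the API payload discussed in this thread.
interface ConceptResult {
  name: string;
  type: string;
}

// Build the card title. "(type)" is appended only when the name
// does not already end with that suffix (compared case-insensitively).
function formatTitle(result: ConceptResult): string {
  const name = result.name.trim();
  if (result.type.length === 0) {
    // No type to show, so the name alone is the title.
    return name;
  }
  const suffix = "(" + result.type + ")";
  // Compare the tail of the name against the suffix.
  const tail = name.slice(-suffix.length).toLowerCase();
  if (tail === suffix.toLowerCase()) {
    return name;
  }
  return name + " " + suffix;
}

// "pulmonary embolism (disease)" stays as-is.
console.log(formatTitle({ name: "pulmonary embolism (disease)", type: "disease" }));
// "pulmonary embolism" becomes "pulmonary embolism (disease)".
console.log(formatTitle({ name: "pulmonary embolism", type: "disease" }));
```

This leaves the data from dug untouched and only changes how the title is rendered, so names without the suffix still get their type shown.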