opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Redundant MeSH mappings in disease index #3591

Open ireneisdoomed opened 1 month ago

ireneisdoomed commented 1 month ago

Describe the bug There are 1156 terms in EFO that have a mapping to multiple MeSH terms. The majority of them (1123) are duplicates.

Observed behaviour See the ticket opened with EFO for context: https://github.com/EBISPOT/efo/issues/2308


To Reproduce Code provided by @Juanmaria-rr

from pyspark.sql import functions as F

# `diseases` is assumed to be the disease index loaded as a Spark DataFrame.
faulty = (
    diseases
    # rlike is case sensitive: take the first xref spelled with the "MeSH" prefix...
    .withColumn("mesh1_flag", F.expr("filter(dbXRefs, x -> x rlike 'MeSH')")[0])
    # ...and the first xref spelled with the "MESH" prefix.
    .withColumn("MESH2_flag", F.expr("filter(dbXRefs, x -> x rlike 'MESH')")[0])
    # Strip the prefix in either casing so that only the identifiers are compared.
    .withColumn("cleaned_mesh1", F.regexp_replace(F.col("mesh1_flag"), "(?i)MeSH:", ""))
    .withColumn("cleaned_mesh2", F.regexp_replace(F.col("MESH2_flag"), "(?i)MeSH:", ""))
    .withColumn(
        "equalOrNot",
        F.when(
            (F.col("cleaned_mesh1").isNotNull()) & (F.col("cleaned_mesh2").isNotNull()),
            F.when(
                F.col("cleaned_mesh1") == F.col("cleaned_mesh2"), F.lit("equal")
            ).otherwise(F.lit("different")),
        )
        # The remaining branches only produce rows if the filter below is dropped.
        .when(
            (F.col("cleaned_mesh1").isNotNull()) & (F.col("cleaned_mesh2").isNull()),
            F.lit("mesh1"),
        )
        .when(
            (F.col("cleaned_mesh1").isNull()) & (F.col("cleaned_mesh2").isNotNull()),
            F.lit("mesh2"),
        )
        .when(
            (F.col("cleaned_mesh1").isNull()) & (F.col("cleaned_mesh2").isNull()),
            F.lit("noData"),
        ),
    )
    # Keep only the terms that carry both spellings of the prefix.
    .filter((F.col("cleaned_mesh1").isNotNull()) & (F.col("cleaned_mesh2").isNotNull()))
    # Note: the two output aliases differ only in case.
    .selectExpr("id", "cleaned_mesh1 as MeSH", "cleaned_mesh2 as MESH", "equalOrNot")
)
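For anyone who wants to check the comparison logic without a Spark session, here is a minimal plain-Python sketch of the same idea. It is not the code used to produce the counts above: the helper names `mesh_ids` and `classify`, the label strings, and the example xrefs are all made up for illustration, and unlike the snippet above it looks at every MeSH xref of a term rather than only the first match per spelling.

```python
import re

# Matches the MeSH prefix in any casing, e.g. "MeSH:" or "MESH:".
MESH_PREFIX = re.compile(r"^mesh:", re.IGNORECASE)


def mesh_ids(db_xrefs):
    """Return the MeSH identifiers found in a list of xrefs, prefix removed."""
    return [MESH_PREFIX.sub("", x) for x in db_xrefs if MESH_PREFIX.match(x)]


def classify(db_xrefs):
    """Label a term by how many distinct MeSH identifiers it maps to."""
    ids = mesh_ids(db_xrefs)
    if not ids:
        return "noData"
    if len(ids) == 1:
        return "single"
    # Several xrefs that collapse to one identifier are redundant mappings.
    return "redundant" if len(set(ids)) == 1 else "different"


# Hypothetical xrefs, invented for this example (not taken from EFO).
examples = {
    "term_a": ["MeSH:D999991", "MESH:D999991", "OMIM:999991"],
    "term_b": ["MeSH:D999991", "MESH:D999992"],
    "term_c": ["OMIM:999991"],
}
labels = {term: classify(xrefs) for term, xrefs in examples.items()}
```

Here `term_a` stands for the redundant case (the same identifier under two prefix spellings), while `term_b` stands for a term that maps to two different MeSH identifiers.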

Additional context I think this is not new, because the front end (FE) already handles these cases by displaying them together.