Missing Genetics Portal evidence on Platform evidence page

DSuveges commented 2 years ago

Let’s find associations for CWC22. If we narrow down the list by searching for Muscular dystrophy, we’ll see there is an association supported by Genetic associations. However, if we click on the field to see the evidence, the evidence page is completely empty: the supporting evidence is not loaded.

This bug is a direct consequence of the data loading/overflow issue reported in #1687, and is caused by high odds ratio values. This is what the evidence looks like:

target	disease	OR	OR_low	OR_up	beta	beta_low	beta_up	pv_exp	pv_mantissa	min_or
ENSG00000163510	Orphanet_98473	3.87053e+33	1.71153e+28	8.75299e+38	nan	nan	nan	-35	9.639	8.75299e+38
ENSG00000144331	Orphanet_98473	3.87053e+33	1.71153e+28	8.75299e+38	nan	nan	nan	-35	9.639	8.75299e+38

It is important to note that these odds ratio values are not too large: they are well within the range of Python's 64-bit floats, and they are represented properly in the schema:

 |-- oddsRatio: double (nullable = true)
 |-- oddsRatioConfidenceIntervalLower: double (nullable = true)
 |-- oddsRatioConfidenceIntervalUpper: double (nullable = true)
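As an illustration only, a quick check in plain Python supports this: Python floats are IEEE 754 doubles, like Spark's `double`, and even the largest value here is more than 250 orders of magnitude below the largest finite double (about 1.8e308). `headroom_decades` is a hypothetical helper written just for this check.

```python
import math
import sys

# OR, OR_low and OR_up from the evidence rows above.
ODDS_RATIO_VALUES = {
    "oddsRatio": 3.87053e33,
    "oddsRatioConfidenceIntervalLower": 1.71153e28,
    "oddsRatioConfidenceIntervalUpper": 8.75299e38,
}


def headroom_decades(value: float) -> float:
    """Orders of magnitude between |value| and the largest finite double."""
    return math.log10(sys.float_info.max) - math.log10(abs(value))


print(f"largest finite double: {sys.float_info.max:.3e}")
for name, value in ODDS_RATIO_VALUES.items():
    # A finite Python float is, by construction, a representable double.
    print(f"{name:34} {value:.5e}  finite: {math.isfinite(value)}  "
          f"headroom: {headroom_decades(value):.1f} decades")
```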

So this issue will not be picked up by the evidence validation or the ETL, and there was no problem calculating the evidence score either. It seems that this order of magnitude somehow hits a numeric representation limit in Elasticsearch, which suggests there might be an issue with how Elasticsearch is configured. This bit needs further clarification, but the data itself seems to be right.

DSuveges commented 2 years ago

@cmalangone Could you please follow up on this issue? The data seems right and passes the ETL, but it seems the evidence got lost in Elasticsearch.

cmalangone commented 2 years ago

@DSuveges Will keep you posted.

mbdebian commented 2 years ago

@DSuveges, it looks like this is still an open issue. I was wondering which approach was finally chosen: tackling it from the data point of view, as opentargets/issues#1687 suggests, or from the backend/frontend (software) point of view.

DSuveges commented 2 years ago

@mbdebian Yes, the issue is still there. We haven't taken any steps to address it at the data level. I think it is reasonable to assume that a number on the order of 1e±50 is representable. However, it was only my assumption that the problem is an over/underflow in GraphQL. It requires further investigation to validate this hypothesis.

mbdebian commented 2 years ago

Is there any update on this? It looks like this issue is lingering in our backlog, and we may have to just close it.

mbdebian commented 2 years ago

@DSuveges, do you know whether there's any update on this? May I close this issue?

JarrodBaker commented 1 year ago

The issue we have is that we aren't specifying an index mapping for data ingestion, so ES makes a best guess as to the shape of the data. With dynamic mapping, floating-point values get a float (32-bit) field by default, and since almost all of the entries fit within the range of a float, the problem only shows up for a handful of extreme values.
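To illustrate why a 32-bit float field is a problem here, the following Python sketch (not part of the ETL; `fits_in_float32` is a hypothetical helper) checks each value from the issue description against single precision. OR and OR_low fit, but OR_up = 8.75299e+38 is larger than the single-precision maximum of about 3.4e38.

```python
import math
import struct


def fits_in_float32(value: float) -> bool:
    """Return True if `value` is finite and survives conversion to a 32-bit float.

    struct.pack("<f", ...) raises OverflowError when a finite double is too
    large for single precision, which is what happens to OR_up here.
    """
    if not math.isfinite(value):
        return False
    try:
        struct.pack("<f", value)
    except OverflowError:
        return False
    return True


# Values from the evidence rows in the issue description.
for label, value in [("OR", 3.87053e33), ("OR_low", 1.71153e28), ("OR_up", 8.75299e38)]:
    print(f"{label:7} {value:.5e}  fits in float32: {fits_in_float32(value)}")
```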

We can inspect the index mapping with the query GET <es>/evidence_datasource_ot_genetics_portal/_mapping/field/odds*, which shows:

{
  "evidence_datasource_ot_genetics_portal" : {
    "mappings" : {
      "oddsRatio" : {
        "full_name" : "oddsRatio",
        "mapping" : {
          "oddsRatio" : {
            "type" : "float"
          }
        }
      },
      "oddsRatioConfidenceIntervalUpper" : {
        "full_name" : "oddsRatioConfidenceIntervalUpper",
        "mapping" : {
          "oddsRatioConfidenceIntervalUpper" : {
            "type" : "float"
          }
        }
      },
      "oddsRatioConfidenceIntervalLower" : {
        "full_name" : "oddsRatioConfidenceIntervalLower",
        "mapping" : {
          "oddsRatioConfidenceIntervalLower" : {
            "type" : "float"
          }
        }
      }
    }
  }
}
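The same check can be scripted. This is a minimal Python sketch, assuming the response has exactly the shape shown above (index → mappings → field → mapping → leaf field → type); `float_fields` is a hypothetical helper, not part of any Open Targets code.

```python
def float_fields(mapping_response: dict) -> list[str]:
    """List "index:field" for every field mapped as a 32-bit float."""
    found = []
    for index_name, index_body in mapping_response.items():
        for field_info in index_body["mappings"].values():
            # "mapping" is keyed by the leaf field name, e.g. {"oddsRatio": {"type": "float"}}
            for leaf in field_info["mapping"].values():
                if leaf.get("type") == "float":
                    found.append(f"{index_name}:{field_info['full_name']}")
    return sorted(found)


# Rebuild the response shown above for the three odds ratio fields.
response = {
    "evidence_datasource_ot_genetics_portal": {
        "mappings": {
            name: {"full_name": name, "mapping": {name: {"type": "float"}}}
            for name in (
                "oddsRatio",
                "oddsRatioConfidenceIntervalUpper",
                "oddsRatioConfidenceIntervalLower",
            )
        }
    }
}
print(float_fields(response))
```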

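We need the type on these fields to be double, as described in the Elasticsearch numeric field type documentation: float is single precision (32-bit) with a maximum of about 3.4e38, so it cannot hold an upper confidence interval of 8.75299e+38.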
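One way to avoid relying on dynamic mapping is to derive the index mapping from the Spark schema shown in the issue description. The Python sketch below assumes the schema is available as a simple field-name → Spark-type dictionary; `SPARK_TO_ES` and `explicit_mapping` are hypothetical names, and the actual fix in the PR may look different. For these three fields it produces the same body as the PUT request used in the test.

```python
import json

# Hypothetical lookup from Spark SQL type names to Elasticsearch field types.
# Only a few scalar types are listed; the key point is double -> double.
SPARK_TO_ES = {
    "double": "double",
    "float": "float",
    "long": "long",
    "integer": "integer",
    "boolean": "boolean",
}


def explicit_mapping(schema: dict[str, str]) -> dict:
    """Build a create-index request body with explicit types, so ES does not guess."""
    properties = {}
    for field, spark_type in schema.items():
        if spark_type not in SPARK_TO_ES:
            # Fail loudly instead of silently falling back to dynamic mapping.
            raise ValueError(f"no Elasticsearch type configured for {field!r} ({spark_type})")
        properties[field] = {"type": SPARK_TO_ES[spark_type]}
    return {"mappings": {"properties": properties}}


# The three odds ratio columns, typed as in the Spark schema from the issue description.
schema = {
    "oddsRatio": "double",
    "oddsRatioConfidenceIntervalUpper": "double",
    "oddsRatioConfidenceIntervalLower": "double",
}
print(json.dumps(explicit_mapping(schema), indent=2))
```

Failing on unknown types is a deliberate choice in this sketch: a field that falls through to dynamic mapping is exactly how oddsRatio ended up as float.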

When those fields are configured as double, inserting the data from @DSuveges' example works correctly:

PUT test_evidence_gen/
{
  "mappings": {
    "properties": {
      "oddsRatio": {
        "type": "double"
      },
      "oddsRatioConfidenceIntervalUpper": {
        "type": "double"
      },
      "oddsRatioConfidenceIntervalLower": {
        "type": "double"
      }
    }
  }
}

POST /test_evidence_gen/_doc/
{
  "oddsRatio": "3.87053e+33",
  "oddsRatioConfidenceIntervalUpper": "8.75299e+38",
  "oddsRatioConfidenceIntervalLower": "1.71153e+28"
}

GET test_evidence_gen/_search
{
  "query": {
    "match_all": {}
  }
}

The document is created as expected and is retrievable.

I'm making a PR now, @mbdebian, to resolve this for the next release.

opentargets/issues #1866