opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Consistently show `qualityControls` in study and credible set pages #3626

Open d0choa opened 6 days ago

d0choa commented 6 days ago

In the data for credible sets and studies we have included a `qualityControls` column (accessible through the API). The purpose of this field is to raise concerns or warnings about the data; in every case, we judged the problem insufficient to invalidate the data.

The tables below list every combination of QC flags and its frequency (number of studies or credible sets).

Study index

In [8]: si.groupBy("qualityControls").count().sort(f.col("count").desc()).show(truncate = False)
+-----------------------------------------------------------------+-------+
|qualityControls                                                  |count  |
+-----------------------------------------------------------------+-------+
|[]                                                               |1910521|
|[Harmonized summary statistics are not available or empty]       |58934  |
|[The number of SNPs in the study is below the expected threshold]|1594   |
+-----------------------------------------------------------------+-------+

Credible sets

In [11]: cs.groupBy("qualityControls").count().sort(f.col("count").desc()).show(truncate = False)
+---------------------------------------------------------------------------------------------------------------------------------------+-------+
|qualityControls                                                                                                                        |count  |
+---------------------------------------------------------------------------------------------------------------------------------------+-------+
|[Study locus with a sum of PIPs that not in the expected range [0.99,1]]                                                               |1616072|
|[]                                                                                                                                     |454448 |
|[Study locus finemapped without in-sample LD reference]                                                                                |291867 |
|[Study locus from curated top hit, Study has quality control flag(s)]                                                                  |131146 |
|[Study has quality control flag(s)]                                                                                                    |20385  |
|[LD block does not contain variants at the required R^2 threshold]                                                                     |6510   |
|[Variant not found in LD reference]                                                                                                    |5549   |
|[Study locus from curated top hit, LD block does not contain variants at the required R^2 threshold, Study has quality control flag(s)]|3620   |
|[Study locus from curated top hit, Variant not found in LD reference, Study has quality control flag(s)]                               |3029   |
|[Variant not found in LD reference, Study has quality control flag(s)]                                                                 |1604   |
|[LD block does not contain variants at the required R^2 threshold, Study has quality control flag(s)]                                  |315    |
+---------------------------------------------------------------------------------------------------------------------------------------+-------+
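Because the table above counts combinations, the frequency of each individual flag is not directly visible. A minimal sketch of that aggregation in plain Python, using the counts copied from the table (the constant and function names are hypothetical, chosen only for this illustration):

```python
from collections import Counter

# Flag strings exactly as they appear in the credible set table.
PIP_SUM = "Study locus with a sum of PIPs that not in the expected range [0.99,1]"
NO_IN_SAMPLE_LD = "Study locus finemapped without in-sample LD reference"
TOP_HIT = "Study locus from curated top hit"
STUDY_FLAGGED = "Study has quality control flag(s)"
LD_BLOCK = "LD block does not contain variants at the required R^2 threshold"
NOT_IN_LD_REF = "Variant not found in LD reference"

# (flag combination, number of credible sets), copied from the table.
COMBINATIONS = [
    ([PIP_SUM], 1616072),
    ([], 454448),
    ([NO_IN_SAMPLE_LD], 291867),
    ([TOP_HIT, STUDY_FLAGGED], 131146),
    ([STUDY_FLAGGED], 20385),
    ([LD_BLOCK], 6510),
    ([NOT_IN_LD_REF], 5549),
    ([TOP_HIT, LD_BLOCK, STUDY_FLAGGED], 3620),
    ([TOP_HIT, NOT_IN_LD_REF, STUDY_FLAGGED], 3029),
    ([NOT_IN_LD_REF, STUDY_FLAGGED], 1604),
    ([LD_BLOCK, STUDY_FLAGGED], 315),
]


def per_flag_counts(combinations):
    """Sum the combination counts for every flag they contain."""
    totals = Counter()
    for flags, count in combinations:
        for flag in flags:
            totals[flag] += count
    return totals


flag_totals = per_flag_counts(COMBINATIONS)
total_sets = sum(count for _, count in COMBINATIONS)
flagged_sets = sum(count for flags, count in COMBINATIONS if flags)

for flag, total in flag_totals.most_common():
    print(f"{total:>9}  {flag}")
print(f"{flagged_sets} of {total_sets} credible sets carry at least one flag")
```

On the Spark side, the same per-flag view could probably be obtained by exploding the array column before grouping.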

Example query:

query VariantsQuery {
  credibleSets(studyLocusIds: "e51d81927ea4e9cc503c33eff3062a6e") {
    studyLocusId
    qualityControls
  }
}

Example response:

{
  "data": {
    "credibleSets": [
      {
        "studyLocusId": "e51d81927ea4e9cc503c33eff3062a6e",
        "qualityControls": [
          "Variant not found in LD reference",
          "Study has quality control flag(s)"
        ]
      }
    ]
  }
}
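For anyone who wants to pull the flags programmatically, here is a rough sketch using only the Python standard library. The endpoint URL is an assumption (it is not stated in this issue), the helper names are hypothetical, and the response shape is taken from the example above:

```python
import json
import urllib.request

# Assumption: public GraphQL endpoint of the Open Targets Platform API.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Query copied from the example above.
QUERY = """
query VariantsQuery {
  credibleSets(studyLocusIds: "e51d81927ea4e9cc503c33eff3062a6e") {
    studyLocusId
    qualityControls
  }
}
"""

# Response copied from the example above, used for an offline check.
EXAMPLE_RESPONSE = {
    "data": {
        "credibleSets": [
            {
                "studyLocusId": "e51d81927ea4e9cc503c33eff3062a6e",
                "qualityControls": [
                    "Variant not found in LD reference",
                    "Study has quality control flag(s)",
                ],
            }
        ]
    }
}


def build_request(query, url=API_URL):
    """Wrap a GraphQL query in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def flags_by_credible_set(response):
    """Map each studyLocusId to its list of QC flags (empty if none)."""
    rows = response["data"]["credibleSets"]
    return {row["studyLocusId"]: row.get("qualityControls") or [] for row in rows}


def fetch_flags(query=QUERY):
    """Send the query over the network and parse the answer (not called here)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return flags_by_credible_set(json.load(resp))


# Offline check against the example response.
parsed = flags_by_credible_set(EXAMPLE_RESPONSE)
```

Calling `fetch_flags()` would perform the actual request; the parsing step is kept separate so it can be checked without network access.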

It would be good to show this data consistently in credible set and study pages.

To start the brainstorming, I was thinking about a warning-like icon in the metadata section of each respective page, displayed when the number of `qualityControls` flags is greater than 0. It could even carry a badge with the number of quality control flags. On hover, a tooltip could list all the QC flags using `<li>` elements.
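To make the display rule concrete, a language-neutral sketch of the logic, written in Python only for illustration (the function name and the returned keys are hypothetical, and the real implementation would live in the front end):

```python
from html import escape


def qc_badge(quality_controls):
    """Return what the warning icon needs, or None when there is nothing to show.

    quality_controls: list of flag strings as returned by the API (may be empty or None).
    """
    flags = [flag for flag in (quality_controls or []) if flag]
    if not flags:
        return None  # no flags: the icon is not rendered
    # One <li> per flag; flag text is escaped before being placed in markup.
    items = "".join(f"<li>{escape(flag)}</li>" for flag in flags)
    return {"count": len(flags), "tooltip_html": f"<ul>{items}</ul>"}


badge = qc_badge(
    ["Variant not found in LD reference", "Study has quality control flag(s)"]
)
```

Returning `None` for an empty list keeps the "show only when the count is greater than 0" rule in one place, so the study and credible set pages could share it.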

@buniello and @carcruz, please give it a thought.