The data for credible sets and studies include a qualityControls column (accessible through the API). This field raises concerns or warnings about the data; in every case, we judged the problem insufficient to invalidate the data.
The tables below list every combination of QC flags and its frequency (number of studies or credible sets).
Study index
In [8]: si.groupBy("qualityControls").count().sort(f.col("count").desc()).show(truncate = False)
+-----------------------------------------------------------------+-------+
|qualityControls |count |
+-----------------------------------------------------------------+-------+
|[] |1910521|
|[Harmonized summary statistics are not available or empty] |58934 |
|[The number of SNPs in the study is below the expected threshold]|1594 |
+-----------------------------------------------------------------+-------+
Credible sets
In [11]: cs.groupBy("qualityControls").count().sort(f.col("count").desc()).show(truncate = False)
+---------------------------------------------------------------------------------------------------------------------------------------+-------+
|qualityControls |count |
+---------------------------------------------------------------------------------------------------------------------------------------+-------+
|[Study locus with a sum of PIPs that not in the expected range [0.99,1]] |1616072|
|[] |454448 |
|[Study locus finemapped without in-sample LD reference] |291867 |
|[Study locus from curated top hit, Study has quality control flag(s)] |131146 |
|[Study has quality control flag(s)] |20385 |
|[LD block does not contain variants at the required R^2 threshold] |6510 |
|[Variant not found in LD reference] |5549 |
|[Study locus from curated top hit, LD block does not contain variants at the required R^2 threshold, Study has quality control flag(s)]|3620 |
|[Study locus from curated top hit, Variant not found in LD reference, Study has quality control flag(s)] |3029 |
|[Variant not found in LD reference, Study has quality control flag(s)] |1604 |
|[LD block does not contain variants at the required R^2 threshold, Study has quality control flag(s)] |315 |
+---------------------------------------------------------------------------------------------------------------------------------------+-------+
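The table above counts combinations of flags, so a single flag can appear in several rows. As a rough sketch, the per-flag totals can be derived by summing over every combination that contains the flag. The snippet below does this in plain Python using only the counts copied from the credible-set table; the names `combination_counts` and `per_flag_counts` are illustrative. On the Spark DataFrame itself, exploding the array column with `f.explode("qualityControls")` before grouping should give the same result.

```python
from collections import Counter

PIP = "Study locus with a sum of PIPs that not in the expected range [0.99,1]"
NO_LD = "Study locus finemapped without in-sample LD reference"
TOP_HIT = "Study locus from curated top hit"
STUDY_QC = "Study has quality control flag(s)"
LD_BLOCK = "LD block does not contain variants at the required R^2 threshold"
NOT_FOUND = "Variant not found in LD reference"

# (flag combination, number of credible sets), copied from the table above.
combination_counts = [
    ([PIP], 1616072),
    ([], 454448),
    ([NO_LD], 291867),
    ([TOP_HIT, STUDY_QC], 131146),
    ([STUDY_QC], 20385),
    ([LD_BLOCK], 6510),
    ([NOT_FOUND], 5549),
    ([TOP_HIT, LD_BLOCK, STUDY_QC], 3620),
    ([TOP_HIT, NOT_FOUND, STUDY_QC], 3029),
    ([NOT_FOUND, STUDY_QC], 1604),
    ([LD_BLOCK, STUDY_QC], 315),
]


def per_flag_counts(combos):
    """Sum the counts of every combination in which each flag occurs."""
    totals = Counter()
    for flags, count in combos:
        for flag in flags:
            totals[flag] += count
    return totals


flag_totals = per_flag_counts(combination_counts)
for flag, total in flag_totals.most_common():
    print(f"{total:>8}  {flag}")
```

Credible sets with an empty flag list contribute to no total, so the per-flag totals do not add up to the number of credible sets.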
Example response:
{
  "data": {
    "credibleSets": [
      {
        "studyLocusId": "e51d81927ea4e9cc503c33eff3062a6e",
        "qualityControls": [
          "Variant not found in LD reference",
          "Study has quality control flag(s)"
        ]
      }
    ]
  }
}
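For reference, a minimal sketch of how a client might pull the flags out of a response shaped like the one above. It assumes only the structure shown in that example (`data.credibleSets[].qualityControls`); the helper name `flagged_credible_sets` is hypothetical and not part of any API.

```python
import json

# The example response from above, embedded as a string for illustration.
response_text = """
{
  "data": {
    "credibleSets": [
      {
        "studyLocusId": "e51d81927ea4e9cc503c33eff3062a6e",
        "qualityControls": [
          "Variant not found in LD reference",
          "Study has quality control flag(s)"
        ]
      }
    ]
  }
}
"""


def flagged_credible_sets(payload):
    """Map studyLocusId to its QC flags, skipping unflagged credible sets."""
    flagged = {}
    for credible_set in payload["data"]["credibleSets"]:
        # Treat a missing or null field the same as an empty list.
        flags = credible_set.get("qualityControls") or []
        if flags:
            flagged[credible_set["studyLocusId"]] = flags
    return flagged


flagged = flagged_credible_sets(json.loads(response_text))
for study_locus_id, flags in flagged.items():
    print(study_locus_id, len(flags), flags)
```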
It would be good to show this data consistently on the credible set and study pages.
To start the brainstorming, I was thinking of a warning-like icon in the metadata section of each page, displayed only when the number of qualityControls entries is greater than 0. It could even carry a badge with the number of quality control flags. On hover, a tooltip could list all the QC flags as <li> elements.
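To make the proposal concrete, here is a rough sketch of the display logic: nothing is rendered when there are no flags; otherwise it renders an icon, a badge with the flag count, and a list of flags for the tooltip. It is written in Python only to stay in the same language as the queries above; the real component would live in the front end. The function name `render_qc_warning` and the CSS class names are made up for illustration.

```python
from html import escape


def render_qc_warning(flags):
    """Return HTML for the QC warning, or an empty string if there are no flags."""
    if not flags:
        # No flags: the icon is not displayed at all.
        return ""
    # Escape each flag so that characters such as "<" cannot break the markup.
    items = "".join(f"<li>{escape(flag)}</li>" for flag in flags)
    return (
        '<span class="qc-warning">'
        '<span class="qc-warning-icon">&#9888;</span>'
        f'<span class="qc-warning-badge">{len(flags)}</span>'
        f'<ul class="qc-warning-tooltip">{items}</ul>'
        "</span>"
    )


example_html = render_qc_warning(
    ["Variant not found in LD reference", "Study has quality control flag(s)"]
)
print(example_html)
```

Returning an empty string for the no-flag case keeps the caller simple: pages without flags simply get no extra markup.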
@buniello and @carcruz, please give it a thought.