opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Handle credible sets mapped to chromosomes 23 and 24 #3613

Open d0choa opened 2 weeks ago

d0choa commented 2 weeks ago

466 credible sets from the GWAS Catalog are mapped to chromosome 23 or 24 (463 and 3 respectively).

In [15]: cs.groupBy("chromosome").count().sort(f.col("count").desc()).show(30)
+----------+------+
|chromosome| count|
+----------+------+
|         1|235354|
|         2|184955|
|        19|174623|
|        17|158333|
|        11|152366|
|         7|138130|
|         3|133863|
|        12|132999|
|         5|124561|
|        16|124245|
|         6|107201|
|         4|105030|
|        10|103562|
|         8| 99298|
|         9| 93695|
|        15| 91418|
|        14| 83055|
|        22| 69152|
|        20| 63675|
|        13| 46702|
|        18| 42700|
|        21| 37254|
|         X| 31904|
|        23|   463|
|         Y|     4|
|        24|     3|
+----------+------+

Here are some examples. I suspect they come from the summary statistics (sumstats), and probably all from the same publication.

In [20]: cs.filter(f.col("chromosome") == "23").select("studyId", "finemappingMethod", "studyLocusId", f.size("locus"), "qualityControls").show(truncate = False)
+------------+-----------------+--------------------------------+-----------+-----------------------------------+
|studyId     |finemappingMethod|studyLocusId                    |size(locus)|qualityControls                    |
+------------+-----------------+--------------------------------+-----------+-----------------------------------+
|GCST90006356|pics             |0d3cd7214a6380f329e0c91f428076ce|1          |[Variant not found in LD reference]|
|GCST90003284|pics             |27044aed71dd5b3bbf843a4d27e2d01e|1          |[Variant not found in LD reference]|
|GCST90239652|pics             |5b304b0136ece0afe2851bbf0a7c9597|1          |[Variant not found in LD reference]|
|GCST90004386|pics             |6321ae974769f53eb6be2aa52d76acc2|1          |[Variant not found in LD reference]|
|GCST90239652|pics             |809e78221190014cd9b812c7fed5df28|1          |[Variant not found in LD reference]|
|GCST90003151|pics             |12f1d458c87c59f64d51b7a285af1fcf|1          |[Variant not found in LD reference]|
|GCST90003753|pics             |1cad992f50d014176b86b1624f989d74|1          |[Variant not found in LD reference]|
|GCST90239819|pics             |97af727672d2ee0eaeaa201916bdac1c|1          |[Variant not found in LD reference]|
|GCST90002601|pics             |c90a07f8c8234c691df025298943b0bf|1          |[Variant not found in LD reference]|
|GCST90320055|pics             |2421e1c51f1464a3fe66f8bcba4966c9|1          |[Variant not found in LD reference]|
|GCST90004553|pics             |294abd961d8b80dc4957aa7933686aad|1          |[Variant not found in LD reference]|
|GCST90003198|pics             |5135a9385bb39708e1bab4e453615ab9|1          |[Variant not found in LD reference]|
|GCST90004332|pics             |24d4e6772087976782363c878ada081d|1          |[Variant not found in LD reference]|
|GCST90239819|pics             |38b39bda86ae55c87c3b8503765fef4c|1          |[Variant not found in LD reference]|
|GCST90239652|pics             |3ce6b906274c8cdaa4171805aad46e01|1          |[Variant not found in LD reference]|
|GCST90239652|pics             |a3c8de1549c0201a85cada1db853e635|1          |[Variant not found in LD reference]|
|GCST90003083|pics             |0af1a5b1a9e7b800cb5b0cd84ae461e5|1          |[Variant not found in LD reference]|
|GCST90004193|pics             |4919205ce2ca12e93e3c55ab9fb0c84f|1          |[Variant not found in LD reference]|
|GCST90003083|pics             |52e6642d82e45009e283f56d16845cd0|1          |[Variant not found in LD reference]|
|GCST90239823|pics             |6f2b7bf51260af8536d37edac7d29083|1          |[Variant not found in LD reference]|
+------------+-----------------+--------------------------------+-----------+-----------------------------------+
only showing top 20 rows

The above credible sets crash the credible set page. I suspect the reason is that the front end (FE) does not expect a chromosome labelled 23 or 24, probably in the code that sorts variants.
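To make the suspected failure mode concrete, here is a minimal Python sketch of a chromosome sort key that tolerates numeric sex-chromosome codes. It assumes the labels follow the PLINK-style convention (23 = X, 24 = Y), which has not been confirmed for these studies. The names `CHROMOSOME_ALIASES`, `normalise_chromosome` and `chromosome_sort_key` are hypothetical and are not taken from the Open Targets code base.

```python
# Assumption: numeric codes follow the PLINK-style convention (23 = X, 24 = Y).
CHROMOSOME_ALIASES = {"23": "X", "24": "Y"}

# Canonical ordering: autosomes 1-22, then X and Y.
CANONICAL_LABELS = [str(n) for n in range(1, 23)] + ["X", "Y"]
CHROMOSOME_ORDER = {label: rank for rank, label in enumerate(CANONICAL_LABELS)}


def normalise_chromosome(label: str) -> str:
    """Map numeric sex-chromosome codes to X/Y; leave other labels unchanged."""
    cleaned = label.strip().upper()
    return CHROMOSOME_ALIASES.get(cleaned, cleaned)


def chromosome_sort_key(label: str) -> int:
    """Rank a chromosome label; unknown labels sort last instead of raising."""
    return CHROMOSOME_ORDER.get(normalise_chromosome(label), len(CHROMOSOME_ORDER))


labels = ["X", "23", "2", "24", "Y", "10", "1"]
ordered = sorted(labels, key=chromosome_sort_key)
print(ordered)
```

With this key, "23" and "24" receive the same rank as "X" and "Y", and any unrecognised label sorts to the end rather than causing an error. The same alias mapping could in principle be applied upstream so that the credible sets only ever carry X and Y.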

A few options: