opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Multiple distances for clumping and locus collection #3097

Closed DSuveges closed 9 months ago

DSuveges commented 9 months ago

Currently, when distance-based clumping is applied to a summary statistics dataset, the same distance is also applied to collect the surrounding single-point associations for pattern-based colocalization. It would be better to make this more flexible, e.g. apply clumping with a +/-500 kbp window and collect the locus within +/-250 kbp around the semi-indices.
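
To illustrate the request, here is a minimal sketch in plain Python of clumping with one distance and collecting the locus with another. It is not the project's implementation: the `Association` class and the functions `window_clump`, `collect_locus` and `clump_and_collect` are hypothetical names, the greedy "most significant first" strategy is an assumption, and no significance threshold is applied to lead variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical single-point association from a summary statistics dataset."""
    chromosome: str
    position: int
    p_value: float


def window_clump(associations, clump_distance):
    """Greedy distance-based clumping (assumed strategy).

    Walk the associations from most to least significant and keep one as a
    semi-index only if no already selected semi-index lies within
    `clump_distance` base pairs on the same chromosome.
    """
    semi_indices = []
    for assoc in sorted(associations, key=lambda a: a.p_value):
        near_lead = any(
            lead.chromosome == assoc.chromosome
            and abs(lead.position - assoc.position) <= clump_distance
            for lead in semi_indices
        )
        if not near_lead:
            semi_indices.append(assoc)
    return semi_indices


def collect_locus(lead, associations, locus_distance):
    """Return all associations within `locus_distance` of the semi-index."""
    return [
        a
        for a in associations
        if a.chromosome == lead.chromosome
        and abs(a.position - lead.position) <= locus_distance
    ]


def clump_and_collect(associations, clump_distance=500_000, locus_distance=250_000):
    """Clump with one window, then collect the locus with an independent one."""
    leads = window_clump(associations, clump_distance)
    return {lead: collect_locus(lead, associations, locus_distance) for lead in leads}
```

Because the two distances are separate parameters, changing `locus_distance` only changes how many surrounding associations are attached to each semi-index; the semi-indices themselves depend on `clump_distance` alone.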

DSuveges commented 9 months ago

It works. When a double-sized window is applied to capture the locus, we get roughly double the number of variants, while the list of semi-indices stays identical:

+----------------+-----------+----------+------------------+
|variantId       |short_locus|long_locus|increase          |
+----------------+-----------+----------+------------------+
|10_63267850_A_T |342        |882       |2.5789473684210527|
|20_45916409_C_T |506        |1032      |2.039525691699605 |
|8_18415790_G_C  |959        |1904      |1.9854014598540146|
|16_30907166_C_G |111        |274       |2.4684684684684686|
|7_73520180_A_G  |142        |240       |1.6901408450704225|
|12_124002131_A_G|401        |844       |2.1047381546134662|
|12_57398797_C_T |188        |475       |2.526595744680851 |
|19_44911194_T_C |257        |675       |2.6264591439688716|
|2_164656581_T_C |454        |865       |1.9052863436123348|
|15_42391589_G_A |278        |768       |2.762589928057554 |
|6_31297713_T_C  |1970       |2909      |1.4766497461928934|
|13_73991363_A_G |667        |1311      |1.9655172413793103|
|8_10826419_G_C  |687        |1476      |2.148471615720524 |
|3_136207780_G_T |222        |444       |2.0               |
|2_226234464_C_T |553        |946       |1.7106690777576854|
|11_61802358_C_T |318        |652       |2.050314465408805 |
|8_125478730_A_T |479        |891       |1.860125260960334 |
|1_230169566_G_A |559        |1003      |1.7942754919499107|
|8_19986711_A_G  |667        |1430      |2.143928035982009 |
|16_56970977_G_A |490        |911       |1.8591836734693878|
|19_19296909_T_C |245        |505       |2.061224489795918 |
|4_87109109_G_T  |365        |760       |2.0821917808219177|
|2_21002409_C_T  |549        |1164      |2.120218579234973 |
|22_38150026_T_C |318        |602       |1.8930817610062893|
|6_31979683_G_T  |498        |1767      |3.5481927710843375|
|10_93079885_G_A |526        |1048      |1.9923954372623573|
|11_116778201_G_C|573        |1111      |1.9389179755671901|
|2_27508073_T_C  |218        |471       |2.1605504587155964|
|15_58438954_G_C |726        |1250      |1.721763085399449 |
|1_62560271_G_T  |314        |972       |3.0955414012738856|
|6_32543895_T_A  |1474       |2663      |1.8066485753052917|
|15_43953733_A_T |273        |393       |1.4395604395604396|
|5_56565959_C_T  |699        |1289      |1.844062947067239 |
|2_28121418_G_A  |370        |829       |2.2405405405405405|
|5_157052312_G_C |449        |1003      |2.2338530066815143|
|7_72664689_C_T  |169        |384       |2.272189349112426 |
+----------------+-----------+----------+------------------+
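
A comparison like the one in the table can be sketched as follows. This is plain Python rather than the code that produced the output above; `locus_size_increase` is a hypothetical helper, and the two example entries are copied from the table.

```python
def locus_size_increase(short_counts, long_counts):
    """Pair up per-semi-index locus sizes and compute long / short.

    Both arguments map variantId -> number of variants in the locus.
    Raises ValueError if the two runs do not share the same semi-indices.
    """
    if set(short_counts) != set(long_counts):
        raise ValueError("the two runs produced different semi-indices")
    return [
        (variant_id, short, long_counts[variant_id], long_counts[variant_id] / short)
        for variant_id, short in short_counts.items()
    ]


# Locus sizes for two semi-indices, taken from the table above.
short_counts = {"10_63267850_A_T": 342, "3_136207780_G_T": 222}
long_counts = {"10_63267850_A_T": 882, "3_136207780_G_T": 444}

for row in locus_size_increase(short_counts, long_counts):
    print(row)
```

The key-set check mirrors the observation that the semi-indices are identical between the two runs; only the locus sizes differ.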