opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

QC flags for GWAS Catalog curated associations are not properly assigned #3192

Closed DSuveges closed 5 months ago

DSuveges commented 5 months ago

After LD clumping the curated GWAS Catalog association set, the distribution of the quality control flags looks like this:


+-----------------------------------------------------------------------------------------------------------------------------------------------------------+------+
|qualityControls                                                                                                                                            |count |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+------+
|[Subsignificant p-value, Explained by a more significant variant in high LD (clumped)]                                                                     |10928 |
|[Composite association, Subsignificant p-value, Incomplete genomic mapping, Variant inconsistency, No mapping in GnomAd, Variant not found in LD reference]|159   |
|[Subsignificant p-value, Variant not found in LD reference]                                                                                                |5486  |
|[Incomplete genomic mapping, Variant inconsistency, No mapping in GnomAd, Variant not found in LD reference]                                               |30546 |
|[Variant not found in LD reference]                                                                                                                        |17338 |
|[]                                                                                                                                                         |302753|
|[Composite association, Subsignificant p-value, Variant inconsistency]                                                                                     |273   |
|[Subsignificant p-value, Palindrome alleles - cannot harmonize, Variant not found in LD reference]                                                         |710   |
|[Composite association, Incomplete genomic mapping, Variant inconsistency, No mapping in GnomAd, Variant not found in LD reference]                        |344   |
|[Subsignificant p-value, Palindrome alleles - cannot harmonize, Explained by a more significant variant in high LD (clumped)]                              |1539  |
|[Subsignificant p-value, Incomplete genomic mapping, Variant inconsistency, No mapping in GnomAd, Variant not found in LD reference]                       |8890  |
|[Palindrome alleles - cannot harmonize, Variant not found in LD reference]                                                                                 |2404  |
|[Subsignificant p-value, No mapping in GnomAd, Variant not found in LD reference]                                                                          |215   |
|[Palindrome alleles - cannot harmonize]                                                                                                                    |44617 |
|[Composite association, Variant inconsistency, Explained by a more significant variant in high LD (clumped)]                                               |323   |
|[Subsignificant p-value, Palindrome alleles - cannot harmonize]                                                                                            |9075  |
|[Composite association, Variant inconsistency]                                                                                                             |525   |
|[Composite association, Subsignificant p-value, Variant inconsistency, Explained by a more significant variant in high LD (clumped)]                       |152   |
|[No mapping in GnomAd, Variant not found in LD reference]                                                                                                  |754   |
|[Explained by a more significant variant in high LD (clumped)]                                                                                             |37906 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+------+

There are upstream QC checks that already invalidate associations, e.g. No mapping in GnomAd or Composite association. However, at later stages these associations are seemingly still considered and receive further flags, e.g. Explained by a more significant variant in high LD (clumped) or Variant not found in LD reference. I don't think this makes much sense: once an association is flagged, it should be omitted from downstream processes and flags.
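To make the concern concrete, here is a minimal plain-Python sketch that counts associations carrying both an upstream invalidating flag and a downstream flag. The split into `UPSTREAM_FLAGS` and `DOWNSTREAM_FLAGS` is an assumption based only on the examples named above and is not a complete list. The helper names are hypothetical and not part of the pipeline, and the three rows are copied from the table.

```python
# Illustrative, incomplete flag sets taken from the examples in this issue (assumption).
UPSTREAM_FLAGS = {
    "No mapping in GnomAd",
    "Composite association",
}
DOWNSTREAM_FLAGS = {
    "Explained by a more significant variant in high LD (clumped)",
    "Variant not found in LD reference",
}


def has_conflicting_flags(quality_controls):
    """True if an association has an upstream invalidating flag AND a downstream flag."""
    flags = set(quality_controls)
    return bool(flags & UPSTREAM_FLAGS) and bool(flags & DOWNSTREAM_FLAGS)


def count_conflicting(distribution):
    """Sum the counts of all (flags, count) rows whose flag combination is conflicting."""
    return sum(count for flags, count in distribution if has_conflicting_flags(flags))


# Three rows copied from the distribution table above.
rows = [
    (["Composite association", "Variant inconsistency",
      "Explained by a more significant variant in high LD (clumped)"], 323),
    (["No mapping in GnomAd", "Variant not found in LD reference"], 754),
    (["Palindrome alleles - cannot harmonize"], 44617),
]

print(count_conflicting(rows))  # 323 + 754 = 1077
```

Running the same check over the full distribution would show how many associations were flagged downstream after already being invalidated upstream.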

DSuveges commented 5 months ago

Based on a discussion with @ireneisdoomed and @d0choa, this is not an issue.