ramiromagno / gwasrapidd

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data
https://rmagno.eu/gwasrapidd/

The number of associations obtained by "gwasrapidd" differs extremely from that obtained in the GWAS Catalog #43

Closed: shenlan17 closed this issue 10 months ago

shenlan17 commented 10 months ago

Hi, using the code below, I obtained only 182 associations:

my_associations <- get_associations(efo_id = "EFO_0005140")

But the GWAS Catalog website shows more than 7000 associations.

ramiromagno commented 10 months ago

Hi @shenlan17,

The difference stems from the fact that the GWAS Catalog web interface, by default, shows the associations directly annotated to the EFO_0005140 trait together with those of its child traits.

If you uncheck the box "include child trait data", then the number of associations is also 182 (as with gwasrapidd).
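
The count without child traits can also be checked from R. This is a minimal sketch that uses only calls appearing in this thread (`get_associations()` and `n()`); the variable name `direct_associations` is just illustrative. The GWAS Catalog is updated over time, so the number returned today may no longer be exactly 182.

```r
library(gwasrapidd)

# Associations annotated directly to EFO_0005140 only (no child traits).
direct_associations <- get_associations(efo_id = "EFO_0005140")

# n() returns the number of associations held in the object.
n(direct_associations)
```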

ramiromagno commented 10 months ago

If you run get_associations() with EFO_0005140 and its child terms, you actually get more associations: 8087.

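library(gwasrapidd)

# All traits currently in the GWAS Catalog.
all_traits <- get_traits()
# Child terms of EFO_0005140 (get_child_efo() returns a list named by EFO id).
child_traits <- gwasrapidd::get_child_efo("EFO_0005140")$EFO_0005140
# Keep only the child traits that are present in the GWAS Catalog.
child_traits_in_gwas_cat <- all_traits[child_traits]

# Query the parent trait together with its child traits.
my_associations <- get_associations(efo_id = c("EFO_0005140", child_traits_in_gwas_cat@traits$efo_id))

# Number of associations retrieved.
n(my_associations)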
#> [1] 8087

my_associations
#> An object of class "associations"
#> Slot "associations":
#> # A tibble: 8,087 × 18
#>    association_id pvalue pvalue_description pvalue_mantissa pvalue_exponent
#>    <chr>           <dbl> <chr>                        <int>           <int>
#>  1 36071897       1e- 11 <NA>                             1             -11
#>  2 36071902       1e-  7 <NA>                             1              -7
#>  3 36071907       3e- 18 <NA>                             3             -18
#>  4 36071911       2e-103 <NA>                             2            -103
#>  5 36071916       4e-  8 <NA>                             4              -8
#>  6 36071920       3e-  9 <NA>                             3              -9
#>  7 36071924       1e- 13 <NA>                             1             -13
#>  8 36071929       3e-  8 <NA>                             3              -8
#>  9 36071934       2e-  9 <NA>                             2              -9
#> 10 36071938       5e- 14 <NA>                             5             -14
#> # ℹ 8,077 more rows
#> # ℹ 13 more variables: multiple_snp_haplotype <lgl>, snp_interaction <lgl>,
#> #   snp_type <chr>, risk_frequency <dbl>, standard_error <dbl>, range <chr>,
#> #   or_per_copy_number <dbl>, beta_number <dbl>, beta_unit <chr>,
#> #   beta_direction <chr>, beta_description <chr>, last_mapping_date <dttm>,
#> #   last_update_date <dttm>
#> 
#> Slot "loci":
#> # A tibble: 8,089 × 4
#>    association_id locus_id haplotype_snp_count description   
#>    <chr>             <int>               <int> <chr>         
#>  1 36071897              1                  NA Single variant
#>  2 36071902              1                  NA Single variant
#>  3 36071907              1                  NA Single variant
#>  4 36071911              1                  NA Single variant
#>  5 36071916              1                  NA Single variant
#>  6 36071920              1                  NA Single variant
#>  7 36071924              1                  NA Single variant
#>  8 36071929              1                  NA Single variant
#>  9 36071934              1                  NA Single variant
#> 10 36071938              1                  NA Single variant
#> # ℹ 8,079 more rows
#> 
#> Slot "risk_alleles":
#> # A tibble: 8,121 × 7
#>    association_id locus_id variant_id risk_allele risk_frequency genome_wide
#>    <chr>             <int> <chr>      <chr>                <dbl> <lgl>      
#>  1 36071897              1 rs10797431 <NA>                    NA FALSE      
#>  2 36071902              1 rs72920202 <NA>                    NA FALSE      
#>  3 36071907              1 rs10494079 <NA>                    NA FALSE      
#>  4 36071911              1 rs2476601  <NA>                    NA FALSE      
#>  5 36071916              1 rs1800601  <NA>                    NA FALSE      
#>  6 36071920              1 rs11675342 <NA>                    NA FALSE      
#>  7 36071924              1 rs1534430  <NA>                    NA FALSE      
#>  8 36071929              1 rs67927699 <NA>                    NA FALSE      
#>  9 36071934              1 rs5865     <NA>                    NA FALSE      
#> 10 36071938              1 rs2075302  <NA>                    NA FALSE      
#> # ℹ 8,111 more rows
#> # ℹ 1 more variable: limited_list <lgl>
#> 
#> Slot "genes":
#> # A tibble: 8,563 × 3
#>    association_id locus_id gene_name
#>    <chr>             <int> <chr>    
#>  1 63818111              1 <NA>     
#>  2 55992578              1 MMEL1    
#>  3 55992585              1 PADI4    
#>  4 55992592              1 PTPN22   
#>  5 55992599              1 PTPN22   
#>  6 55992606              1 FASLG    
#>  7 55992613              1 AFF3     
#>  8 55992620              1 ITGA4    
#>  9 55992627              1 NAB1     
#> 10 55992634              1 STAT4    
#> # ℹ 8,553 more rows
#> 
#> Slot "ensembl_ids":
#> # A tibble: 10,815 × 4
#>    association_id locus_id gene_name ensembl_id     
#>    <chr>             <int> <chr>     <chr>          
#>  1 63818111              1 <NA>      <NA>           
#>  2 55992578              1 MMEL1     ENSG00000142606
#>  3 55992578              1 MMEL1     ENSG00000277131
#>  4 55992585              1 PADI4     ENSG00000159339
#>  5 55992585              1 PADI4     ENSG00000280908
#>  6 55992592              1 PTPN22    ENSG00000134242
#>  7 55992599              1 PTPN22    ENSG00000134242
#>  8 55992606              1 FASLG     ENSG00000117560
#>  9 55992613              1 AFF3      ENSG00000144218
#> 10 55992620              1 ITGA4     ENSG00000115232
#> # ℹ 10,805 more rows
#> 
#> Slot "entrez_ids":
#> # A tibble: 8,563 × 4
#>    association_id locus_id gene_name entrez_id
#>    <chr>             <int> <chr>     <chr>    
#>  1 63818111              1 <NA>      <NA>     
#>  2 55992578              1 MMEL1     79258    
#>  3 55992585              1 PADI4     23569    
#>  4 55992592              1 PTPN22    26191    
#>  5 55992599              1 PTPN22    26191    
#>  6 55992606              1 FASLG     356      
#>  7 55992613              1 AFF3      3899     
#>  8 55992620              1 ITGA4     3676     
#>  9 55992627              1 NAB1      4664     
#> 10 55992634              1 STAT4     6775     
#> # ℹ 8,553 more rows

Created on 2023-10-18 with reprex v2.0.2
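
If this lookup is needed for more than one trait, the steps in the reprex above could be wrapped in a small function. This is only a sketch: `get_associations_with_children()` is a hypothetical helper and not part of gwasrapidd, and it filters the child terms with `%in%` instead of subsetting the traits object, which is assumed here to give the same set of EFO ids.

```r
library(gwasrapidd)

# Hypothetical helper (not part of gwasrapidd): fetch the associations for one
# EFO trait plus those of its child traits that exist in the GWAS Catalog.
get_associations_with_children <- function(efo_id) {
  stopifnot(is.character(efo_id), length(efo_id) == 1L)

  # EFO ids of all traits currently in the GWAS Catalog.
  catalog_ids <- get_traits()@traits$efo_id

  # Child terms of the requested trait (list element named by the EFO id).
  child_ids <- get_child_efo(efo_id)[[efo_id]]

  # Drop child terms that have no entry in the GWAS Catalog.
  child_ids_in_catalog <- child_ids[child_ids %in% catalog_ids]

  get_associations(efo_id = unique(c(efo_id, child_ids_in_catalog)))
}

my_associations <- get_associations_with_children("EFO_0005140")
n(my_associations)
```

The result should be comparable to the count shown by the web interface with "include child trait data" checked, although the exact number depends on the GWAS Catalog release being queried.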

ramiromagno commented 10 months ago

Hi @shenlan17:

Did my replies address your question?

shenlan17 commented 10 months ago

@ramiromagno Yes, thank you very much for your help!