ramiromagno / gwasrapidd

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data
https://rmagno.eu/gwasrapidd/

get_associations using reported trait #19

Closed mightyphil2000 closed 3 years ago

mightyphil2000 commented 3 years ago

Is it possible to search for associations by reported trait? I checked, and it does not seem possible: get_associations() only lets you search by trait via efo_trait and efo_id, not via reported_trait.

ramiromagno commented 3 years ago

Hi @mightyphil2000,

Thank you for your question.

Indeed, the REST API does not provide an endpoint that directly retrieves associations by reported trait. The only endpoints that allow searching directly by the authors' reported trait are the study endpoints. So we first need to get the studies associated with your reported trait of interest, and then search for associations by those studies (i.e., by their study IDs).

So here is an example of how to do it for the trait 'Blood metabolite levels':

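library(gwasrapidd)

# Reported trait of interest (the authors' own wording, not an EFO term)
reported_trait_of_interest <- 'Blood metabolite levels'
# Step 1: find the studies with this reported trait
studies_of_interest <- get_studies(reported_trait = reported_trait_of_interest)
# Step 2: retrieve the associations belonging to those studies, by study ID
assoc_of_interest <- get_associations(study_id = studies_of_interest@studies$study_id)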

assoc_of_interest
#> An object of class "associations"
#> Slot "associations":
#> # A tibble: 279 x 17
#>    association_id  pvalue pvalue_description     pvalue_mantissa pvalue_exponent
#>    <chr>            <dbl> <chr>                            <int>           <int>
#>  1 42551          1 e- 19 (X-12244--N-acetylcar…               1             -19
#>  2 42552          7 e- 87 (X-08402)                            7             -87
#>  3 42554          3 e- 13 (X-13671)                            3             -13
#>  4 42555          1 e- 11 (asparagine)                         1             -11
#>  5 42556          2 e- 35 (isovalerylcarnitine)                2             -35
#>  6 42531          6.e-315 (X-11529)                            6            -315
#>  7 42532          1 e- 89 (X-11538)                            1             -89
#>  8 42558          1 e- 18 (1-palmitoylglyceroph…               1             -18
#>  9 42559          8 e- 12 (1-stearoylglyceropho…               8             -12
#> 10 42560          2 e- 88 (succinylcarnitine)                  2             -88
#> # … with 269 more rows, and 12 more variables: multiple_snp_haplotype <lgl>,
#> #   snp_interaction <lgl>, snp_type <chr>, standard_error <dbl>, range <chr>,
#> #   or_per_copy_number <dbl>, beta_number <dbl>, beta_unit <chr>,
#> #   beta_direction <chr>, beta_description <chr>, last_mapping_date <dttm>,
#> #   last_update_date <dttm>
#> 
#> Slot "loci":
#> # A tibble: 279 x 4
#>    association_id locus_id haplotype_snp_count description   
#>    <chr>             <int>               <int> <chr>         
#>  1 42551                 1                  NA Single variant
#>  2 42552                 1                  NA Single variant
#>  3 42554                 1                  NA Single variant
#>  4 42555                 1                  NA Single variant
#>  5 42556                 1                  NA Single variant
#>  6 42531                 1                  NA Single variant
#>  7 42532                 1                  NA Single variant
#>  8 42558                 1                  NA Single variant
#>  9 42559                 1                  NA Single variant
#> 10 42560                 1                  NA Single variant
#> # … with 269 more rows
#> 
#> Slot "risk_alleles":
#> # A tibble: 279 x 7
#>    association_id locus_id variant_id risk_allele risk_frequency genome_wide
#>    <chr>             <int> <chr>      <chr>                <dbl> <lgl>      
#>  1 42551                 1 rs9302065  A                       NA NA         
#>  2 42552                 1 rs7157785  T                       NA NA         
#>  3 42554                 1 rs2041073  T                       NA NA         
#>  4 42555                 1 rs4144027  T                       NA NA         
#>  5 42556                 1 rs9635324  A                       NA NA         
#>  6 42531                 1 rs4149056  T                       NA NA         
#>  7 42532                 1 rs1871395  A                       NA NA         
#>  8 42558                 1 rs2070895  A                       NA NA         
#>  9 42559                 1 rs588136   T                       NA NA         
#> 10 42560                 1 rs1472631  A                       NA NA         
#> # … with 269 more rows, and 1 more variable: limited_list <lgl>
#> 
#> Slot "genes":
#> # A tibble: 425 x 3
#>    association_id locus_id gene_name
#>    <chr>             <int> <chr>    
#>  1 42551                 1 ABCC4    
#>  2 42552                 1 SGPP1    
#>  3 42554                 1 HEATR4   
#>  4 42555                 1 ASPG     
#>  5 42556                 1 IVD      
#>  6 42531                 1 SLCO1B1  
#>  7 42532                 1 SLCO1B1  
#>  8 42558                 1 LIPC     
#>  9 42559                 1 LIPC     
#> 10 42560                 1 LACTB    
#> # … with 415 more rows
#> 
#> Slot "ensembl_ids":
#> # A tibble: 446 x 4
#>    association_id locus_id gene_name ensembl_id     
#>    <chr>             <int> <chr>     <chr>          
#>  1 42551                 1 ABCC4     ENSG00000125257
#>  2 42552                 1 SGPP1     ENSG00000126821
#>  3 42552                 1 SGPP1     ENSG00000285281
#>  4 42554                 1 HEATR4    ENSG00000187105
#>  5 42555                 1 ASPG      ENSG00000166183
#>  6 42556                 1 IVD       ENSG00000128928
#>  7 42531                 1 SLCO1B1   ENSG00000134538
#>  8 42532                 1 SLCO1B1   ENSG00000134538
#>  9 42558                 1 LIPC      ENSG00000166035
#> 10 42559                 1 LIPC      ENSG00000166035
#> # … with 436 more rows
#> 
#> Slot "entrez_ids":
#> # A tibble: 425 x 4
#>    association_id locus_id gene_name entrez_id
#>    <chr>             <int> <chr>     <chr>    
#>  1 42551                 1 ABCC4     10257    
#>  2 42552                 1 SGPP1     81537    
#>  3 42554                 1 HEATR4    399671   
#>  4 42555                 1 ASPG      374569   
#>  5 42556                 1 IVD       3712     
#>  6 42531                 1 SLCO1B1   10599    
#>  7 42532                 1 SLCO1B1   10599    
#>  8 42558                 1 LIPC      3990     
#>  9 42559                 1 LIPC      3990     
#> 10 42560                 1 LACTB     114294   
#> # … with 415 more rows

Let me know if this solves your problem, or if you need further help.

mightyphil2000 commented 3 years ago

Thanks Ramiro,

I think that solution works if the reported trait is the same across all associations within a study. A problem arises if the reported trait is not consistent within a study. For example, say I'm interested in trait X. I identify Study X by searching on trait X. But imagine Study X also investigated trait Y. Then get_associations() on that study ID will retrieve associations for both trait X and trait Y, but I am not interested in trait Y; I only want associations for trait X. What do you think?
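If the reported trait could differ between associations of the same study, filtering would have to happen per association rather than per study. The following is a minimal base-R sketch of that idea. The helper `filter_assoc_by_reported_trait()` is hypothetical (not part of gwasrapidd). It assumes you already have a table mapping `association_id` to `study_id` and a studies table with `study_id` and `reported_trait` columns. The toy data are made up.

```r
# Hypothetical helper (not part of gwasrapidd): keep only the association IDs
# whose study has the reported trait of interest.
filter_assoc_by_reported_trait <- function(assoc_to_study, studies_tbl, trait) {
  # Assumed inputs:
  #   assoc_to_study: data frame with columns `association_id`, `study_id`
  #   studies_tbl:    data frame with columns `study_id`, `reported_trait`
  # which() drops NA comparisons so missing traits never match.
  keep_studies <- studies_tbl$study_id[which(studies_tbl$reported_trait == trait)]
  ids <- assoc_to_study$association_id[assoc_to_study$study_id %in% keep_studies]
  unique(ids)
}

# Toy data (made up): one publication produced two catalog studies,
# one for "Trait X" and one for "Trait Y".
toy_studies <- data.frame(
  study_id = c("study_1", "study_2"),
  reported_trait = c("Trait X", "Trait Y"),
  stringsAsFactors = FALSE
)
toy_map <- data.frame(
  association_id = c("a1", "a2", "a3"),
  study_id = c("study_1", "study_2", "study_1"),
  stringsAsFactors = FALSE
)

filter_assoc_by_reported_trait(toy_map, toy_studies, "Trait X")
# returns c("a1", "a3")
```

gwasrapidd offers mapping helpers such as association_to_study() that may provide the association-to-study table, but check its documentation for the exact columns it returns. If the reported trait is invariant within a study, this per-association filter selects the same associations as the study-based query.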

mightyphil2000 commented 3 years ago

I just checked, and I think your solution does work after all, because the reported trait seems to be invariant within a study ID. I made the mistake of thinking of a study as a publication, but we are actually referring to GWAS Catalog study IDs (starting with "GC..."), and a single publication can have several of them. Can you confirm that the reported trait is invariant within a GWAS Catalog study ID?

ramiromagno commented 3 years ago

Indeed, each GWAS Catalog study should be associated with only one reported trait. So, if the same publication investigated several traits, you will likely find several study IDs in the catalog originating from that same publication.
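One way to check this property on data you have downloaded is to count the distinct reported traits per study ID. Here is a minimal base-R sketch. The helper `reported_trait_is_invariant()` is hypothetical. It assumes a data frame with `study_id` and `reported_trait` columns, which is the layout I would expect in the `studies` slot of a gwasrapidd `studies` object. The toy data are made up.

```r
# Hypothetical helper (not part of gwasrapidd): TRUE if every study ID in the
# table maps to exactly one distinct reported trait.
reported_trait_is_invariant <- function(studies_tbl) {
  # Assumes columns named `study_id` and `reported_trait`.
  stopifnot(all(c("study_id", "reported_trait") %in% names(studies_tbl)))
  # Number of distinct reported traits observed for each study ID.
  n_traits <- tapply(studies_tbl$reported_trait, studies_tbl$study_id,
                     function(x) length(unique(x)))
  all(n_traits == 1L)
}

# Toy example (made-up IDs): one publication split into two catalog studies,
# each with its own reported trait.
toy_studies <- data.frame(
  study_id = c("toy_study_1", "toy_study_2"),
  reported_trait = c("Trait X", "Trait Y"),
  stringsAsFactors = FALSE
)

reported_trait_is_invariant(toy_studies)
# returns TRUE
```

With gwasrapidd, you could try passing `studies_of_interest@studies` from the earlier example.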

I don't know what you are using these searches for, but make sure that you prefer the reported trait over the EFO trait. As you might know, the reported trait describes the trait in the original authors' own terms, whereas EFO traits are terms from a controlled vocabulary, the Experimental Factor Ontology, and are assigned by the GWAS Catalog team.
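Because the two kinds of trait are assigned differently, querying by reported trait and by EFO trait can return different sets of studies. Below is a hedged sketch for comparing the two result sets. `compare_study_ids()` and `compare_trait_queries()` are hypothetical helpers. The wrapper uses get_studies() with its `reported_trait` and `efo_trait` arguments and reads study IDs from the `studies` slot, as in the example earlier in this thread. The wrapper is only defined here, because calling it needs gwasrapidd and network access.

```r
# Hypothetical helper: summarise how two sets of study IDs overlap.
compare_study_ids <- function(by_reported, by_efo) {
  list(
    only_reported = setdiff(by_reported, by_efo),
    only_efo      = setdiff(by_efo, by_reported),
    both          = intersect(by_reported, by_efo)
  )
}

# Hypothetical wrapper: query the GWAS Catalog both ways with gwasrapidd.
# Requires the gwasrapidd package and network access (not called here).
compare_trait_queries <- function(reported_trait, efo_trait) {
  by_reported <- gwasrapidd::get_studies(reported_trait = reported_trait)
  by_efo <- gwasrapidd::get_studies(efo_trait = efo_trait)
  compare_study_ids(by_reported@studies$study_id, by_efo@studies$study_id)
}
```

For example, you could call `compare_trait_queries('Blood metabolite levels', <an EFO trait label of your choice>)` and inspect `only_reported` and `only_efo` to see which studies each search would miss.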

ramiromagno commented 3 years ago

Hello Philip:

May I close this issue?

mightyphil2000 commented 3 years ago

Yes thanks!

From: Ramiro Magno. Sent: Friday, June 11, 2021 5:33:04 PM. To: ramiromagno/gwasrapidd. Cc: Philip Haycock; Mention. Subject: Re: [ramiromagno/gwasrapidd] get_associations using reported trait (#19)

ramiromagno commented 3 years ago

Thanks!