ramiromagno / gwasrapidd

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data
https://rmagno.eu/gwasrapidd/
Other
89 stars 15 forks source link

Retrieve all genetic associations searching on reported trait and efo simultaneously #21

Closed mightyphil2000 closed 3 years ago

mightyphil2000 commented 3 years ago

I'd like to be able to retrieve genetic associations searching on reported trait and EFO at the same time (ie all genetic associations that match either the reported trait or EFO). However, get_associations does not allow searching on reported trait. A workaround is to first search on get_study using reported trait (which is invariant within gwas catalog study ID) and then retrieve the genetic associations using the study ID. However, this strategy cannot be used to identify associations for EFOs because EFO is not necessarily invariant within GWAS catalog study ID. Therefore, it seems that to identify genetic associations for reported trait and EFO you have to do the search in the follow three steps:

e.g. Step1. Identify genetic associations for reported trait trait="Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)" gwas_studies<-gwasrapidd::get_studies(reported_trait=trait)
gwas_associations<-gwasrapidd::get_associations(study_id = gwas_studies@studies$study_id)

Step 2. Identify genetic associations for efo efo="arachidonic acid measurement" gwas_associations<-gwasrapidd::get_associations(efo_trait=efo)

Step3. Combine genetic associations from steps1 and step2 association_ids<-unique(c(gwas_associations1@associations$association_id,gwas_associations2@associations$association_id)) gwas_associations3<-gwasrapidd::get_associations(association_id=association_ids)

Or alternatively gwas_associations3<-unique(rbind(gwas_associations2@associations,gwas_associations1@associations))

Do you agree this is the best way to retrieve genetic associations for reported trait and efo?

ramiromagno commented 3 years ago

Hi Philip:

Indeed, I think your approach is indeed the best approach.

Take a look at my refactoring of your code below and how I use the union() function in a slightly different way:

library(gwasrapidd)

# Step1. Identify genetic associations for reported trait
trait="Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)"
gwas_studies<-gwasrapidd::get_studies(reported_trait=trait)
gwas_associations1<-gwasrapidd::get_associations(study_id = gwas_studies@studies$study_id)

# Step 2. Identify genetic associations for efo
efo="arachidonic acid measurement"
gwas_associations2<-gwasrapidd::get_associations(efo_trait=efo)

# Step 3: by means of a function
# NB: We use `union()` to combine both associations objects, and not just the
# association ids (as you had in your example). This is advantageous as all
# tables in the associations objects are united in a coherent way (courtesy of
# gwasrapidd).
get_associations_by_trait <- function(reported_trait = NULL,
                                      efo_trait = NULL,
                                      verbose = FALSE,
                                      warnings = TRUE) {

  if(!is.null(reported_trait)) {
    gwas_studies <- gwasrapidd::get_studies(reported_trait = reported_trait)
    gwas_associations_by_reported_trait <- gwasrapidd::get_associations(study_id = gwas_studies@studies$study_id,
                                                                        verbose = verbose,
                                                                        warnings = warnings)
  }

  if(!is.null(efo_trait)) {
    gwas_associations_by_efo_trait <- gwasrapidd::get_associations(efo_trait = efo_trait,
                                                                   verbose = verbose,
                                                                   warnings = warnings)
  }

  if(!is.null(reported_trait) && !is.null(efo_trait))
    return(union(gwas_associations_by_reported_trait, gwas_associations_by_efo_trait))

  if(!is.null(reported_trait) && is.null(efo_trait))
    return(gwas_associations_by_reported_trait)

  if(!is.null(reported_trait) && is.null(efo_trait))
    return(gwas_associations_by_efo_trait)

  stop('At least either `reported_trait` or `efo_trait` must be specified.')
}

all_associations <- get_associations_by_trait(
  reported_trait = "Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)",
  efo_trait = "arachidonic acid measurement"
)

n(gwas_associations1)
#> [1] 28
n(gwas_associations2)
#> [1] 22
n(all_associations)
#> [1] 30
mightyphil2000 commented 3 years ago

Thanks Ramiro!

By the way, I wanted to let you know that I'm developing a new package called CheckSumStats (https://github.com/MRCIEU/CheckSumStats), which calls gwasrapidd for some of it's tests. Is it ok if I include the above get_associations_by_trait function ?

mightyphil2000 commented 3 years ago

I also noticed a possible bug in the function. This bit is repeated twice. At least one should have !is.null for efo_trait and is.null for reported_trait.

 if(!is.null(reported_trait) && is.null(efo_trait))
    return(gwas_associations_by_reported_trait)

I've modified the function to also work with efo_id.


get_associations_by_trait <- function(reported_trait = NULL,efo_trait = NULL,efo_id=NULL,verbose = FALSE,warnings = TRUE) 
{

    if(!is.null(reported_trait)) 
    {
        gwas_studies <- gwasrapidd::get_studies(reported_trait = reported_trait)
        gwas_associations_by_reported_trait <- gwasrapidd::get_associations(study_id = gwas_studies@studies$study_id,verbose = verbose,warnings = warnings)
    }

    if(!is.null(efo_trait)) 
    {
        gwas_associations_by_efo_trait <- gwasrapidd::get_associations(efo_trait = efo_trait,verbose = verbose,warnings = warnings)
    }

    if(!is.null(efo_id)) 
    {
        gwas_associations_by_efo_id <- gwasrapidd::get_associations(efo_id = efo_id,verbose = verbose,warnings = warnings)
    }

    if(!is.null(reported_trait) && !is.null(efo_trait))
        return(gwasrapidd::union(gwas_associations_by_reported_trait, gwas_associations_by_efo_trait))

    if(!is.null(reported_trait) && !is.null(efo_id) && is.null(efo_trait))
        return(gwasrapidd::union(gwas_associations_by_reported_trait, gwas_associations_by_efo_id))

    if(!is.null(reported_trait) && is.null(efo_trait) && is.null(efo_id))
        return(gwas_associations_by_reported_trait)

    if(is.null(reported_trait) && !is.null(efo_trait))
        return(gwas_associations_by_efo_trait)

    if(is.null(reported_trait) && is.null(efo_trait) && !is.null(efo_id))
        return(gwas_associations_by_efo_id)

    stop('At least either `reported_trait`, `efo_trait` or `efo_id` must be specified.')
}

all_associations <- get_associations_by_trait(
      reported_trait = "Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)",
      efo_trait = "arachidonic acid measurement",
      efo_id=NULL
)
> n(all_associations)
[1] 30

all_associations <- get_associations_by_trait(
      reported_trait = "Plasma omega-6 polyunsaturated fatty acid levels (arachidonic acid)",
      efo_trait = NULL,
      efo_id="EFO_0006808"
)
> n(all_associations)
[1] 30
ramiromagno commented 3 years ago

Hi Philip:

You are most welcome to use all code.

Well spotted that bug!

Interesting that package of yours. I will have a look at it later. Thanks!

ramiromagno commented 3 years ago

Hello Philip:

Is this issue resolved? May I close it?

mightyphil2000 commented 3 years ago

Yes thanks Ramiro!

Obtener Outlook para iOShttps://aka.ms/o0ukef


De: Ramiro Magno @.> Enviado: Friday, June 11, 2021 5:32:18 PM Para: ramiromagno/gwasrapidd @.> Cc: Philip Haycock @.>; Author @.> Asunto: Re: [ramiromagno/gwasrapidd] Retrieve all genetic associations searching on reported trait and efo simultaneously (#21)

Hello Philip:

Is this issue resolved? May I close it?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/ramiromagno/gwasrapidd/issues/21#issuecomment-859701942, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACSDAPX4ASER3L4E6E7X2WDTSI3BFANCNFSM46IDCKXA.

ramiromagno commented 3 years ago

Cheers!