ramiromagno / gwasrapidd

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data
https://rmagno.eu/gwasrapidd/

How to get non-unioned results when using a list as a parameter? #11

Closed MattCloward closed 3 years ago

MattCloward commented 3 years ago

Hello!

When using a get function, is it possible to get non-unique results when passing a list as a parameter? Here is what I am trying to do:

    library(gwasrapidd)
    library(tibble)

    studyID <- "GCST001718" # a study whose associations belong to 3 separate traits
    associationsTibble <- get_associations(study_id = studyID)@associations # the associations from the study
    association_ids <- associationsTibble[["association_id"]] # there are 7 associations in the study
    # Adding the result of get_traits() for the whole association_id list as a new column gives an error, because get functions only return unique values, even with set_operation = "intersection"
    combinedTable <- add_column(associationsTibble, trait = get_traits(association_id = association_ids, set_operation = "intersection")@traits)
    #> Error: New columns must be compatible with `.data`.
    #> x New column has 3 rows.
    #> i `.data` has 7 rows.

The set operation appears to work only when multiple parameters are passed into a get function (i.e., study_id and association_id). Is there any way to keep all results when passing a list as a parameter, instead of just the unique values?

Thanks!

MattCloward commented 3 years ago

I found a temporary solution, but if it means doing a query for every association_id in the database, this is a bad idea. Do you have any suggestions for improving this code so I don't do so many queries?

    library(gwasrapidd)
    library(tibble)

    # Query the traits for each association ID, one request per ID
    getTraitFromAssociationIDs <- function(ids) {
      traits <- c()
      for (id in ids) {
        trait <- get_traits(association_id = id)@traits[["trait"]]
        # Collapse multiple traits for one association into a single string
        if (length(trait) > 1) {
          trait <- paste(trait, collapse = '|')
        }
        traits <- c(traits, trait)
      }
      return(traits)
    }

    studyID <- "GCST001718"
    associationsTibble <- get_associations(study_id = studyID)@associations
    association_ids <- associationsTibble[["association_id"]]
    combinedTable <- add_column(associationsTibble, trait = getTraitFromAssociationIDs(association_ids))

ramiromagno commented 3 years ago

Hi @MattCloward!

Thank you for using gwasrapidd. I hope you find it useful.

Regarding your question: unfortunately, the GWAS Catalog REST API does require a separate query for each association_id, as you did. In the future, I will add a set of functions to assist the user in these cases.

I have rewritten your code in a way that I think is a bit more idiomatic. Hopefully it does what you meant. (I believe your question is a variation of FAQ 8)


    library(gwasrapidd)

    studyID = "GCST001718"
    associationsTibble <- get_associations(study_id = studyID)@associations

    association_ids <- associationsTibble[["association_id"]]
    names(association_ids) <- association_ids
    association_ids %>%
      purrr::map(~ get_traits(association_id = .x)@traits) %>%
      dplyr::bind_rows(.id = 'association_id') -> my_traits

    combinedTable <- dplyr::left_join(associationsTibble, my_traits, by = 'association_id')

    combinedTable[, c('association_id', 'efo_id', 'trait', 'or_per_copy_number', 'pvalue')]
    #> # A tibble: 11 x 5
    #>    association_id efo_id      trait                   or_per_copy_numb…   pvalue
    #>    <chr>          <chr>       <chr>                               <dbl>    <dbl>
    #>  1 25718          EFO_0000707 squamous cell carcinoma              1.14 3.00e- 6
    #>  2 25451          EFO_0001071 lung carcinoma                       1.15 2.00e- 6
    #>  3 25452          EFO_0000707 squamous cell carcinoma              1.18 5.00e- 9
    #>  4 25453          EFO_0000178 gastric carcinoma                    1.15 1.00e-12
    #>  5 25453          EFO_0001071 lung carcinoma                       1.15 1.00e-12
    #>  6 25453          EFO_0000707 squamous cell carcinoma              1.15 1.00e-12
    #>  7 25454          EFO_0001071 lung carcinoma                       1.17 2.00e- 8
    #>  8 25455          EFO_0000707 squamous cell carcinoma              1.14 1.00e- 6
    #>  9 25456          EFO_0000178 gastric carcinoma                    1.17 1.00e-16
    #> 10 25456          EFO_0001071 lung carcinoma                       1.17 1.00e-16
    #> 11 25456          EFO_0000707 squamous cell carcinoma              1.17 1.00e-16
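
The joined table above has one row per association/trait pair (11 rows), whereas the original helper produced one `|`-separated string per association (7 rows). If that collapsed shape is what is wanted, a minimal sketch along these lines may help. It assumes dplyr 1.0.0 or later (for the `.groups` argument), and the small `combined` tibble is only an illustrative stand-in for `combinedTable`, with values copied from the output above so that no query is needed; `traits_per_association` is a name made up for this example.

```r
library(dplyr)
library(tibble)

# Illustrative stand-in for `combinedTable`: the association_id/trait pairs
# are copied from the printed output above, so no GWAS Catalog query runs here.
combined <- tribble(
  ~association_id, ~trait,
  "25718", "squamous cell carcinoma",
  "25451", "lung carcinoma",
  "25452", "squamous cell carcinoma",
  "25453", "gastric carcinoma",
  "25453", "lung carcinoma",
  "25453", "squamous cell carcinoma",
  "25454", "lung carcinoma",
  "25455", "squamous cell carcinoma",
  "25456", "gastric carcinoma",
  "25456", "lung carcinoma",
  "25456", "squamous cell carcinoma"
)

# Collapse to one row per association, joining multiple traits with "|".
traits_per_association <- combined %>%
  group_by(association_id) %>%
  summarise(trait = paste(trait, collapse = "|"), .groups = "drop")
```

With the real data, the same `group_by()`/`summarise()` step could be applied to `my_traits` before the `dplyr::left_join()`, which should leave one row per association in `combinedTable`.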

MattCloward commented 3 years ago

Wow, this is amazing! Thanks for updating my code for me and thanks for keeping this library alive. It's unfortunate that the limitation comes from the GWAS Catalog API itself, but we can work with that. Thanks again!

ramiromagno commented 3 years ago

You're welcome! May I close this issue?

MattCloward commented 3 years ago

Yes, thank you.