ramiromagno / gwasrapidd

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data
https://rmagno.eu/gwasrapidd/

Problems with ensembl_id #12

Closed abhiachoudhary closed 3 years ago

abhiachoudhary commented 3 years ago

Hi,

This seems like a great package, but I haven't been able to use it. The direct installation of the package didn't work (because the LEGACY variable in the unnest function was being set to TRUE), so I manually sourced all the files and corrected this part in utils.R. However, the functions still don't run. I keep getting an error of the form: "Column ensembl_id must be length x or y, not z", where x, y and z are numbers that change with the argument (e.g. a different gene_name in the get_variants function).

What am I doing wrong? Help needed!!

ramiromagno commented 3 years ago

Hi @abhiachoudhary, could you please share the code you've tried that generates that error?

abhiachoudhary commented 3 years ago

The code simply sources all the files in R and then calls the get_variants() function.

source("browser.R")
source("class-associations.R")
source("class-studies.R")
source("class-traits.R")
source("class-variants.R")
source("data.R")
source("ebi_server.R")
source("generics.R")
source("get_associations.R")
source("get_metadata.R")
source("get_studies.R")
source("get_traits.R")
source("get_variants.R")
source("gwasrapidd-package.R")
source("list_joins.R")
source("missing.R")
source("parse-associations.R")
source("parse-studies.R")
source("parse-traits.R")
source("parse-utils.R")
source("parse-variants.R")
source("post-studies.R")
source("post-traits.R")
source("post-variants.R")
source("recursive_apply.R")
source("request.R")
source("s4-utils.R")
source("sure.R")
source("utils-pipe.R")
source("utils.R")

get_variants(gene_name = "PCSK9")

ramiromagno commented 3 years ago

I mean the code you were trying before coming up with this alternative solution.

Here it seems to run fine:

library(gwasrapidd)
get_variants(gene_name = "PCSK9")
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 37 x 7
#>    variant_id merged functional_class chromosome_name chromosome_posi…
#>    <chr>       <int> <chr>            <chr>                      <int>
#>  1 rs17111652      0 intron_variant   1                       55124792
#>  2 rs11591147      0 missense_variant 1                       55039974
#>  3 rs287230        0 intron_variant   1                       55194619
#>  4 rs7712236…      0 intron_variant   1                       55158391
#>  5 rs2495504       0 intergenic_vari… 1                       55020123
#>  6 rs11800243      0 intron_variant   1                       55052420
#>  7 rs28362261      0 missense_variant 1                       55058129
#>  8 rs1399021…      0 3_prime_UTR_var… 1                       55012310
#>  9 rs72646508      0 missense_variant 1                       55052749
#> 10 rs2149039       0 3_prime_UTR_var… 1                       55014701
#> # … with 27 more rows, and 2 more variables: chromosome_region <chr>,
#> #   last_update_date <dttm>
#> 
#> Slot "genomic_contexts":
#> # A tibble: 424 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>         
#>  1 rs17111652 GYG1P3    1                       55124792    97587 FALSE         
#>  2 rs17111652 PCSK9     1                       55124792    59939 FALSE         
#>  3 rs17111652 USP24     1                       55124792        0 FALSE         
#>  4 rs17111652 MIR4422HG 1                       55124792    92853 FALSE         
#>  5 rs17111652 USP24     1                       55124792        0 TRUE          
#>  6 rs17111652 GYG1P3    1                       55124792    97487 FALSE         
#>  7 rs17111652 PCSK9     1                       55124792    59940 FALSE         
#>  8 rs17111652 MIR4422HG 1                       55124792    93069 FALSE         
#>  9 rs17111652 LOC10050… 1                       55124792    90616 FALSE         
#> 10 rs11591147 AL590440… 1                       55039974    59510 FALSE         
#> # … with 414 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#> 
#> Slot "ensembl_ids":
#> # A tibble: 209 x 3
#>    variant_id gene_name  ensembl_id     
#>    <chr>      <chr>      <chr>          
#>  1 rs17111652 GYG1P3     ENSG00000231095
#>  2 rs17111652 PCSK9      ENSG00000169174
#>  3 rs17111652 USP24      ENSG00000162402
#>  4 rs17111652 MIR4422HG  ENSG00000231090
#>  5 rs11591147 AL590440.2 ENSG00000284601
#>  6 rs11591147 USP24      ENSG00000162402
#>  7 rs11591147 PCSK9      ENSG00000169174
#>  8 rs11591147 BSND       ENSG00000162399
#>  9 rs11591147 TMEM61     ENSG00000143001
#> 10 rs11591147 AL590440.1 ENSG00000233271
#> # … with 199 more rows
#> 
#> Slot "entrez_ids":
#> # A tibble: 215 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>    
#>  1 rs17111652 GYG1P3       645506   
#>  2 rs17111652 PCSK9        255738   
#>  3 rs17111652 USP24        23358    
#>  4 rs17111652 MIR4422HG    109729135
#>  5 rs17111652 LOC100507634 100507634
#>  6 rs11591147 USP24        23358    
#>  7 rs11591147 PCSK9        255738   
#>  8 rs11591147 LOC105378736 105378736
#>  9 rs11591147 TRNAK-CUU    107985760
#> 10 rs11591147 BSND         7809     
#> # … with 205 more rows

Created on 2020-12-02 by the reprex package (v0.3.0)

What is the version of your tidyr R package?

abhiachoudhary commented 3 years ago

library(gwasrapidd)
get_variants(gene_name = "PCSK9")
Error: 'unnest_legacy' is not an exported object from 'namespace:tidyr'

I am using tidyr version 0.8.1.
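The error above says that the installed tidyr does not export unnest_legacy(), a function that first appeared in tidyr 1.0.0. A minimal sketch for checking this directly is shown below; `exports_function()` is a hypothetical helper written for illustration and is not part of gwasrapidd or tidyr.

```r
# Sketch: report whether an installed package exports a given function.
# `exports_function()` is a hypothetical helper, not part of any package.
exports_function <- function(pkg, fun) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA)  # package is not installed
  }
  fun %in% getNamespaceExports(pkg)
}

# With tidyr 0.8.1 this is expected to be FALSE, because
# unnest_legacy() was only added in tidyr 1.0.0.
exports_function("tidyr", "unnest_legacy")

# Print the installed tidyr version, if tidyr is available.
if (requireNamespace("tidyr", quietly = TRUE)) {
  print(packageVersion("tidyr"))
}
```

If the first call returns FALSE, updating tidyr is the likely remedy.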

ramiromagno commented 3 years ago

Hi @abhiachoudhary ,

Thank you for reporting this; it is indeed a bug. I will fix it quickly.

ramiromagno commented 3 years ago

@abhiachoudhary: would you be okay with updating your tidyr package? I am planning to make gwasrapidd depend on tidyr (> 0.8.99).
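To illustrate what a requirement of tidyr (> 0.8.99) means for the version reported in this thread, here is a small sketch using base R's package_version(). The variable names are illustrative only.

```r
# Version numbers are compared component by component,
# not as plain text or as decimal numbers.
required  <- package_version("0.8.99")
installed <- package_version("0.8.1")  # the tidyr version reported in this thread

installed > required                 # FALSE: 0.8.1 does not satisfy the requirement
package_version("1.0.0") > required  # TRUE: tidyr 1.0.0 does
```

Note that a naive text comparison would be misleading here, which is why package_version() is used instead of comparing strings.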

abhiachoudhary commented 3 years ago

Yes, of course!

ramiromagno commented 3 years ago

Okay, nice. Then it should suffice to reinstall gwasrapidd.

Please try again:

remotes::install_github("ramiromagno/gwasrapidd")

and let me know if it is working now as expected.

abhiachoudhary commented 3 years ago

I removed and reinstalled but now I am getting the same error as before when I was sourcing the R files manually.

gwasrapidd removed first

remotes::install_github("ramiromagno/gwasrapidd")
Downloading GitHub repo ramiromagno/gwasrapidd@master
Installing package into ‘C:/Users/abhishekc/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'gwasrapidd' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (gwasrapidd)
In R CMD INSTALL

library(gwasrapidd)
get_variants(gene_name = "PCSK9")
Error: Column ensembl_id must be length 1 or 9, not 8

ramiromagno commented 3 years ago

Have you restarted your R session? Could be a problem of still having the old version of the package loaded. Please try to quit your R session and start anew. Then run library(gwasrapidd), and then your example code.

And make sure that packageVersion('gwasrapidd') returns '0.99.9'.
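As a rough sketch of those checks, the following base R calls could be run right after starting a new session. Nothing here is specific to gwasrapidd beyond the package name and the version string quoted above.

```r
# Which R is running? An outdated R can pin many packages to old versions.
print(R.version.string)

# Is gwasrapidd already loaded? In a fresh session, before calling
# library(gwasrapidd), this is expected to be FALSE.
is_loaded <- "gwasrapidd" %in% loadedNamespaces()
print(is_loaded)

# Version found on the library path (guarded in case it is not installed).
# Note: requireNamespace() itself loads the package namespace.
if (requireNamespace("gwasrapidd", quietly = TRUE)) {
  installed <- packageVersion("gwasrapidd")
  print(installed)
  print(installed == "0.99.9")
}

# Library locations that R searches for installed packages.
print(.libPaths())
```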

abhiachoudhary commented 3 years ago

Yes, the gwasrapidd version is correct, and I restart each time. I am realizing this may be due to older versions of other packages... I will give an update soon.

ramiromagno commented 3 years ago

@abhiachoudhary: any news?

abhiachoudhary commented 3 years ago

While typing this I saw your message :) The command get_variants() works now. The problem was that my version of R was not up to date, which led to older versions being used for many of the packages that your code requires. Thank you for your help!

On an unrelated note (not an issue, just something I may have overlooked in the documentation): how do I use functions like get_variants_by_...() and get_associations_by_...()? I see them in the help section, but they don't load with the package installation (they're not in the NAMESPACE file or in the list of available functions from lsf.str). Calling them throws an error.

ramiromagno commented 3 years ago

You don't need to use the functions get_variants_by_...() or get_associations_by_...(). These functions are internal to the package and therefore not exposed to the user. They allow you to retrieve data by a single criterion only.

The four main functions: get_studies(), get_associations(), get_variants(), and get_traits() do it all, i.e., they allow you to retrieve data by a combination of available criteria (i.e., as permitted by the GWAS Catalog REST API), instead of a single criterion. In other words, the four main functions build on top of these others. So you should never really need them, unless perhaps for troubleshooting.

And yes, these functions are documented, though they are internal only.
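For readers who want to see the distinction between exported and internal functions for themselves, here is a minimal sketch. `internal_functions()` is a hypothetical helper written for this illustration and is not part of gwasrapidd; it lists the objects in a package namespace that match a pattern but are not exported.

```r
# Sketch: list objects in a package namespace that are NOT exported.
# `internal_functions()` is a hypothetical helper, not part of gwasrapidd.
internal_functions <- function(pkg, pattern) {
  ns <- asNamespace(pkg)
  all_objects <- ls(ns, pattern = pattern, all.names = TRUE)
  setdiff(all_objects, getNamespaceExports(pkg))
}

if (requireNamespace("gwasrapidd", quietly = TRUE)) {
  # Internal objects whose names start with get_variants_by_
  print(internal_functions("gwasrapidd", "^get_variants_by_"))
  # The exported entry point is get_variants()
  print("get_variants" %in% getNamespaceExports("gwasrapidd"))
}
```

In R, an internal object can still be reached with the `:::` operator, which may be handy for troubleshooting, but the exported functions are the supported interface.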

abhiachoudhary commented 3 years ago

OK. For some reason, I thought there were additional options in the _by_ functions, but this makes sense. Thanks!