nf-osi / nfportalutils

Utilities for NF Portal project and data management
https://nf-osi.github.io/nfportalutils/
MIT License
2 stars 2 forks source link

Summarize attributes as annotations #163

Closed anngvu closed 8 months ago

anngvu commented 8 months ago

There are two things to this:

check_fun <- function(values) {
  all(values %in% c("US", "DE", "CA")) # pretend only US, Germany, and Canada are considered valid
}
dataset_id <- "syn51809867"
summarize_attribute(summary_query = glue::glue("select group_concat(distinct GEO_location) as country from {dataset_id} where GEO_location is not null"),
                    attribute = "country", 
                    entity_id = dataset_id,
                    dry_run = FALSE, 
                    check_fun = check_fun)

Update: I think this is technically higher priority than the 3 other PRs outstanding.

allaway commented 8 months ago

but coming back to it I realized it should be generalized to work for datasets (touched upon this with @cconrad8 yesterday). So hopefully it makes sense what I wrote in the docs. Here is some test code for the dataset use case.

I totally agree with this. I hope that this can be eventually baked into synapse (https://sagebionetworks.jira.com/browse/PLFM-8197) but for now a util/job to handle this is great.

allaway commented 8 months ago

Hey @anngvu , I created a copy of your dataset (syn53182927) for testing, but when I run summarize_attribute, I'm running into the same issue that I ran into in #164, presumably caused by the get/set_annotations function:

> summarize_attribute(summary_query = glue::glue("select group_concat(distinct GEO_location) as country from {dataset_id} where GEO_location is not null"),
+                     attribute = "country", 
+                     entity_id = dataset_id,
+                     dry_run = FALSE, 
+                     check_fun = check_fun)
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Expected a synapseclient.Annotations object
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In entity_meta[attribute] <- meta[[entity]] :
  number of items to replace is not a multiple of replacement length

I'm wondering - what version of reticulate and synapseclient are you using? I wonder if this is something that I've messed up on my mac or if it's actually related to a broader reticulate and/or synapseclient change.

Here's my sessionInfo:

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.7.7

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] synapser_1.0.0          nfportalutils_0.0.0.946 reticulate_1.34.0      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       codetools_0.2-19  lattice_0.22-5    png_0.1-8         withr_2.5.2       grid_4.2.3       
 [7] jsonlite_1.8.7    magrittr_2.0.3    rlang_1.1.2       cli_3.6.1         data.table_1.14.8 rstudioapi_0.15.0
[13] Matrix_1.5-4.1    rjson_0.2.21      tools_4.2.3       glue_1.6.2        compiler_4.2.3   

I am using a conda environment with synapseclient 3.2.0.

allaway commented 8 months ago

If I get a chance today, I'll spin up a SC instance and test this there.

allaway commented 8 months ago

Per @anngvu lets try reticulate 1.28 - 1.34 might be the issue?

allaway commented 8 months ago

OK! After rolling back to reticulate 1.28, I have no issues running this function. Works great!

allaway commented 8 months ago

@jaybee84, can you clarify the issue you faced and how you fixed it? The issue I got seemed to be because reticulate was updated the latest version on my end - with an older version of reticulate, the issue went away. I'm just trying to figure out if we encountered the same thing or somthing different!

jaybee84 commented 8 months ago

Here is the particular error I got:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Expected a synapseclient.Annotations object
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In entity_meta[attribute] <- meta[[entity]] :
  number of items to replace is not a multiple of replacement length

> reticulate::py_last_error()

Traceback (most recent call last):
  File "/Users/jineta/opt/miniconda3/envs/annotate/lib/python3.7/site-packages/synapseclient/client.py", line 1600, in set_annotations
    raise TypeError("Expected a synapseclient.Annotations object")
TypeError: Expected a synapseclient.Annotations object

My session info is here:

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] reticulate_1.32.0        synapser_1.1.0.119      
[3] nfportalutils_0.0.0.9411

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       pillar_1.9.0      compiler_4.2.1   
 [4] tools_4.2.1       digest_0.6.33     jsonlite_1.8.7   
 [7] evaluate_0.21     lifecycle_1.0.3   tibble_3.2.1     
[10] lattice_0.20-45   pkgconfig_2.0.3   png_0.1-8        
[13] rlang_1.1.1       Matrix_1.5-3      cli_3.6.1        
[16] rstudioapi_0.14   yaml_2.3.7        xfun_0.39        
[19] fastmap_1.1.1     withr_2.5.1       dplyr_1.1.3      
[22] knitr_1.42        generics_0.1.3    vctrs_0.6.3      
[25] grid_4.2.1        tidyselect_1.2.0  glue_1.6.2       
[28] data.table_1.14.8 R6_2.5.1          fansi_1.0.5      
[31] rmarkdown_2.21    purrr_1.0.2       magrittr_2.0.3   
[34] codetools_0.2-18  htmltools_0.5.6.1 utf8_1.2.3       
[37] rjson_0.2.21   

I havent had a chance to fix it yet, but was looking into it.

allaway commented 8 months ago

Ah, got it. Yes this is the same error i got, hopefully downgrading to 1.28 will fix it for you too!

On Thu, Dec 21, 2023, 13:55 Jineta Banerjee @.***> wrote:

Here is the particular error I got:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : TypeError: Expected a synapseclient.Annotations object Run reticulate::py_last_error() for details. In addition: Warning message: In entity_meta[attribute] <- meta[[entity]] : number of items to replace is not a multiple of replacement length

reticulate::py_last_error()

Traceback (most recent call last): File "/Users/jineta/opt/miniconda3/envs/annotate/lib/python3.7/site-packages/synapseclient/client.py", line 1600, in set_annotations raise TypeError("Expected a synapseclient.Annotations object") TypeError: Expected a synapseclient.Annotations object

My session info is here:

sessionInfo() R version 4.2.1 (2022-06-23) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.6.3

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods [7] base

other attached packages: [1] reticulate_1.32.0 synapser_1.1.0.119 [3] nfportalutils_0.0.0.9411

loaded via a namespace (and not attached): [1] Rcpp_1.0.11 pillar_1.9.0 compiler_4.2.1 [4] tools_4.2.1 digest_0.6.33 jsonlite_1.8.7 [7] evaluate_0.21 lifecycle_1.0.3 tibble_3.2.1 [10] lattice_0.20-45 pkgconfig_2.0.3 png_0.1-8 [13] rlang_1.1.1 Matrix_1.5-3 cli_3.6.1 [16] rstudioapi_0.14 yaml_2.3.7 xfun_0.39 [19] fastmap_1.1.1 withr_2.5.1 dplyr_1.1.3 [22] knitr_1.42 generics_0.1.3 vctrs_0.6.3 [25] grid_4.2.1 tidyselect_1.2.0 glue_1.6.2 [28] data.table_1.14.8 R6_2.5.1 fansi_1.0.5 [31] rmarkdown_2.21 purrr_1.0.2 magrittr_2.0.3 [34] codetools_0.2-18 htmltools_0.5.6.1 utf8_1.2.3 [37] rjson_0.2.21

I havent had a chance to fix it yet, but was looking into it.

— Reply to this email directly, view it on GitHub https://github.com/nf-osi/nfportalutils/pull/163#issuecomment-1866852034, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3WNSACB2K32A6VBVDSIGTYKSH4HAVCNFSM6AAAAABA3RQ5MWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRWHA2TEMBTGQ . You are receiving this because you were mentioned.Message ID: @.***>

anngvu commented 8 months ago

@jaybee84 OK, I think this is same problem Robert encountered. It should be a reticulate downgrade to 1.28, anything beyond that won't work, according to what I came across in scidev, etc.

Edit: We can mention this in the docs and/or pin the version in package description.

jaybee84 commented 8 months ago

Interesting. rolling down reticulate did not fix it for me:

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] devtools_2.4.5           usethis_2.1.6           
[3] reticulate_1.28          synapser_1.1.0.119      
[5] nfportalutils_0.0.0.9411

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  xfun_0.39         remotes_2.4.2    
 [4] purrr_1.0.2       lattice_0.20-45   vctrs_0.6.3      
 [7] generics_0.1.3    miniUI_0.1.1.1    htmltools_0.5.6.1
[10] yaml_2.3.7        utf8_1.2.3        rlang_1.1.1      
[13] pkgbuild_1.4.0    urlchecker_1.0.1  pillar_1.9.0     
[16] later_1.3.1       glue_1.6.2        withr_2.5.2      
[19] sessioninfo_1.2.2 lifecycle_1.0.3   stringr_1.5.0    
[22] htmlwidgets_1.6.2 codetools_0.2-18  evaluate_0.21    
[25] memoise_2.0.1     knitr_1.42        callr_3.7.3      
[28] fastmap_1.1.1     httpuv_1.6.11     ps_1.7.2         
[31] fansi_1.0.5       Rcpp_1.0.11       xtable_1.8-4     
[34] promises_1.2.1    cachem_1.0.8      pkgload_1.3.2    
[37] jsonlite_1.8.8    mime_0.12         fs_1.6.3         
[40] rjson_0.2.21      png_0.1-8         digest_0.6.33    
[43] stringi_1.7.12    processx_3.8.0    dplyr_1.1.3      
[46] shiny_1.7.5       grid_4.2.1        cli_3.6.1        
[49] tools_4.2.1       magrittr_2.0.3    tibble_3.2.1     
[52] profvis_0.3.7     crayon_1.5.2      pkgconfig_2.0.3  
[55] ellipsis_0.3.2    Matrix_1.5-3      prettyunits_1.2.0
[58] data.table_1.14.8 rmarkdown_2.21    rstudioapi_0.14  
[61] R6_2.5.1          compiler_4.2.1   
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Expected a synapseclient.Annotations object
Run `reticulate::py_last_error()` for details.
In addition: Warning message:
In entity_meta[attribute] <- meta[[entity]] :
  number of items to replace is not a multiple of replacement length

Then I noted that Robert has a different version of synapser. So I tried rolling my synapser version down to 1.0.0 but getting this error:

devtools::install_version("synapser", version = "1.0.0", repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))

Trying http://ran.synapse.org
Trying http://cran.fhcrc.org
Error in download_version_url(package, version, repos, type) : 
  version '1.0.0' is invalid for package 'synapser'
anngvu commented 8 months ago

Hmm, synapser version shouldn't matter much at all since we don't interface with it (yet) -- though I also have synapser_1.1.0.119.

I'll try to do a little more research to see if there is another unknown factor in the combination, but let me know if you do a total restart and it does suddenly work.

allaway commented 8 months ago

Discussed in standup - we'll merge this and then try and revisit the environment issues later when we get back.

jaybee84 commented 8 months ago

but let me know if you do a total restart and it does suddenly work.

@anngvu sorry for my late response, I missed this message earlier. I restarted my R session and am still getting the same error.