ropensci / tidync

NetCDF exploration and data extraction
https://docs.ropensci.org/tidync

Activate a variable specified by a string #95

Open matteodefelice opened 5 years ago

matteodefelice commented 5 years ago

I know this has been discussed here: https://github.com/ropensci/tidync/issues/26 but maybe things have changed. I want to activate a variable by giving its name as a string. I wasn't able to do that, but I could do it with the name of the grid. This is my grid:

[18]   D5,D0    : OutputCommitted, OutputPower, OutputStorageInput, OutputStorageLevel, OutputSpillage, OutputHeat, OutputHeatSlack  

I do this:

var_name = 'OutputPower'
grid_name = 'D5,D0'

Well, this is what happens:

> sim_results %>% activate(grid_name) %>% active()
[1] "D5,D0"

With the name of the grid it works, but with the variable name (associated with that grid) I get the first grid instead:

> sim_results %>% activate(var_name) %>% active()
[1] "D0,D3,D7"

wraseman commented 4 years ago

I agree @matteodefelice, I am having the same issue.

This is highlighted by the following example in the documentation:

tidync(filename) %>% activate("JULD") %>% 
  hyper_filter(N_PROF = N_PROF == 1) %>%
  hyper_tibble()

#> Class: tidync_data (list of tidync data arrays)
#> Variables (1): 'SCIENTIFIC_CALIB_DATE'
#> Dimension (4): DATE_TIME,N_PARAM,N_CALIB,N_PROF (14, 7, 1, 1)
#> Source: C:/Users/wraseman/Documents/R/win-library/3.6/tidync/extdata/argo/MD5903593_001.nc

The line Variables (1): 'SCIENTIFIC_CALIB_DATE' should read Variables (1): 'JULD'

This is the result that one gets when using activate(JULD) instead of activate("JULD"):

tidync(filename) %>% activate(JULD) %>% 
  hyper_filter(N_PROF = N_PROF == 1) %>%
  hyper_tibble()

#> Class: tidync_data (list of tidync data arrays)
#> Variables (1): 'JULD'
#> Dimension (1): N_PROF (1)
#> Source: C:/Users/wraseman/Documents/R/win-library/3.6/tidync/extdata/argo/MD5903593_001.nc

When using the string input, it seems to default to grid [1]

mdsumner commented 4 years ago

So, this works:

filename <- system.file("extdata/argo/MD5903593_001.nc", mustWork = TRUE, package = "tidync")
tidync(filename) %>% activate(JULD)

But this does not:

filename <- system.file("extdata/argo/MD5903593_001.nc", mustWork = TRUE, package = "tidync")
tidync(filename) %>% activate("JULD")

A question is whether the second form should also apply variable selection (as per select_var), because the string input could contain multiple variable names. I'm inclined not to: https://github.com/ropensci/tidync/pull/100

If multiple variable names are given only the first is used.

mdsumner commented 4 years ago

Actually, I think I might not do this - another option could be

tidync(filename) %>% hyper_tibble(select_var = varname)

I think that would be better, though it doesn't work atm. I need to have a think and another look. Appreciate thoughts!

wraseman commented 4 years ago

@mdsumner, thanks for your response on this. I think it would be more intuitive to use an activate() function in both cases (rather than doing it through hyper_tibble(select_var = varname)). I think this could be achieved by creating a new function called activate_string() which handles variables passed as strings. This is consistent with passing aesthetic properties in ggplot:

source: https://ggplot2.tidyverse.org/reference/aes_.html

aes(mpg, wt, col = cyl)
#> Aesthetic mapping: 
#> * `x`      -> `mpg`
#> * `y`      -> `wt`
#> * `colour` -> `cyl`
aes_string("mpg", "wt", col = "cyl")
#> Aesthetic mapping: 
#> * `colour` -> `cyl`
#> * `x`      -> `mpg`
#> * `y`      -> `wt`

This change would mean the following code would give identical results:

# Passing a variable
filename <- system.file("extdata/argo/MD5903593_001.nc", mustWork = TRUE, package = "tidync")
tidync(filename) %>% activate(JULD)
# Passing the variable as a string
filename <- system.file("extdata/argo/MD5903593_001.nc", mustWork = TRUE, package = "tidync")
tidync(filename) %>% activate_string("JULD")

mdsumner commented 4 years ago

But what about the question, should activating via a variable string pass that in to select_var also? It's no problem to code it, but activate and select are doing different things and there are other implications I want to think about

wraseman commented 4 years ago

I was having a hard time understanding what you meant by that but I think I get it now.

If I understand it correctly, hyper_tibble() implicitly creates a tibble for whatever variable is passed to activate(). For instance, this first example should give the same tibble as the second:

tidync(filename) %>% activate(SCIENTIFIC_CALIB_COEFFICIENT)  %>% hyper_tibble()
tidync(filename) %>% activate(SCIENTIFIC_CALIB_COEFFICIENT)  %>% hyper_tibble(select_var = SCIENTIFIC_CALIB_COEFFICIENT)

However, since there are multiple variables in the active grid (SCIENTIFIC_CALIB_EQUATION, SCIENTIFIC_CALIB_COEFFICIENT, and SCIENTIFIC_CALIB_COMMENT), the user could specify the active grid with any of these variables but choose the variable using select_var like this:

# if the user wants to view data for "SCIENTIFIC_CALIB_EQUATION"
tidync(filename) %>% activate(SCIENTIFIC_CALIB_COEFFICIENT)  %>% hyper_tibble(select_var = SCIENTIFIC_CALIB_EQUATION)

If that is the case, then yes, I think that activating the grid using a variable string should also pass that information to hyper_tibble(). In that case, the user could do the following and get the same results:

tidync(filename) %>% activate("SCIENTIFIC_CALIB_COEFFICIENT")  %>% hyper_tibble()
tidync(filename) %>% activate("SCIENTIFIC_CALIB_COEFFICIENT")  %>% hyper_tibble(select_var = "SCIENTIFIC_CALIB_COEFFICIENT")

It does seem a bit odd to accept either a string or a variable for the same function, so that was why I thought about creating a separate function like activate_string() but then I see how that would lead to difficulties down the road.

I'm not sure if I answered your question, I'm still new to tidync, so my apologies if not!

@matteodefelice, do you have any thoughts?

mdsumner commented 4 years ago

Activate is for grids not variables. I just thought it was handy to pick out a grid via a nominal variable name, but I've always been uncomfortable about it.

I think I should write a bit about this in more detail

wraseman commented 4 years ago

I agree, I was a bit confused about using a variable to activate the grid when I learned about the activate() function. Let me know if you need any more thoughts!

matteodefelice commented 4 years ago

I am currently using (and developing) R code to analyse an extensive set of power system simulations whose output has been saved in NetCDF. Each simulation has 27 different grids and ~80 different variables in total. I have developed some functions to post-process those outputs and I have tried to generalise as much as possible, which is why I need to be able to pass the variable I want to "extract" as a function argument. Currently I can generalise within a single grid: if the field I need is stored in field_name, I activate the grid, then use dplyr::select with field_name, and then rlang::sym when needed. I don't like this, because I have to hard-code the grid name in my functions, so my code stops working if the grid name changes. Maybe there is a better way to do this, but the string-based solution suggested by @wraseman looks nice. I plan to share my code as an R package to post-process the outputs of the open source power system model I am using (Dispa-SET, www.dispa-set.eu).
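One way to avoid hard-coding the grid name, sketched here in base R, is to look up the grid from the variable name first and then activate by grid name, which does work with a string as shown at the top of this thread. The helper grid_for_variable() and the grid_table data frame are hypothetical. The sketch assumes a table with one row per grid/variable pair, and how to build that table from a tidync object is not shown. The variable listed for the second grid is a placeholder.

```r
# Hypothetical helper: return the name of the grid that holds `var_name`.
# `grid_table` is assumed to be a data frame with columns `grid` and `variable`,
# one row per (grid, variable) pair.
grid_for_variable <- function(grid_table, var_name) {
  stopifnot(is.character(var_name), length(var_name) == 1L)
  hits <- unique(grid_table$grid[grid_table$variable == var_name])
  if (length(hits) == 0L) {
    stop("no grid contains variable '", var_name, "'")
  }
  hits[[1L]]
}

# Hand-built stand-in for the grid listing quoted in the first comment.
grid_table <- data.frame(
  grid     = c("D0,D3,D7", "D5,D0", "D5,D0"),
  variable = c("PlaceholderVariable", "OutputCommitted", "OutputPower"),
  stringsAsFactors = FALSE
)

var_name  <- "OutputPower"
grid_name <- grid_for_variable(grid_table, var_name)
print(grid_name)
#> [1] "D5,D0"
```

The resulting grid_name could then be passed on as in the first comment, for example sim_results %>% activate(grid_name).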

aodenweller commented 2 years ago

Hi there, I know this is a very old issue but I'm encountering the same problem as @matteodefelice. Have you figured out a workaround in the meantime? Thanks a lot!

mdsumner commented 2 years ago

I think I'll do this

I think that makes sense, because grids aren't identified in netcdf it's a bit of a pain and these weird names are a problem ;)

mdsumner commented 2 years ago

actually, this already works

@aodenweller can you show an example of what you want?

aodenweller commented 2 years ago

Activation works fine when I'm using an unquoted variable name, but not when I'm using the variable name stored as a string. I'm assuming this is due to what_name <- deparse(substitute(what)) in activate.R.
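The effect of that line can be reproduced outside tidync. The toy function below is not tidync's code: capture_name() is a hypothetical stand-in that applies only the deparse(substitute()) step, to show what text it sees for each calling style.

```r
# Toy stand-in for the name-capture step only; not tidync's implementation.
capture_name <- function(what) deparse(substitute(what))

var_name <- "JULD"

capture_name(JULD)      # bare name
#> [1] "JULD"
capture_name(var_name)  # string stored in an R variable
#> [1] "var_name"
capture_name("JULD")    # literal string: the quotes become part of the text
#> [1] "\"JULD\""
```

Only the bare name yields the text JULD, so matching the captured text against the file's variable names would fail for the other two forms. That is consistent with the behaviour reported above, although what tidync does after this step is not covered by the sketch.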

mdsumner commented 2 years ago

I think this works with select_var if that helps as a workaround, but, there are inconsistencies I'll try to fix 🙏

it's possible this will be much simpler with rlang now, but I'm still a bit uncomfortable conflating activate with var select, so maybe a new function would be better
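As a rough illustration of how one function could accept both forms, here is a base R sketch (not rlang, and not tidync's code). The function resolve_name() and the known_names vector are hypothetical: a bare symbol is used as-is when it matches a known variable name, otherwise the argument is evaluated and must yield a single string.

```r
# Hypothetical resolver: accept a bare variable name, a literal string,
# or an R variable holding a string.
resolve_name <- function(what, known_names) {
  expr <- substitute(what)
  # A bare symbol that matches a known name is taken literally.
  if (is.symbol(expr) && as.character(expr) %in% known_names) {
    return(as.character(expr))
  }
  # Anything else is evaluated in the caller's environment.
  value <- eval(expr, parent.frame())
  stopifnot(is.character(value), length(value) == 1L)
  value
}

# Stand-in for the variable names in a file (names taken from this thread).
known_names <- c("JULD", "SCIENTIFIC_CALIB_COEFFICIENT")
var_name <- "SCIENTIFIC_CALIB_COEFFICIENT"

resolve_name(JULD, known_names)
#> [1] "JULD"
resolve_name("JULD", known_names)
#> [1] "JULD"
resolve_name(var_name, known_names)
#> [1] "SCIENTIFIC_CALIB_COEFFICIENT"
```

One trade-off of this design is ambiguity: if an R variable happens to share its name with a variable in the file, the bare-name rule wins, which is one argument for a separate string-only function.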

BarbaraRobson commented 1 year ago

I'm finding that if I use

nc <- nc %>% activate('salt')

I can then do nc %>% hyper_tibble('salt') but not nc %>% hyper_tibble('temperature'). I think this is as intended, but I hoped this would work:

nc <- nc %>% activate('salt', select_var = c('salt', 'temperature'))

Unfortunately, it doesn't.

To be able to access temperature as well as salt, I have to do the following, which is not obvious:

nc <- tidync::tidync(input_file) %>% tidync::activate('salt')
nc <- nc %>% tidync::activate(tidync::active(nc), select_var = c('salt', 'temperature'))

i.e., activate by variable name, then use active() to get the grid name, then activate again, this time by grid name.