moj-analytical-services / rshiny-template

Template RShiny project

dplyr getting confused in deployed environment #14

Open Thomas-Hirsch opened 5 years ago

Thomas-Hirsch commented 5 years ago

A prime example of a heisenbug. I'm testing an update to a Shiny app, but I've hit a strange issue: a function that previously worked stops working correctly when the app is deployed with the current Conda Dockerfile. I'm using https://github.com/moj-analytical-services/shiny_testbed as a testbed (as the name implies).

The code in question is essentially as below:

```
library(dplyr)
library(lubridate)
library(dplyr)

dt_to_numeric <- function(dt) {
  3600 * hour(dt) + 60 * minute(dt) + second(dt)
}

hours_minutes_string_to_numeric <- function(hm_string) {
  # 2010-01-01 is an arbitrary date because we're just interested in the time
  dt <- as.POSIXct(paste("2010-01-01", hm_string), tz = "UTC")
  dt_to_numeric(dt)
}

in_time_range <- function(datetime_column, start_time, end_time) {
  between(dt_to_numeric(datetime_column),
          hours_minutes_string_to_numeric(start_time),
          hours_minutes_string_to_numeric(end_time) - 600) # end time should be exclusive
}

filter_time_range <- function(df, start_time, end_time) {
  df %>% filter(in_time_range(obs_datetime, start_time, end_time))
}

df <- tibble::tribble(
  ~sensor_value, ~obs_datetime,         ~survey_device_id,
  0L,            "2019-06-30 16:30:00", "222621",
  0L,            "2019-06-30 16:40:00", "222621",
  0L,            "2019-06-30 16:50:00", "222621",
  0L,            "2019-06-30 17:00:00", "222621",
  0L,            "2019-06-30 17:10:00", "222621",
  0L,            "2019-06-30 17:20:00", "222621")

print(df)

df %>% filter_time_range("09:00", "17:00") %>% print()
```

Essentially, the function filters the obs_datetime field to a given time window. In this example, the last two statements print the example dataset, then the filtered dataset with the last 3 rows removed. This has worked as expected for a long time, but now, when I deploy it, the in_time_range call fails with an error saying that the object obs_datetime was not found. It appears that the non-standard evaluation in the filter call is breaking down.

All of which points to what is happening, but not why. The only change of note is the Dockerfile, and the shift to conda. I thought it might be dplyr v0.8.2 at fault, but downgrading to v0.7.8 (used in production) didn't fix anything.
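One way to confirm that the downgrade actually took effect inside the container is to log the loaded version and the search path at app startup. This is only a minimal sketch: the helper name `log_runtime_state` is hypothetical and not part of the app, and it assumes dplyr is installed.

```r
# Hypothetical helper: report the dplyr version in use and the order
# in which R will look up unqualified names such as `filter`.
log_runtime_state <- function() {
  list(
    dplyr_version = as.character(utils::packageVersion("dplyr")),
    search_path   = search()  # .GlobalEnv always comes first
  )
}

state <- log_runtime_state()
print(state)
```

Comparing this output between a local run and the deployed container would show whether the two environments differ in package version or attach order.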

r4vi commented 5 years ago

Hopefully someone who understands R better than me can chime in. I've run a minimal example of this code locally and can reproduce the problem, but I'm not sure how to fix it. One for @RobinL when he's back?

Thomas-Hirsch commented 5 years ago

A minor followup to this, to round out the picture: I spotted in the Kibana logs that right after dplyr gets loaded in with library(dplyr), a warning comes up:

info: The following object is masked _by_ '.GlobalEnv':
filter

This explains why filter stops working: dplyr's filter itself masks stats::filter and usually sits at the top of the environment food chain, which is why people generally don't have to add the dplyr:: prefix. Here, though, something named filter in .GlobalEnv takes precedence over it. That may help pinpoint where it's going wrong. It's not clear to me why .GlobalEnv contains a filter object in this deployment; I would probably need access to Docker to debug it.
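The masking described above can be reproduced in isolation. The following is a minimal sketch, assuming dplyr is installed. The `filter` function placed in the global environment is a hypothetical stand-in, since the thread does not identify what actually defines it in the deployed app.

```r
library(dplyr)

# Hypothetical stand-in for whatever put `filter` into the global
# environment in the deployed app (the real source is unknown).
assign("filter",
       function(...) stop("global `filter` was called instead of dplyr's"),
       envir = globalenv())

# From the global environment, a bare `filter` now resolves to the
# stand-in rather than to dplyr's verb.
resolved <- get("filter", envir = globalenv())
masked <- !identical(resolved, dplyr::filter)

# find() lists every attached location that defines `filter`,
# in search-path order, which helps locate the culprit.
locations <- utils::find("filter")
print(locations)

# A namespace-qualified call bypasses the search path entirely.
kept <- dplyr::filter(data.frame(x = 1:6), x <= 3)

# Clean up the stand-in so it does not leak into the session.
rm("filter", envir = globalenv())
```

If the same lookup in the deployed container shows `.GlobalEnv` ahead of `package:dplyr`, qualifying the call as `dplyr::filter` inside `filter_time_range` would be one way to make the function independent of the search path, though it would not explain where the global object comes from.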