Closed: grantmcdermott closed this issue 5 years ago.
Ah, I think I've figured out a workaround using !!build_sql().

Assuming that you have already connected to the global-fishing-watch.global_footprint_of_fisheries dataset (see the connection sketch after the code below), the following dplyr code replicates the pure SQL query above:
library(dplyr)
library(dbplyr)  # for build_sql()

effort <- tbl(con, "fishing_effort")

effort %>%
  filter(
    # Inject the _PARTITIONTIME pseudo column as raw SQL
    !!build_sql("_PARTITIONTIME") >= "2016-01-01 00:00:00",
    !!build_sql("_PARTITIONTIME") < "2017-01-01 00:00:00"
  ) %>%
  group_by(flag) %>%
  summarise(total_fishing_hours = sum(fishing_hours)) %>%
  arrange(desc(total_fishing_hours)) %>%
  collect()
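For completeness, here is a minimal sketch of the kind of connection I'm assuming above, using bigrquery's DBI interface; the billing project name is a placeholder you would swap for your own:

library(DBI)
library(bigrquery)

# Connect to the public Global Fishing Watch dataset on BigQuery.
# "your-billing-project" is a placeholder for your own GCP billing project.
con <- dbConnect(
  bigquery(),
  project = "global-fishing-watch",
  dataset = "global_footprint_of_fisheries",
  billing = "your-billing-project"
)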
It would still be nice if there were a "native" dplyr way to do this, but I'm happy to close this if you are.
You didn't provide a reprex, but I'm almost certain that this should work:
effort %>%
  filter(
    # Backticks let dplyr treat the pseudo column as an ordinary column name
    `_PARTITIONTIME` >= "2016-01-01 00:00:00",
    `_PARTITIONTIME` < "2017-01-01 00:00:00"
  ) %>%
  group_by(flag) %>%
  summarise(total_fishing_hours = sum(fishing_hours)) %>%
  arrange(desc(total_fishing_hours)) %>%
  collect()
Many BigQuery tables are date-partitioned. This allows for cheap querying from the BQ console, since users can specify the _PARTITIONTIME pseudo column to filter for a date or a range of dates. Here's an example of the kind of query involved, from the Global Fishing Watch project:
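Roughly, a partition-filtered query of this shape can be run as raw SQL via DBI; this is a reconstruction based on the dplyr translation earlier in the thread, so the exact SELECT list and date bounds are assumptions, and it relies on the connection's default dataset resolving fishing_effort:

library(DBI)

# Reconstructed example (not the original query): filter on the
# _PARTITIONTIME pseudo column so BigQuery only scans the 2016 partitions.
dbGetQuery(con, "
  SELECT flag, SUM(fishing_hours) AS total_fishing_hours
  FROM fishing_effort
  WHERE _PARTITIONTIME >= TIMESTAMP('2016-01-01')
    AND _PARTITIONTIME < TIMESTAMP('2017-01-01')
  GROUP BY flag
  ORDER BY total_fishing_hours DESC
")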
Is it possible to support this kind of partitioning in bigrquery/dplyr? I've tried to hack the translation using !!, but my tidyeval skills failed me. Thanks for considering.