huizezhang-sherry / cubble

A tidy structure for spatio-temporal vector data
https://huizezhang-sherry.github.io/cubble/

For the automatic matching can we choose which label to use #20

dicook closed this issue 7 months ago

dicook commented 10 months ago
# Packages used by this snippet
library(cubble)
library(dplyr)
library(readr)
library(tidyr)
library(lubridate)
library(tsibble)

covid <- read_csv("https://raw.githubusercontent.com/numbats/eda/master/data/melb_lga_covid.csv") %>%
  mutate(Buloke = as.numeric(ifelse(Buloke == "null", "0", Buloke))) %>%
  mutate(Hindmarsh = as.numeric(ifelse(Hindmarsh == "null", "0", Hindmarsh))) %>%
  mutate(Towong = as.numeric(ifelse(Towong == "null", "0", Towong))) %>%
  pivot_longer(cols = Alpine:Yarriambiack, names_to="NAME", values_to="cases") %>%
  mutate(Date = ydm(paste0("2020/",Date))) %>%
  mutate(cases=replace_na(cases, 0))

covid <- covid %>%
  group_by(NAME) %>%
  mutate(new_cases = cases - dplyr::lag(cases)) %>%
  na.omit()

lga <- strayr::read_absmap("lga2018") |>
  rename(lga = lga_name_2018) |>
  filter(state_name_2016 == "Victoria") 

covid <- covid %>%
  select(-cases) %>%
  rename(lga = NAME, date=Date, cases = new_cases) 
covid_ts <- as_tsibble(covid, key=lga, index=date)

covid_matching <- check_key(spatial = lga, temporal = covid_ts)

lga <- lga %>% 
  mutate(lga = ifelse(lga == "Colac-Otway (S)", "Colac Otway (S)", lga)) %>%
  filter(!(lga %in% covid_matching$others$spatial))

covid_matching <- check_key(spatial = lga, temporal = covid_ts)

covid_cb <- make_cubble(
  spatial = lga, temporal = covid_ts,
  potential_match = covid_matching, index = date)

It uses the key from the spatial data, but the temporal one would be better.

huizezhang-sherry commented 10 months ago

The key here is lga which is shared by the spatial and temporal data.

Since covid_ts is already a tsibble, the key and index will be taken from it.

dicook commented 10 months ago

It doesn't use the covid_ts key; it uses the spatial key, which comes in through potential_match = covid_matching. With potential_match, cubble recognises that the two keys likely refer to the same areas, but the labels themselves still differ. It would be helpful if the user could specify which of the two keys to use.
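
To make the mismatch concrete, here is a minimal base R sketch. It assumes a handful of labels of the kind printed in the reprex output further down this thread (spatial names carry a suffix such as "(S)", temporal names do not); the vectors are illustrative and are not read from the actual data.

```r
# Illustrative labels only: spatial names carry a type suffix, temporal names do not
spatial_keys  <- c("Alpine (S)", "Ararat (RC)", "Ballarat (C)")
temporal_keys <- c("Alpine", "Ararat", "Ballarat")

# No label matches exactly, so an exact join on the key would pair nothing
exact_matches <- intersect(spatial_keys, temporal_keys)

# Dropping a trailing " (XX)" suffix makes the two sets line up
stripped <- sub(" \\([A-Za-z]+\\)$", "", spatial_keys)
aligned  <- identical(stripped, temporal_keys)
```

Both label sets describe the same areas, so either could serve as the key of the resulting cubble; the question in this issue is which one ends up being used.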

huizezhang-sherry commented 10 months ago

There is now a key_use argument in make_cubble() that accepts either "spatial" or "temporal" (defaulting to "temporal") and specifies which key labels to use when a potential match is supplied. See the two make_cubble() examples at the end of the reprex:

library(cubble)
#> 
#> Attaching package: 'cubble'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(strayr)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
covid <- readr::read_csv("https://raw.githubusercontent.com/numbats/eda/master/data/melb_lga_covid.csv") |>
  mutate(Buloke = as.numeric(ifelse(Buloke == "null", "0", Buloke))) |>
  mutate(Hindmarsh = as.numeric(ifelse(Hindmarsh == "null", "0", Hindmarsh))) |>
  mutate(Towong = as.numeric(ifelse(Towong == "null", "0", Towong))) |>
  tidyr::pivot_longer(cols = Alpine:Yarriambiack, names_to="NAME", values_to="cases") |>
  mutate(Date = lubridate::ydm(paste0("2020/",Date))) |>
  mutate(cases= tidyr::replace_na(cases, 0))
#> Rows: 112 Columns: 80
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (4): Date, Buloke, Hindmarsh, Towong
#> dbl (76): Alpine, Ararat, Ballarat, Banyule, Bass Coast, Baw Baw, Bayside, B...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

covid <- covid |>
  group_by(NAME) |>
  mutate(new_cases = cases - dplyr::lag(cases)) |>
  na.omit()

lga <- strayr::read_absmap("lga2018") |>
  rename(lga = lga_name_2018) |>
  dplyr::filter(state_name_2016 == "Victoria") 

covid <- covid |>
  select(-cases) |>
  rename(lga = NAME, date=Date, cases = new_cases) 
covid_ts <- tsibble::as_tsibble(covid, key=lga, index=date)

covid_matching <- check_key(spatial = lga, temporal = covid_ts)

lga <- lga |>
  mutate(lga = ifelse(lga == "Colac-Otway (S)", "Colac Otway (S)", lga)) |>
  filter(!(lga %in% covid_matching$others$spatial))

covid_matching <- check_key(spatial = lga, temporal = covid_ts)

make_cubble(
  spatial = lga, temporal = covid_ts, potential_match = covid_matching) |>
  dplyr::pull(lga) |> head()
#> Warning: st_centroid assumes attributes are constant over geometries
#> [1] "Alpine"     "Ararat"     "Ballarat"   "Banyule"    "Bass Coast"
#> [6] "Baw Baw"

make_cubble(
  spatial = lga, temporal = covid_ts,
  potential_match = covid_matching, key_use = "spatial") |>  
  dplyr::pull(lga) |> head()
#> Warning: st_centroid assumes attributes are constant over geometries
#> [1] "Alpine (S)"     "Ararat (RC)"    "Ballarat (C)"   "Banyule (C)"   
#> [5] "Bass Coast (S)" "Baw Baw (S)"

Created on 2023-10-11 with reprex v2.0.2
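
The choice that key_use expresses can be sketched in base R as follows. This is not cubble's implementation: pick_key() and the pairs data frame are hypothetical names introduced only to illustrate selecting one of two matched label sets, with "temporal" as the default.

```r
# Hypothetical helper: NOT cubble's code, only the selection logic in base R
pick_key <- function(pairs, key_use = c("temporal", "spatial")) {
  key_use <- match.arg(key_use)  # defaults to "temporal", errors on other values
  pairs[[key_use]]
}

# Illustrative matched labels, one row per area
pairs <- data.frame(
  spatial  = c("Alpine (S)", "Ararat (RC)", "Ballarat (C)"),
  temporal = c("Alpine", "Ararat", "Ballarat"),
  stringsAsFactors = FALSE
)

default_labels <- pick_key(pairs)                       # temporal labels
spatial_labels <- pick_key(pairs, key_use = "spatial")  # spatial labels
```

Using match.arg() in this sketch keeps the accepted values explicit and rejects anything other than the two documented strings.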