Closed. dicook closed this 7 months ago.
The key here is `lga`, which is shared by the spatial and temporal data. Since `covid_ts` is already a tsibble, the key and index will be taken from it.
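As a small illustration of what "the key and index will be taken from it" means, the sketch below builds a toy tsibble and reads its key and index back with `tsibble::key_vars()` and `tsibble::index_var()`. The data values are made up for the example; only the column names `lga`, `date` and `cases` mirror the reprex in this thread.

```r
library(tsibble)

# Toy data (made up for illustration): two LGAs, two days each
toy_ts <- tsibble(
  lga   = rep(c("Alpine", "Ararat"), each = 2),
  date  = rep(as.Date("2020-06-01") + 0:1, times = 2),
  cases = c(0, 1, 2, 0),
  key   = lga,
  index = date
)

# A tsibble carries its own key and index, so downstream code
# can read them instead of asking the user to restate them
toy_key   <- key_vars(toy_ts)
toy_index <- index_var(toy_ts)
```

Here `toy_key` holds the key column name and `toy_index` the index column name, which is the information a tsibble input already provides.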
It doesn't use the `covid_ts` key; it uses the spatial key. This comes from `potential_match = covid_matching`. With `potential_match`, cubble recognises that the two keys likely refer to the same areas, even though their values differ. It would be helpful for the user to be able to specify which of the two keys to use.
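To make the mismatch concrete, here is a minimal base R sketch using the six key values that appear in the reprex output in this thread. It is not cubble's implementation: `strip_suffix()` and `pick_key()` are hypothetical helpers, and the suffix-stripping rule is an assumption that happens to fit these six names.

```r
# Key values copied from the reprex output in this thread
spatial_key  <- c("Alpine (S)", "Ararat (RC)", "Ballarat (C)",
                  "Banyule (C)", "Bass Coast (S)", "Baw Baw (S)")
temporal_key <- c("Alpine", "Ararat", "Ballarat",
                  "Banyule", "Bass Coast", "Baw Baw")

# The raw keys share no exact values
exact <- intersect(spatial_key, temporal_key)

# Hypothetical normalisation: drop a trailing " (XX)" suffix
strip_suffix <- function(x) sub("[[:space:]]*\\([^)]*\\)$", "", x)

# Pair each spatial key with the temporal key it normalises to
matched <- data.frame(
  spatial  = spatial_key,
  temporal = temporal_key[match(strip_suffix(spatial_key), temporal_key)],
  stringsAsFactors = FALSE
)

# Hypothetical helper: choose which side's labels to keep
pick_key <- function(matched, key_use = c("temporal", "spatial")) {
  key_use <- match.arg(key_use)
  matched[[key_use]]
}
```

Once the pairs are known, either side's labels are a valid choice for the combined object, which is why letting the user pick is a reasonable request.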
There is now a `key_use` argument in `make_cubble()`, accepting either `"spatial"` or `"temporal"` (defaults to `"temporal"`), to specify which key level to use in potential matching. See the two `make_cubble()` examples at the end of the reprex:
``` r
library(cubble)
#>
#> Attaching package: 'cubble'
#> The following object is masked from 'package:stats':
#>
#>     filter
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(strayr)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

covid <- readr::read_csv("https://raw.githubusercontent.com/numbats/eda/master/data/melb_lga_covid.csv") |>
  mutate(Buloke = as.numeric(ifelse(Buloke == "null", "0", Buloke))) |>
  mutate(Hindmarsh = as.numeric(ifelse(Hindmarsh == "null", "0", Hindmarsh))) |>
  mutate(Towong = as.numeric(ifelse(Towong == "null", "0", Towong))) |>
  tidyr::pivot_longer(cols = Alpine:Yarriambiack, names_to = "NAME", values_to = "cases") |>
  mutate(Date = lubridate::ydm(paste0("2020/", Date))) |>
  mutate(cases = tidyr::replace_na(cases, 0))
#> Rows: 112 Columns: 80
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): Date, Buloke, Hindmarsh, Towong
#> dbl (76): Alpine, Ararat, Ballarat, Banyule, Bass Coast, Baw Baw, Bayside, B...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

covid <- covid |>
  group_by(NAME) |>
  mutate(new_cases = cases - dplyr::lag(cases)) |>
  na.omit()

lga <- strayr::read_absmap("lga2018") |>
  rename(lga = lga_name_2018) |>
  dplyr::filter(state_name_2016 == "Victoria")

covid <- covid |>
  select(-cases) |>
  rename(lga = NAME, date = Date, cases = new_cases)

covid_ts <- tsibble::as_tsibble(covid, key = lga, index = date)
covid_matching <- check_key(spatial = lga, temporal = covid_ts)

lga <- lga |>
  mutate(lga = ifelse(lga == "Colac-Otway (S)", "Colac Otway (S)", lga)) |>
  filter(!(lga %in% covid_matching$others$spatial))

covid_matching <- check_key(spatial = lga, temporal = covid_ts)

make_cubble(
  spatial = lga, temporal = covid_ts,
  potential_match = covid_matching) |>
  dplyr::pull(lga) |> head()
#> Warning: st_centroid assumes attributes are constant over geometries
#> [1] "Alpine"     "Ararat"     "Ballarat"   "Banyule"    "Bass Coast"
#> [6] "Baw Baw"

make_cubble(
  spatial = lga, temporal = covid_ts,
  potential_match = covid_matching, key_use = "spatial") |>
  dplyr::pull(lga) |> head()
#> Warning: st_centroid assumes attributes are constant over geometries
#> [1] "Alpine (S)"     "Ararat (RC)"    "Ballarat (C)"   "Banyule (C)"
#> [5] "Bass Coast (S)" "Baw Baw (S)"
```
Created on 2023-10-11 with reprex v2.0.2
It uses the key from the spatial data, but the temporal one would be better.