tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Mutate() issue - Hierarchical Forecasting #306

Open edoardobassett opened 3 years ago

edoardobassett commented 3 years ago

Hi, I am reposting this issue on GitHub with a more complete example, as I suspect the problem is not caused by the data or by mistakes in my code.

I am trying to perform hierarchical forecasting on a dataset that is structured in the same way as the tourism tsibble used in Forecasting: Principles and Practice, but with more hierarchical levels. However, after aggregating the data with aggregate_key(), forecasting fails with a mutate() error. The data doesn't contain any missing values.

Below is a reprex of the code, with a minimal version of the data that reproduces the error.

Thanks in advance.

library(fable)
library(dplyr)
library(tsibble)
library(tidyverse)

t_london <- tibble::tribble(
  ~Month,             ~Value.type,   ~LSOA11CD,             ~LSOA11NM,     ~WD19CD,      ~WD19NM,    ~LAD19CD,         ~LAD19NM,           ~CTYNM, ~RGN19NM, ~CNTY21NM, ~NTN21NM, ~Count,
  "2010 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jan", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Feb", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Mar", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Apr", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 May", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jun", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     4L,
  "2011 Jul", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Aug", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Sep", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 Oct", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Nov", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     6L
)

t_london <- t_london %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = c(LSOA11CD, Value.type), index = Month)

london_full <- t_london %>%
  aggregate_key(
    (NTN21NM / CNTY21NM / RGN19NM / CTYNM / LAD19NM / WD19NM / LSOA11NM) * Value.type,
    Total = sum(Count)
  )

fit <- london_full %>%
  model(base = ARIMA(Total)) %>%
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink"),
  )
#> Warning in max(which(abs(ma) > 1e-08)): no non-missing arguments to max;
#> returning -Inf

#> Warning: 16 errors (1 unique) encountered for base
#> [16] argument must be coercible to non-negative integer

fc <- fit %>%
  forecast(h = 5)
#> Warning: Problem with `mutate()` input `mint`.
#> ℹ diag(.) had 0 or NA entries; non-finite result is doubtful
#> ℹ Input `mint` is `(function (object, ...) ...`.
#> Warning: Problem with `mutate()` input `mint`.
#> ℹ diag(.) had 0 or NA entries; non-finite result is doubtful
#> ℹ Input `mint` is `(function (object, ...) ...`.
#> Error: Problem with `mutate()` input `mint`.
#> x infinite or missing values in 'x'
#> ℹ Input `mint` is `(function (object, ...) ...`.

Created on 2021-02-10 by the reprex package (v0.3.0)
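A quick arithmetic check may help interpret the "16 errors" warning above, which suggests the ARIMA fit failed for every series in that run, and the [64] keys in the later output (16 series × 4 models). Assuming aggregate_key() keeps every nested node even when a parent has only one child, the nested part contributes a grand total plus one node per distinct prefix at each of the seven levels, and crossing with Value.type multiplies that by the number of Value.type levels plus one aggregate. The helper count_agg_series() below is hypothetical, written in base R for illustration only; treat its result as an upper bound that is exact only when every nested node is observed with every Value.type level.

```r
# Hypothetical helper (not part of fabletools): upper bound on the number of
# series produced by aggregate_key((n1 / n2 / ... / nk) * crossed, ...).
count_agg_series <- function(keys, nested, crossed) {
  # Nested part: one grand total plus one node per distinct prefix at each depth.
  nested_nodes <- 1 + sum(vapply(seq_along(nested), function(depth) {
    nrow(unique(keys[, nested[seq_len(depth)], drop = FALSE]))
  }, integer(1)))
  # Crossed part: each observed level plus its aggregate (<aggregated>).
  crossed_nodes <- length(unique(keys[[crossed]])) + 1
  nested_nodes * crossed_nodes
}

nested_levels <- c("NTN21NM", "CNTY21NM", "RGN19NM", "CTYNM",
                   "LAD19NM", "WD19NM", "LSOA11NM")

# The single bottom-level series from the first reprex.
keys_first <- data.frame(
  NTN21NM = "UK", CNTY21NM = "England", RGN19NM = "London",
  CTYNM = "City Of London", LAD19NM = "City of London",
  WD19NM = "Aldersgate", LSOA11NM = "City of London 001A",
  Value.type = "Value-Type-1"
)

count_agg_series(keys_first, nested_levels, "Value.type")
```

With one bottom series, every one of these 16 series carries the same data, so the hierarchy is much larger than the information it contains.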

mitchelloharawild commented 3 years ago

I am unable to reproduce this issue with the latest versions of the packages. Perhaps try updating to the latest CRAN releases?

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(tidyverse)

t_london <- tibble::tribble(
  ~Month,             ~Value.type,   ~LSOA11CD,             ~LSOA11NM,     ~WD19CD,      ~WD19NM,    ~LAD19CD,         ~LAD19NM,           ~CTYNM, ~RGN19NM, ~CNTY21NM, ~NTN21NM, ~Count,
  "2010 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jan", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Feb", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Mar", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Apr", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 May", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Jun", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     4L,
  "2011 Jul", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     3L,
  "2011 Aug", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     2L,
  "2011 Sep", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2011 Oct", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Nov", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2011 Dec", "Value-Type-1         ", "E01000001", "City of London 001A", "E05009288", "Aldersgate", "E09000001", "City of London", "City Of London", "London", "England",     "UK",     6L
)

t_london <- t_london %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = c(LSOA11CD, Value.type), index = Month)

london_full <- t_london %>%
  aggregate_key(
    (NTN21NM / CNTY21NM / RGN19NM / CTYNM / LAD19NM / WD19NM / LSOA11NM) * Value.type,
    Total = sum(Count)
  )

fit <- london_full %>%
  model(base = ARIMA(Total)) %>%
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink"),
  )

fc <- fit %>%
  forecast(h = 5)
fc
#> # A fable: 320 x 12 [1M]
#> # Key:     NTN21NM, Value.type, CNTY21NM, RGN19NM, CTYNM, LAD19NM, WD19NM,
#> #   LSOA11NM, .model [64]
#>    NTN21NM    Value.type CNTY21NM   RGN19NM    CTYNM      LAD19NM    WD19NM    
#>    <chr*>     <chr*>     <chr*>     <chr*>     <chr*>     <chr*>     <chr*>    
#>  1 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  2 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  3 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  4 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  5 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  6 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  7 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  8 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#>  9 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#> 10 UK         Value-Typ… England    London     City Of L… City of L… Aldersgate
#> # … with 310 more rows, and 5 more variables: LSOA11NM <chr*>, .model <chr>,
#> #   Month <mth>, Total <dist>, .mean <dbl>

Created on 2021-02-11 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Ubuntu 20.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_AU:en
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2021-02-11
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date       lib source
#>  anytime          0.3.9      2020-08-27 [1] CRAN (R 4.0.2)
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.0.2)
#>  backports        1.2.1      2020-12-09 [1] CRAN (R 4.0.2)
#>  blob             1.2.1      2020-01-20 [1] CRAN (R 4.0.2)
#>  broom            0.7.0      2020-07-09 [1] CRAN (R 4.0.2)
#>  callr            3.5.1      2020-10-13 [1] CRAN (R 4.0.2)
#>  cellranger       1.1.0      2016-07-27 [1] CRAN (R 4.0.2)
#>  cli              2.3.0      2021-01-31 [1] CRAN (R 4.0.2)
#>  colorspace       2.0-0      2020-11-11 [1] CRAN (R 4.0.2)
#>  crayon           1.4.0      2021-01-30 [1] CRAN (R 4.0.2)
#>  DBI              1.1.0      2019-12-15 [1] CRAN (R 4.0.2)
#>  dbplyr           1.4.4      2020-05-27 [1] CRAN (R 4.0.2)
#>  desc             1.2.0      2018-05-01 [1] CRAN (R 4.0.2)
#>  devtools         2.3.2      2020-09-18 [1] CRAN (R 4.0.2)
#>  digest           0.6.27     2020-10-24 [1] CRAN (R 4.0.2)
#>  distributional   0.2.1      2020-10-06 [1] CRAN (R 4.0.2)
#>  dplyr          * 1.0.4      2021-02-02 [1] CRAN (R 4.0.2)
#>  ellipsis         0.3.1      2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.0.2)
#>  fable          * 0.3.0      2021-02-02 [1] local
#>  fabletools     * 0.3.0.9000 2021-02-02 [1] local
#>  fansi            0.4.2      2021-01-15 [1] CRAN (R 4.0.2)
#>  farver           2.0.3      2020-01-16 [1] CRAN (R 4.0.2)
#>  feasts           0.1.7      2021-02-08 [1] local
#>  forcats        * 0.5.1      2021-01-27 [1] CRAN (R 4.0.2)
#>  fs               1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  generics         0.1.0      2020-10-31 [1] CRAN (R 4.0.2)
#>  ggplot2        * 3.3.3      2020-12-30 [1] CRAN (R 4.0.2)
#>  glue             1.4.2      2020-08-27 [1] CRAN (R 4.0.2)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.0.2)
#>  haven            2.3.1      2020-06-01 [1] CRAN (R 4.0.2)
#>  highr            0.8        2019-03-20 [1] CRAN (R 4.0.2)
#>  hms              1.0.0      2021-01-13 [1] CRAN (R 4.0.2)
#>  htmltools        0.5.1      2021-01-12 [1] CRAN (R 4.0.2)
#>  httr             1.4.2      2020-07-20 [1] CRAN (R 4.0.2)
#>  jsonlite         1.7.2      2020-12-09 [1] CRAN (R 4.0.2)
#>  knitr            1.30       2020-09-22 [1] CRAN (R 4.0.2)
#>  lattice          0.20-41    2020-04-02 [2] CRAN (R 4.0.2)
#>  lifecycle        0.2.0      2020-03-06 [1] CRAN (R 4.0.2)
#>  lubridate        1.7.9.2    2020-11-13 [1] CRAN (R 4.0.2)
#>  magrittr         2.0.1      2020-11-17 [1] CRAN (R 4.0.2)
#>  Matrix           1.2-18     2019-11-27 [2] CRAN (R 4.0.2)
#>  memoise          1.1.0      2017-04-21 [1] CRAN (R 4.0.2)
#>  modelr           0.1.8      2020-05-19 [1] CRAN (R 4.0.2)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.0.2)
#>  nlme             3.1-148    2020-05-24 [2] CRAN (R 4.0.2)
#>  pillar           1.4.7      2020-11-20 [1] CRAN (R 4.0.2)
#>  pkgbuild         1.2.0      2020-12-15 [1] CRAN (R 4.0.2)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.0.2)
#>  pkgload          1.1.0      2020-05-29 [1] CRAN (R 4.0.2)
#>  prettyunits      1.1.1      2020-01-24 [1] CRAN (R 4.0.2)
#>  processx         3.4.5      2020-11-30 [1] CRAN (R 4.0.2)
#>  progressr        0.7.0      2020-12-11 [1] CRAN (R 4.0.2)
#>  ps               1.5.0      2020-12-05 [1] CRAN (R 4.0.2)
#>  purrr          * 0.3.4      2020-04-17 [1] CRAN (R 4.0.2)
#>  R6               2.5.0      2020-10-28 [1] CRAN (R 4.0.2)
#>  Rcpp             1.0.6      2021-01-15 [1] CRAN (R 4.0.2)
#>  readr          * 1.4.0      2020-10-05 [1] CRAN (R 4.0.2)
#>  readxl           1.3.1      2019-03-13 [1] CRAN (R 4.0.2)
#>  remotes          2.2.0      2020-07-21 [1] CRAN (R 4.0.2)
#>  reprex           0.3.0      2019-05-16 [1] CRAN (R 4.0.2)
#>  rlang            0.4.10     2020-12-30 [1] CRAN (R 4.0.2)
#>  rmarkdown        2.6        2020-12-14 [1] CRAN (R 4.0.2)
#>  rprojroot        2.0.2      2020-11-15 [1] CRAN (R 4.0.2)
#>  rvest            0.3.6      2020-07-25 [1] CRAN (R 4.0.2)
#>  scales           1.1.1      2020-05-11 [1] CRAN (R 4.0.2)
#>  sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi          1.5.3      2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr        * 1.4.0      2019-02-10 [1] CRAN (R 4.0.2)
#>  testthat         3.0.1      2020-12-17 [1] CRAN (R 4.0.2)
#>  tibble         * 3.0.6      2021-01-29 [1] CRAN (R 4.0.2)
#>  tidyr          * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)
#>  tidyselect       1.1.0      2020-05-11 [1] CRAN (R 4.0.2)
#>  tidyverse      * 1.3.0      2019-11-21 [1] CRAN (R 4.0.2)
#>  tsibble        * 1.0.0      2021-02-05 [1] Github (tidyverts/tsibble@722cc86)
#>  urca             1.3-0      2016-09-06 [1] CRAN (R 4.0.2)
#>  usethis          1.6.3      2020-09-17 [1] CRAN (R 4.0.2)
#>  utf8             1.1.4      2018-05-24 [1] CRAN (R 4.0.2)
#>  vctrs            0.3.6      2020-12-17 [1] CRAN (R 4.0.2)
#>  withr            2.4.1      2021-01-26 [1] CRAN (R 4.0.2)
#>  xfun             0.20       2021-01-06 [1] CRAN (R 4.0.2)
#>  xml2             1.3.2      2020-04-23 [1] CRAN (R 4.0.2)
#>  yaml             2.2.1      2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] /home/mitchell/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /opt/R/4.0.0/lib/R/library
```

edoardobassett commented 3 years ago

Great, the issue is solved with the latest versions of the packages. Thank you very much!
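To compare an installed setup against the maintainer's session info above (fable 0.3.0, fabletools 0.3.0.9000, tsibble 1.0.0), a small base-R check like the one below may help. The helper check_versions() is hypothetical. Note that in that session fable and fabletools were local builds and tsibble came from GitHub, so a CRAN install from the same period may report lower versions.

```r
# Hypothetical helper: TRUE if the installed version is at least the required
# one, FALSE if it is older, NA if the package is not installed.
check_versions <- function(required) {
  vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
    utils::packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
}

# Versions taken from the maintainer's session info.
check_versions(c(fable = "0.3.0", fabletools = "0.3.0.9000", tsibble = "1.0.0"))

# Updating from CRAN: install.packages(c("fable", "fabletools", "tsibble"))
```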

edoardobassett commented 3 years ago

The problem reappeared when I used the whole dataset. The new reprex below contains some of the rows that seem to trigger it. All packages are at their latest versions.

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tsibble)
library(tidyverse)

t_london <- tibble::tribble(
  ~Month,   ~Value.type,             ~LSOA11NM,     ~WD19CD,             ~WD19NM,         ~LAD19NM,           ~CTYNM, ~RGN19NM, ~CNTY21NM, ~NTN21NM, ~Count,
  "2016 Dec", "Value-Type2", "City of London 001A", "E05009288",        "Aldersgate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2017 Jan", "Value-Type2", "City of London 001A", "E05009288",        "Aldersgate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2016 Dec", "Value-Type2", "City of London 001B", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2017 Jan", "Value-Type2", "City of London 001B", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2016 Dec", "Value-Type2", "City of London 001C", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2017 Jan", "Value-Type2", "City of London 001C", "E05009302",       "Cripplegate", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2016 Dec", "Value-Type2", "City of London 001E", "E05009308",         "Portsoken", "City of London", "City Of London", "London", "England",     "UK",     0L,
  "2017 Jan", "Value-Type2", "City of London 001E", "E05009308",         "Portsoken", "City of London", "City Of London", "London", "England",     "UK",     1L,
  "2016 Dec", "Value-Type2", "City of London 001F", "E05009311",            "Vintry", "City of London", "City Of London", "London", "England",     "UK",    54L,
  "2017 Jan", "Value-Type2", "City of London 001F", "E05009311",            "Vintry", "City of London", "City Of London", "London", "England",     "UK",    62L,
  "2016 Dec", "Value-Type2", "City of London 001G", "E05009304", "Farringdon Within", "City of London", "City Of London", "London", "England",     "UK",    12L,
  "2017 Jan", "Value-Type2", "City of London 001G", "E05009304", "Farringdon Within", "City of London", "City Of London", "London", "England",     "UK",     9L
)

t_london <- t_london %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = c(LSOA11NM, Value.type), index = Month)

london_full <- t_london %>%
  aggregate_key(
    (NTN21NM / CNTY21NM / RGN19NM / CTYNM / LAD19NM / WD19NM / LSOA11NM) * Value.type,
    Total = sum(Count)
  )

fit <- london_full %>%
  model(base = ARIMA(Total)) %>%
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink"),
  )
#> Warning: 6 errors (1 unique) encountered for base
#> [6] missing value where TRUE/FALSE needed

fc <- fit %>%
  forecast(h = 1)
#> Warning in cov2cor(covm): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> Warning in cov2cor(tar): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> Error: Problem with `mutate()` input `mint`.
#> x infinite or missing values in 'x'
#> ℹ Input `mint` is `(function (object, ...) ...`.

Created on 2021-02-12 by the reprex package (v1.0.0)
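The two cov2cor() warnings above point at a zero (or NA) variance in the residual covariance used by the mint_shrink reconciliation. The base-R sketch below assumes a Schäfer-Strimmer-style shrinkage estimator towards a diagonal target, in the spirit of the MinT shrinkage method; shrink_cov_sketch() is a hypothetical name and this is not the fabletools source. It shows how a single residual series with zero variance makes every entry of the shrunk covariance NaN, after which a matrix decomposition such as svd() stops with the same "infinite or missing values in 'x'" message.

```r
# Hypothetical sketch (not the fabletools source): Schafer-Strimmer-style
# shrinkage of a residual covariance matrix towards its diagonal.
shrink_cov_sketch <- function(res) {
  n <- nrow(res)
  covm <- crossprod(res) / n            # covariance, residuals assumed zero-mean
  tar <- diag(diag(covm))               # shrinkage target: diagonal of covm
  corm <- cov2cor(covm)                 # warns, gives NaN entries if a variance is 0
  corapn <- cov2cor(tar)                # same warning for the target
  xs <- scale(res, center = FALSE, scale = sqrt(diag(covm)))  # 0/0 -> NaN
  v <- (crossprod(xs^2) - crossprod(xs)^2 / n) / (n * (n - 1))
  diag(v) <- 0
  d <- (corm - corapn)^2
  lambda <- max(min(sum(v) / sum(d), 1), 0)  # NaN propagates through min/max
  lambda * tar + (1 - lambda) * covm
}

set.seed(1)
# Three residual series; the second is constant (zero variance), as can happen
# for a flat series or a model fitted to very few observations.
res <- cbind(a = rnorm(12), b = rep(0, 12), c = rnorm(12))
W <- shrink_cov_sketch(res)
all(is.nan(W))
tryCatch(svd(W), error = conditionMessage)
```

Dropping the constant column (res[, c("a", "c")]) gives a finite matrix, which is consistent with the idea that the failure comes from degenerate series rather than from reconcile() itself.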

wdzhy123 commented 2 years ago

I get the same issue. Are there any updates on this?

slava-keshkov commented 2 years ago

Are there any updates on this, @mitchelloharawild?

thanks in advance!

mitchelloharawild commented 2 years ago

Hi, please provide a minimal reproducible example. I've just tried to reproduce the example above, and it fails because the ARIMA models are trained on just 2 observations per series. More data is required to produce sensible output.
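Before fitting, it can help to check how many observations each bottom-level series has, and whether any series is constant, since constant series can also yield residuals with zero variance (the condition the cov2cor() warnings in the second reprex complain about). The base-R sketch below uses the rows from the second reprex; the threshold min_obs is a hypothetical cut-off chosen for illustration, not a documented ARIMA requirement.

```r
# Rows from the second reprex (Value-Type2 only): six LSOAs, two months each.
counts <- data.frame(
  LSOA11NM = rep(paste("City of London", c("001A", "001B", "001C",
                                           "001E", "001F", "001G")), each = 2),
  Month = rep(c("2016 Dec", "2017 Jan"), times = 6),
  Count = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 54L, 62L, 12L, 9L)
)

min_obs <- 24  # hypothetical cut-off (two years of monthly data), illustration only

# One summary row per bottom-level series.
series_summary <- do.call(rbind, lapply(split(counts, counts$LSOA11NM), function(s) {
  data.frame(
    LSOA11NM = s$LSOA11NM[1],
    n_obs = nrow(s),
    constant = length(unique(s$Count)) == 1
  )
}))
series_summary$too_short <- series_summary$n_obs < min_obs
series_summary
```

Every series here has only 2 observations, and three of the six (001A, 001B, 001C) are constant. On the full tsibble the same check could be written with dplyr verbs grouped by the key; base R is used only to keep the sketch dependency-free.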