tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

Running model(ETS....) function on version 0.3.1 of Fable causes R console to hang on running and become unresponsive. #338

Open ColossalChaz opened 3 years ago

ColossalChaz commented 3 years ago

See attached zip file for reproducible example.

Yesterday I updated Fable to version 0.3.1 and shortly afterwards realised that a forecasting script I had been working on had stopped working. I tried the script on both my company laptop (running R v4.0.4) and my personal PC (running R v4.0.5). In both cases I ran into the same problem, where running the following line:

cancer_type_rate_fit <- cancer_type_rate_tsib %>% 
  model(
    es = ETS(rate ~ error("A") + trend("A"))
  )

caused the R console in RStudio to hang with the small stop sign in the top right. RStudio hung in this state for at least 30 minutes yesterday and remained unresponsive during this time.

Rolling back Fable to version 0.3.0 fixed the issue, so presumably it is a problem with the new release. Test.zip
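
For anyone who wants to try the same rollback, here is a minimal sketch. It assumes the remotes package is installed and that fable 0.3.0 is still available from the CRAN archive; `pin_fable` is a hypothetical helper name used only for illustration, not something provided by fable.

```r
# Sketch only: pin fable to an earlier CRAN release.
# `pin_fable` is a hypothetical helper, not part of fable or remotes.
pin_fable <- function(version = "0.3.0") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("Install the 'remotes' package first: install.packages('remotes')")
  }
  # Install the requested release without upgrading other packages.
  remotes::install_version("fable", version = version, upgrade = "never")
  # Report the version that is now installed.
  utils::packageVersion("fable")
}

# pin_fable("0.3.0")  # not run here; restart R afterwards so the pinned version is loaded
```

Restarting the R session after the install avoids keeping the previously loaded version in memory.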

mitchelloharawild commented 3 years ago

I am unable to reproduce this issue with the latest version of fable. This is likely an issue with your particular installation.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(hts)
#> Loading required package: forecast
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
library(fable)
#> Loading required package: fabletools
#> 
#> Attaching package: 'fabletools'
#> The following objects are masked from 'package:forecast':
#> 
#>     accuracy, forecast
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

#set wd to the path of TEST
ungrouped_cancer_df <- read.csv("Data/Formatted Norfolk Cancer Data.csv", check.names = F)
norfolk_pop_df <- read.csv("Data/Norfolk Population.csv", check.names = F)
all_other_cancer_df <- read.csv("Data/2018 England All Other Cancers.csv", check.names = F)

tumour_site_lookup <- read.csv("Data/Tumour Site Lookup.csv",fileEncoding="UTF-8-BOM",check.names=FALSE) %>% 
  select(TumourSite,ICD10,`Short Name`) %>% 
  rename(`Cancer Site` = `Short Name`)

#Filter to all persons and the "All" age group, and set years to dates.
all_persons_df <- ungrouped_cancer_df %>% 
  filter(Gender == "Persons" & AgeGroup == "All") %>% 
  mutate(Year = year(paste0(Year,"-01-01")))

#Creating a DF filtering out all Malignant
cancer_type_rate_df <- all_persons_df %>% 
  filter(`Short Name` != "All Malignant") %>% 
  select(Year, `Short Name`, HealthGeographyCode, Population, count_touse) %>% 
  group_by(Year, `Short Name`) %>% 
  summarise(Count = sum(count_touse),
            Population = sum(Population)) %>% 
  mutate(rate = (Count * 100000) / Population) %>% 
  filter(Year != 2001)
#> `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.

#Creating a TSibble (time series) in order to forecast, key set to Short Name
#to separate forecasts for these.
cancer_type_rate_tsib <- as_tsibble(cancer_type_rate_df, index = Year, key = `Short Name`)

################################

###On Fable v0.3.1 this function causes the R console to hang and become unresponsive.

#Fitting an exponential smoothing model with additive error and trend
cancer_type_rate_fit <- cancer_type_rate_tsib %>% 
  model(
    es = ETS(rate ~ error("A") + trend("A"))
  )

cancer_type_rate_fit
#> # A mable: 19 x 2
#> # Key:     Short Name [19]
#>    `Short Name`                  es
#>    <chr>                    <model>
#>  1 All Other Malignant <ETS(A,A,N)>
#>  2 Bladder             <ETS(A,A,N)>
#>  3 Brain               <ETS(A,A,N)>
#>  4 Breast              <ETS(A,A,N)>
#>  5 Cervix Uteri        <ETS(A,A,N)>
#>  6 Colorectal          <ETS(A,A,N)>
#>  7 Gallbladder         <ETS(A,A,N)>
#>  8 Kidney              <ETS(A,A,N)>
#>  9 Leukaemia           <ETS(A,A,N)>
#> 10 Liver               <ETS(A,A,N)>
#> 11 Lung                <ETS(A,A,N)>
#> 12 Myeloma             <ETS(A,A,N)>
#> 13 Oesophagus          <ETS(A,A,N)>
#> 14 Ovary               <ETS(A,A,N)>
#> 15 Pancreas            <ETS(A,A,N)>
#> 16 Prostate            <ETS(A,A,N)>
#> 17 Skin                <ETS(A,A,N)>
#> 18 Stomach             <ETS(A,A,N)>
#> 19 Uterus              <ETS(A,A,N)>

Created on 2021-08-26 by the reprex package (v2.0.0)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Ubuntu 20.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_AU:en
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2021-08-26
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date       lib source
#>  anytime          0.3.9      2020-08-27 [1] CRAN (R 4.0.2)
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.0.2)
#>  cli              3.0.1      2021-07-17 [1] CRAN (R 4.0.2)
#>  colorspace       2.0-2      2021-06-24 [1] CRAN (R 4.0.2)
#>  crayon           1.4.1      2021-02-08 [1] CRAN (R 4.0.2)
#>  curl             4.3.2      2021-06-23 [1] CRAN (R 4.0.2)
#>  DBI              1.1.1      2021-01-15 [1] CRAN (R 4.0.2)
#>  digest           0.6.27     2020-10-24 [1] CRAN (R 4.0.2)
#>  distributional   0.2.2      2021-07-26 [1] local
#>  dplyr          * 1.0.7      2021-06-18 [1] CRAN (R 4.0.2)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.0.2)
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.0.2)
#>  fable          * 0.3.1.9000 2021-08-26 [1] local
#>  fabletools     * 0.3.1.9000 2021-08-01 [1] local
#>  fansi            0.5.0      2021-05-25 [1] CRAN (R 4.0.2)
#>  farver           2.1.0      2021-02-28 [1] CRAN (R 4.0.2)
#>  forecast       * 8.15       2021-06-01 [1] CRAN (R 4.0.2)
#>  fracdiff         1.5-1      2020-01-24 [1] CRAN (R 4.0.2)
#>  fs               1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  generics         0.1.0      2020-10-31 [1] CRAN (R 4.0.2)
#>  ggplot2        * 3.3.5.9000 2021-07-26 [1] Github (tidyverse/ggplot2@13c0730)
#>  glue             1.4.2      2020-08-27 [1] CRAN (R 4.0.2)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.0.2)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.0.2)
#>  htmltools        0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)
#>  hts            * 6.0.2      2021-05-30 [1] CRAN (R 4.0.2)
#>  knitr            1.33       2021-04-24 [1] CRAN (R 4.0.2)
#>  lattice          0.20-41    2020-04-02 [2] CRAN (R 4.0.2)
#>  lifecycle        1.0.0      2021-02-15 [1] CRAN (R 4.0.2)
#>  lmtest           0.9-38     2020-09-09 [1] CRAN (R 4.0.2)
#>  lubridate      * 1.7.10     2021-02-26 [1] CRAN (R 4.0.2)
#>  magrittr         2.0.1      2020-11-17 [1] CRAN (R 4.0.2)
#>  Matrix           1.2-18     2019-11-27 [2] CRAN (R 4.0.2)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.0.2)
#>  nlme             3.1-148    2020-05-24 [2] CRAN (R 4.0.2)
#>  nnet             7.3-14     2020-04-26 [2] CRAN (R 4.0.2)
#>  pillar           1.6.2      2021-07-29 [1] CRAN (R 4.0.2)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.0.2)
#>  progressr        0.8.0      2021-06-10 [1] CRAN (R 4.0.2)
#>  purrr            0.3.4      2020-04-17 [1] CRAN (R 4.0.2)
#>  quadprog         1.5-8      2019-11-20 [1] CRAN (R 4.0.2)
#>  quantmod         0.4.18     2020-12-09 [1] CRAN (R 4.0.2)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.0.2)
#>  Rcpp             1.0.7      2021-07-07 [1] CRAN (R 4.0.2)
#>  reprex           2.0.0      2021-04-02 [1] CRAN (R 4.0.2)
#>  rlang            0.4.11     2021-04-30 [1] CRAN (R 4.0.2)
#>  rmarkdown        2.9        2021-06-15 [1] CRAN (R 4.0.2)
#>  rstudioapi       0.13       2020-11-12 [1] CRAN (R 4.0.2)
#>  scales           1.1.1      2020-05-11 [1] CRAN (R 4.0.2)
#>  sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 4.0.2)
#>  SparseM          1.81       2021-02-18 [1] CRAN (R 4.0.2)
#>  stringi          1.7.3      2021-07-16 [1] CRAN (R 4.0.2)
#>  stringr          1.4.0      2019-02-10 [1] CRAN (R 4.0.2)
#>  tibble           3.1.4      2021-08-25 [1] CRAN (R 4.0.2)
#>  tidyr          * 1.1.3      2021-03-03 [1] CRAN (R 4.0.2)
#>  tidyselect       1.1.1      2021-04-30 [1] CRAN (R 4.0.2)
#>  timeDate         3043.102   2018-02-21 [1] CRAN (R 4.0.2)
#>  tseries          0.10-48    2020-12-04 [1] CRAN (R 4.0.2)
#>  tsibble          1.0.1      2021-04-12 [1] CRAN (R 4.0.2)
#>  TTR              0.24.2     2020-09-01 [1] CRAN (R 4.0.2)
#>  urca             1.3-0      2016-09-06 [1] CRAN (R 4.0.2)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.0.2)
#>  vctrs            0.3.8      2021-04-29 [1] CRAN (R 4.0.2)
#>  withr            2.4.2      2021-04-18 [1] CRAN (R 4.0.2)
#>  xfun             0.24       2021-06-15 [1] CRAN (R 4.0.2)
#>  xts              0.12.1     2020-09-09 [1] CRAN (R 4.0.2)
#>  yaml             2.2.1      2020-02-01 [1] CRAN (R 4.0.2)
#>  zoo              1.8-9      2021-03-09 [1] CRAN (R 4.0.2)
#>
#> [1] /home/mitchell/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /opt/R/4.0.0/lib/R/library
```

ColossalChaz commented 3 years ago

Would it make a difference that you're running R version 4.0.2 whereas I ran into the issue on R 4.0.4? I should also note that I was running this through RStudio on Windows 10.
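
To make this kind of comparison easier across machines, a small sketch like the following could be run on each one and the results posted side by side. `env_summary` is a hypothetical helper written for illustration; it uses only base R and the utils package, and it skips any package that is not installed.

```r
# Sketch only: collect the details that differ between the two setups.
# `env_summary` is a hypothetical helper, not part of fable.
env_summary <- function(pkgs = c("fable", "fabletools")) {
  # Keep only the packages that are actually installed on this machine.
  installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  list(
    r_version = R.version.string,
    platform  = R.version$platform,
    os        = Sys.info()[["sysname"]],
    packages  = vapply(
      installed,
      function(p) as.character(utils::packageVersion(p)),
      character(1)
    )
  )
}

str(env_summary())
```

Differences in `r_version`, `os`, or the package versions between a machine that hangs and one that does not would help narrow down where the problem sits.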

mitchelloharawild commented 3 years ago

Still unable to reproduce on R-devel. Perhaps it's Windows specific?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(hts)
#> Loading required package: forecast
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
library(fable)
#> Loading required package: fabletools
#> 
#> Attaching package: 'fabletools'
#> The following objects are masked from 'package:forecast':
#> 
#>     accuracy, forecast
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

#set wd to the path of TEST
ungrouped_cancer_df <- read.csv("Data/Formatted Norfolk Cancer Data.csv", check.names = F)
norfolk_pop_df <- read.csv("Data/Norfolk Population.csv", check.names = F)
all_other_cancer_df <- read.csv("Data/2018 England All Other Cancers.csv", check.names = F)

tumour_site_lookup <- read.csv("Data/Tumour Site Lookup.csv",fileEncoding="UTF-8-BOM",check.names=FALSE) %>% 
  select(TumourSite,ICD10,`Short Name`) %>% 
  rename(`Cancer Site` = `Short Name`)

#Filter to all persons and the "All" age group, and set years to dates.
all_persons_df <- ungrouped_cancer_df %>% 
  filter(Gender == "Persons" & AgeGroup == "All") %>% 
  mutate(Year = year(paste0(Year,"-01-01")))

#Creating a DF filtering out all Malignant
cancer_type_rate_df <- all_persons_df %>% 
  filter(`Short Name` != "All Malignant") %>% 
  select(Year, `Short Name`, HealthGeographyCode, Population, count_touse) %>% 
  group_by(Year, `Short Name`) %>% 
  summarise(Count = sum(count_touse),
            Population = sum(Population)) %>% 
  mutate(rate = (Count * 100000) / Population) %>% 
  filter(Year != 2001)
#> `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.

#Creating a TSibble (time series) in order to forecast, key set to Short Name
#to separate forecasts for these.
cancer_type_rate_tsib <- as_tsibble(cancer_type_rate_df, index = Year, key = `Short Name`)

################################

###On Fable v0.3.1 this function causes the R console to hang and become unresponsive.

#Fitting an exponential smoothing model with additive error and trend
cancer_type_rate_fit <- cancer_type_rate_tsib %>% 
  model(
    es = ETS(rate ~ error("A") + trend("A"))
  )
cancer_type_rate_fit
#> # A mable: 19 x 2
#> # Key:     Short Name [19]
#>    `Short Name`                  es
#>    <chr>                    <model>
#>  1 All Other Malignant <ETS(A,A,N)>
#>  2 Bladder             <ETS(A,A,N)>
#>  3 Brain               <ETS(A,A,N)>
#>  4 Breast              <ETS(A,A,N)>
#>  5 Cervix Uteri        <ETS(A,A,N)>
#>  6 Colorectal          <ETS(A,A,N)>
#>  7 Gallbladder         <ETS(A,A,N)>
#>  8 Kidney              <ETS(A,A,N)>
#>  9 Leukaemia           <ETS(A,A,N)>
#> 10 Liver               <ETS(A,A,N)>
#> 11 Lung                <ETS(A,A,N)>
#> 12 Myeloma             <ETS(A,A,N)>
#> 13 Oesophagus          <ETS(A,A,N)>
#> 14 Ovary               <ETS(A,A,N)>
#> 15 Pancreas            <ETS(A,A,N)>
#> 16 Prostate            <ETS(A,A,N)>
#> 17 Skin                <ETS(A,A,N)>
#> 18 Stomach             <ETS(A,A,N)>
#> 19 Uterus              <ETS(A,A,N)>

Created on 2021-08-26 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       Ubuntu 20.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2021-08-26
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date       lib source
#>  anytime          0.3.9      2020-08-27 [1] RSPM (R 4.1.0)
#>  assertthat       0.2.1      2019-03-21 [1] RSPM (R 4.1.0)
#>  cli              3.0.1      2021-07-17 [1] RSPM (R 4.1.0)
#>  colorspace       2.0-2      2021-06-24 [1] RSPM (R 4.1.0)
#>  crayon           1.4.1      2021-02-08 [1] RSPM (R 4.1.0)
#>  curl             4.3.2      2021-06-23 [1] RSPM (R 4.1.0)
#>  DBI              1.1.1      2021-01-15 [1] RSPM (R 4.1.0)
#>  digest           0.6.27     2020-10-24 [1] RSPM (R 4.1.0)
#>  distributional   0.2.2      2021-02-02 [1] RSPM (R 4.1.0)
#>  dplyr          * 1.0.7      2021-06-18 [1] RSPM (R 4.1.0)
#>  ellipsis         0.3.2      2021-04-29 [1] RSPM (R 4.1.0)
#>  evaluate         0.14       2019-05-28 [1] RSPM (R 4.1.0)
#>  fable          * 0.3.1      2021-05-16 [1] RSPM (R 4.1.0)
#>  fabletools     * 0.3.1      2021-03-16 [1] RSPM (R 4.1.0)
#>  fansi            0.5.0      2021-05-25 [1] RSPM (R 4.1.0)
#>  farver           2.1.0      2021-02-28 [1] RSPM (R 4.1.0)
#>  forecast       * 8.15       2021-06-01 [1] RSPM (R 4.1.0)
#>  fracdiff         1.5-1      2020-01-24 [1] RSPM (R 4.1.0)
#>  fs               1.5.0      2020-07-31 [1] RSPM (R 4.1.0)
#>  generics         0.1.0      2020-10-31 [1] RSPM (R 4.1.0)
#>  ggplot2        * 3.3.5      2021-06-25 [1] RSPM (R 4.1.0)
#>  glue             1.4.2      2020-08-27 [1] RSPM (R 4.1.0)
#>  gtable           0.3.0      2019-03-25 [1] RSPM (R 4.1.0)
#>  highr            0.9        2021-04-16 [1] RSPM (R 4.1.0)
#>  htmltools        0.5.1.1    2021-01-22 [1] RSPM (R 4.1.0)
#>  hts            * 6.0.2      2021-05-30 [1] RSPM (R 4.1.0)
#>  knitr            1.33       2021-04-24 [1] RSPM (R 4.1.0)
#>  lattice          0.20-44    2021-05-02 [2] CRAN (R 4.1.1)
#>  lifecycle        1.0.0      2021-02-15 [1] RSPM (R 4.1.0)
#>  lmtest           0.9-38     2020-09-09 [1] RSPM (R 4.1.0)
#>  lubridate      * 1.7.10     2021-02-26 [1] RSPM (R 4.1.0)
#>  magrittr         2.0.1      2020-11-17 [1] RSPM (R 4.1.0)
#>  Matrix           1.3-4      2021-06-01 [2] CRAN (R 4.1.1)
#>  munsell          0.5.0      2018-06-12 [1] RSPM (R 4.1.0)
#>  nlme             3.1-152    2021-02-04 [2] CRAN (R 4.1.1)
#>  nnet             7.3-16     2021-05-03 [2] CRAN (R 4.1.1)
#>  pillar           1.6.2      2021-07-29 [1] RSPM (R 4.1.0)
#>  pkgconfig        2.0.3      2019-09-22 [1] RSPM (R 4.1.0)
#>  progressr        0.8.0      2021-06-10 [1] RSPM (R 4.1.0)
#>  purrr            0.3.4      2020-04-17 [1] RSPM (R 4.1.0)
#>  quadprog         1.5-8      2019-11-20 [1] RSPM (R 4.1.0)
#>  quantmod         0.4.18     2020-12-09 [1] RSPM (R 4.1.0)
#>  R6               2.5.0      2020-10-28 [1] RSPM (R 4.1.0)
#>  Rcpp             1.0.7      2021-07-07 [1] RSPM (R 4.1.0)
#>  reprex           2.0.1      2021-08-05 [1] RSPM (R 4.1.0)
#>  rlang            0.4.11     2021-04-30 [1] RSPM (R 4.1.0)
#>  rmarkdown        2.10       2021-08-06 [1] RSPM (R 4.1.0)
#>  rstudioapi       0.13       2020-11-12 [1] RSPM (R 4.1.0)
#>  scales           1.1.1      2020-05-11 [1] RSPM (R 4.1.0)
#>  sessioninfo      1.1.1      2018-11-05 [1] RSPM (R 4.1.0)
#>  SparseM          1.81       2021-02-18 [1] RSPM (R 4.1.0)
#>  stringi          1.7.3      2021-07-16 [1] RSPM (R 4.1.0)
#>  stringr          1.4.0      2019-02-10 [1] RSPM (R 4.1.0)
#>  tibble           3.1.3      2021-07-23 [1] RSPM (R 4.1.0)
#>  tidyr          * 1.1.3      2021-03-03 [1] RSPM (R 4.1.0)
#>  tidyselect       1.1.1      2021-04-30 [1] RSPM (R 4.1.0)
#>  timeDate         3043.102   2018-02-21 [1] RSPM (R 4.1.0)
#>  tseries          0.10-48    2020-12-04 [1] RSPM (R 4.1.0)
#>  tsibble          1.0.1      2021-04-12 [1] RSPM (R 4.1.0)
#>  TTR              0.24.2     2020-09-01 [1] RSPM (R 4.1.0)
#>  urca             1.3-0      2016-09-06 [1] RSPM (R 4.1.0)
#>  utf8             1.2.2      2021-07-24 [1] RSPM (R 4.1.0)
#>  vctrs            0.3.8      2021-04-29 [1] RSPM (R 4.1.0)
#>  withr            2.4.2      2021-04-18 [1] RSPM (R 4.1.0)
#>  xfun             0.25       2021-08-06 [1] RSPM (R 4.1.0)
#>  xts              0.12.1     2020-09-09 [1] RSPM (R 4.1.0)
#>  yaml             2.2.1      2020-02-01 [1] RSPM (R 4.1.0)
#>  zoo              1.8-9      2021-03-09 [1] RSPM (R 4.1.0)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
```

danielrmt commented 3 years ago

I also had R hang indefinitely when using the forecast function on an ETS model, with fable 0.3.1, R 4.1.1, RStudio 1.4.1717 and Ubuntu Linux 21.04. ARIMA works fine. After downgrading fable to 0.3.0 the hang went away. I am on a deadline this week, but next week I might be able to put together something reproducible.
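
When trying to build a reproducible example for a hang like this, it can help to cap how long a call may run so the session does not have to be killed by hand. The following is a minimal sketch using base R's `setTimeLimit()`; `with_time_limit` is a hypothetical wrapper name. One caveat: R only checks the limit at points where a user interrupt could occur, so this may not stop a hang that sits entirely inside compiled code, and nothing in this thread establishes where the ETS hang occurs.

```r
# Sketch only: abort an expression if it runs longer than `seconds`.
# `with_time_limit` is a hypothetical helper built on base R's setTimeLimit().
with_time_limit <- function(expr, seconds) {
  # Arm an elapsed-time limit for the current computation.
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Always clear the limit again, whether expr finishes or fails.
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE), add = TRUE)
  force(expr)
}

# Example with a deliberately slow call; an error is raised once the limit is hit.
result <- tryCatch(
  with_time_limit(Sys.sleep(3), seconds = 0.5),
  error = function(e) "timed out"
)
print(result)
```

The same wrapper could be placed around the `model()` or forecast call from the script, with a limit of a minute or two, to record whether it returns at all.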