tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/

Seasonal component inconsistency in X-11 method using R #160

Open ronygolderku opened 1 year ago

ronygolderku commented 1 year ago

I am learning about time series decomposition using the X-11 method in R. I am following the book “Forecasting: Principles and Practice (3rd ed)” by Rob J Hyndman and George Athanasopoulos, which uses the seasonal package to perform seasonal adjustment.

I have a question about the seasonal component obtained from the X-11 method. I applied the X-11 method to the us_retail_employment data, which is available in the fpp3 package. The code and the plot are shown below:

library(fpp3)
us_retail_employment <- us_employment |>
  filter(year(Month) >= 1990, Title == "Retail Trade") |>
  select(-Series_ID)

x11_dcmp <- us_retail_employment |>
  model(x11 = X_13ARIMA_SEATS(Employed ~ x11())) |>
  components()
autoplot(x11_dcmp) +
  labs(title =
         "Decomposition of total US retail employment using X-11.")

It generates the following image:

[Image: autoplot of the X-11 decomposition of total US retail employment]

As you can see, the seasonal component shows a decreasing trend in amplitude and peaks after January 2010, even though the original data shows an increasing trend in employment. This seems counterintuitive to me, as I would expect the seasonal component to reflect the seasonal patterns in the data.

I am wondering why this inconsistency occurs and how to interpret it. Is it due to the choice of the X-11 method over the SEATS method? Is it due to some features or assumptions of the X-11 method that affect the seasonal component estimation? Is it due to some characteristics of the data that make the seasonal component change over time?

I am also interested in learning more about the inner workings of the X-13ARIMA-SEATS function and how it calculates the seasonal component. I have read the reference manual and the R documentation, but I still find them quite complex and technical. I would appreciate it if someone could explain in simple terms how the X-13ARIMA-SEATS function works and what steps it takes to perform seasonal adjustment.

Thank you for your help and insights. @robjhyndman @mitchelloharawild

AQLT commented 11 months ago

This is not an inconsistency, and you get the same result using the SEATS decomposition. You are misinterpreting the components. The decomposition algorithm splits your input time series into three independent components: trend, seasonal and irregular. How they combine depends on the decomposition mode. In your case the multiplicative mode is used: Employment = Trend × Seasonal × Irregular. Your trend is increasing, so a seasonal pattern that stayed constant over time would imply that the amplitude of seasonality in your raw time series increases proportionally to the trend. This is not the case in your time series: the seasonal amplitudes appear to be constant or slightly decreasing (especially toward the end of the series). Under a multiplicative decomposition, the seasonal component must therefore decrease to reflect this evolution. For this time series, since the seasonality doesn't seem proportional to the trend, I would recommend using an additive decomposition instead of a multiplicative one.
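The argument above can be written out as a short derivation. This is only a sketch: it ignores the irregular component, and the symbols y_t, T_t, S_t, I_t and the amplitude A are notation introduced here for illustration, not taken from the package documentation.

```latex
\begin{align*}
y_t &= T_t \times S_t \times I_t
  && \text{(multiplicative decomposition)} \\
y_t - T_t &\approx T_t \,(S_t - 1)
  && \text{(taking } I_t \approx 1\text{)} \\
|S_t - 1| &\approx \frac{A}{T_t}
  && \text{(if the seasonal swing } |y_t - T_t| \text{ stays near a constant } A\text{)}
\end{align*}
```

So if the swing in the original units stays roughly constant while T_t grows, the multiplicative seasonal factor has to move toward 1, which shows up as a shrinking seasonal component in the plot.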

ronygolderku commented 11 months ago

Thanks for your reply. I have also applied the additive decomposition to this data (publicly available in the fpp3 package). The result is the same: the seasonal component shows a steadily decreasing pattern. I would expect the seasonal component to have some year-to-year variability, like the trend component. Could you please share your thoughts on which decomposition would show year-to-year variability in the seasonal component, instead of a steadily decreasing or increasing pattern, for X11?

AQLT commented 11 months ago

I'm not sure I understand what you're looking for. The non-constant seasonal component only reflects the changes over time in your time series. The decomposition mode just amplified that effect. What you are interpreting as an inconsistency is just a decrease, over the last few years, in the seasonal effect of December and February. However, if you look closely at the results, you can also see an increase in the seasonality in April. You can also see this by analyzing the detrended data:

library(fpp3)
us_retail_employment <- us_employment |>
  filter(year(Month) >= 1990, Title == "Retail Trade") |>
  select(-Series_ID)
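# No transformation, so the decomposition is additive
x11_dcmp <- us_retail_employment |>
  model(x11 = X_13ARIMA_SEATS(Employed ~ transform(`function` = "none"))) |>
  components()
# Additive components: remove the trend by subtraction, then plot by season
x11_dcmp |>
  mutate(Detrend = Employed - trend) |>
  gg_season(Detrend)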

You can also estimate a seasonal pattern that is constant over time. However, in this case you will overestimate the seasonal effect in the most recent years and get a poor decomposition (residual seasonality left in the irregular component):

x11_dcmp_stable <- us_retail_employment |>
  model(x11 = X_13ARIMA_SEATS(
    Employed ~ transform(`function` = "none") +
      x11(mode = "add", seasonalma = "stable")
  )) |>
  components()
x11_dcmp_stable |> 
  autoplot(seasonal)

x11_dcmp_stable |> 
  gg_subseries(irregular)

ronygolderku commented 11 months ago

Thank you for your clear explanation. I was inspired to use the additive X11 decomposition discussed in Keerthi et al. (2020) and Vantrepotte & Melin (2011) in my own articles. While researching this method, I came across the X11 decomposition outlined in the fpp3 book. However, when I applied that decomposition, it consistently exhibited a constant pattern, contrary to my expectations. In contrast, the seasonal component of the X11 decomposition in those articles displayed year-to-year variability, which added to my confusion. I am currently trying to understand the differences between the methodology used in the book and in the articles. If you have the time, I would appreciate it if you could briefly review the articles. I believe reaching a definitive conclusion about this matter is crucial.

1-s2.0-S0967063711000380-main.pdf keerthi_etal_2020.pdf

AQLT commented 11 months ago

We are now outside the scope of your first question and of the feasts package. As mentioned in Vantrepotte and Melin, they use a simplified version of the X-11 algorithm and do not mention how they handle endpoints (where symmetric filters cannot be used). In feasts, you do not use plain X-11; you use X-13ARIMA with X-11 decomposition: your series is first pre-treated for outliers and calendar effects with a RegARIMA model, and then decomposed with X-11 (automatic selection of the filter lengths, outlier correction and specific filters for endpoints). With X-13ARIMA you will certainly be much less biased by outliers and have fewer revisions in your estimates. For more details, you can look at the X-13ARIMA-SEATS manual: https://www2.census.gov/software/x-13arima-seats/x13as/windows/documentation/docx13as.pdf

Using the Vantrepotte and Melin method, you should also see year-to-year variations in the seasonal component (and even more variability, since the filter used to extract the seasonal component is shorter). If you see a constant seasonal component, that rather suggests you should check your code and whether you implemented their method correctly.
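To illustrate why a short moving seasonal filter produces year-to-year variation while a "stable" filter does not, here is a minimal base R sketch. It is not the Vantrepotte and Melin procedure and not what X-13ARIMA-SEATS does internally: it is a single pass of basic X-11-style moving averages, with no outlier correction and no endpoint filters. The built-in AirPassengers series is used only as a stand-in, and all object names are made up for the example.

```r
# Illustrative only: one pass of basic X-11-style moving averages in base R.
# AirPassengers (built into R) is a stand-in series, not the data from this thread.
y <- log(AirPassengers)  # log scale so that an additive model is reasonable

# Trend: centred 2x12 moving average (symmetric, so the first and last 6 months are NA).
trend <- stats::filter(y, c(0.5, rep(1, 11), 0.5) / 12, sides = 2)
detrended <- y - trend

month <- as.integer(cycle(y))  # calendar month (1 to 12) of each observation

# Moving seasonal: 3x3 moving average applied to each calendar month separately.
seasonal_moving <- detrended
for (m in 1:12) {
  idx <- which(month == m)
  seasonal_moving[idx] <- as.numeric(
    stats::filter(detrended[idx], c(1, 2, 3, 2, 1) / 9, sides = 2)
  )
}

# Stable seasonal: a single average per calendar month, repeated every year.
monthly_mean <- tapply(as.numeric(detrended), month, mean, na.rm = TRUE)
seasonal_stable <- ts(as.numeric(monthly_mean[month]),
                      start = start(y), frequency = 12)

# Spread of the December effect across years under each estimate.
sd(seasonal_moving[month == 12], na.rm = TRUE)
sd(seasonal_stable[month == 12])
```

In this sketch the moving estimate gives a December effect that differs from year to year, whereas the stable estimate repeats the same value every year. Because the filters are symmetric, the estimates near both ends of the series are missing, which is the endpoint issue mentioned above.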