microsoft / finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.
https://microsoft.github.io/finnts

add embeddings for all data models #11

Open mitokic opened 3 years ago

mitokic commented 3 years ago

Capture relationships between categorical data such as time series ID and other groupings. Helpful in deep learning models; not sure if it is helpful in standard multivariate ML models.

Create embeddings from deep learning models, then use those values for categorical variables instead of one-hot encoding/dummy variables. Woohoo!

Could also create a separate recipe to do this or do some initial testing on the dataset and see if we should switch over to it as default. Maybe make a global option to either use dummy variables or embeddings for categorical data.
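To make the dummy-variable versus embedding trade-off concrete, here is a minimal sketch in plain Python (not finnts or recipes code). The series IDs, the `one_hot` and `embed` helpers, and the 2-wide lookup table are all hypothetical. The table is filled with random numbers as a stand-in; in practice those vectors would be learned by a deep learning model.

```python
import random


def one_hot(ids):
    """Return (levels, rows) with one 0/1 column per distinct category."""
    levels = sorted(set(ids))
    index = {level: i for i, level in enumerate(levels)}
    rows = [[1 if index[v] == j else 0 for j in range(len(levels))] for v in ids]
    return levels, rows


def embed(ids, table):
    """Replace each category with its dense vector from a lookup table."""
    return [table[v] for v in ids]


# Hypothetical time series IDs (assumption, not from the issue).
series_ids = ["A", "B", "C", "A", "D", "B"]
levels, dummy_rows = one_hot(series_ids)

# Stand-in embedding table: random values here, learned values in practice.
rng = random.Random(0)
dim = 2
table = {level: [rng.uniform(-1, 1) for _ in range(dim)] for level in levels}
embedded_rows = embed(series_ids, table)

print(len(dummy_rows[0]), "dummy columns vs", len(embedded_rows[0]), "embedding columns")
```

The dummy encoding grows by one column per category, while the embedding width stays fixed, which is why a global switch between the two could matter for datasets with many time series.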

[image: excerpt from the fast.ai Deep Learning for Coders book]

Already looks like an easy integration into a recipe. https://embed.tidymodels.org/reference/step_embed.html

Lastly, we need to determine the size of our embedding. There is no hard-and-fast rule on how to do this, but a good heuristic given by Jeremy Howard of fast.ai is to take half the number of unique values, then add one.
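The heuristic quoted above is small enough to write down directly. This is a sketch only: the function name `embedding_size` is hypothetical, and rounding down for odd counts is an assumption, since the quote does not say how to handle them.

```python
def embedding_size(n_unique: int) -> int:
    """Half the number of unique category values, plus one.

    Assumption: integer (floor) division for odd counts; the quoted
    heuristic does not specify the rounding.
    """
    if n_unique < 1:
        raise ValueError("n_unique must be at least 1")
    return n_unique // 2 + 1


# A categorical column with 10 distinct time series IDs.
print(embedding_size(10))  # 6
```

Whatever value this gives is only a starting point; the width would still need to be checked against forecast accuracy on the dataset.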

mitokic commented 3 years ago

https://insidebigdata.com/2021/03/07/video-highlights-deep-learning-for-probabilistic-time-series-forecasting/

link to video.

The video also made a good point about not using a dummy variable for holidays, but instead using a countdown of periods until the holiday and a count of periods after it. Could do the same for other categorical regressors.
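The holiday countdown idea could look something like the sketch below, written in plain Python rather than as a recipe step. The function name, the column names, and the choice of days as the period are all assumptions. Dates with no holiday before or after them get `None`.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def holiday_distance_features(dates, holidays):
    """For each date, count days until the next holiday and since the last one."""
    holidays = sorted(holidays)
    rows = []
    for d in dates:
        nxt = bisect_left(holidays, d)        # first holiday on or after d
        prv = bisect_right(holidays, d) - 1   # last holiday on or before d
        until = (holidays[nxt] - d).days if nxt < len(holidays) else None
        since = (d - holidays[prv]).days if prv >= 0 else None
        rows.append(
            {"date": d, "days_until_holiday": until, "days_since_holiday": since}
        )
    return rows


# Hypothetical example: a single holiday and three daily observations.
holidays = [date(2021, 12, 25)]
dates = [date(2021, 12, 23), date(2021, 12, 25), date(2021, 12, 27)]
features = holiday_distance_features(dates, holidays)
for row in features:
    print(row)
```

For monthly or weekly series, the same logic would count periods instead of days. The missing values at the edges would need a fill rule before modeling.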