jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License
3.87k stars, 611 forks

Predict on new unknown categories (Time Series group) #370

Closed by jjacquessimeoni 3 years ago

jjacquessimeoni commented 3 years ago

Expected behavior

Hello! First thank you so much for releasing that library, so helpful!

Here is my issue: I am predicting the future viewership of YouTube channels. I have 600k time series (one time series = the monthly historical data of one YouTube channel), so my group_ids are unique YouTube channel IDs such as UC3qOWAkHB6LYoEv-xVnnnag. I train my model on the channels I currently have, and every month I would have to add more than 10k new channel IDs that the model has never seen before. From what I have read, a possible way to include them is to use categorical_encoders like this: categorical_encoders={"UC3qOWAkHB6LYoEv-xVnnnag": NaNLabelEncoder(add_nan=True), "UCN8v8tNOCmaZaN-t4ynDdEA": NaNLabelEncoder(add_nan=True),} Is that right? I see two limitations here if it works that way:

As of now, I get the unknown category error: KeyError: "Unknown category 'UC-7Un7ZJ9Z_ZOmTE_Vc0UwQ' encountered. Set add_nan=True to allow unknown categories"

Thank you for your help and sorry if you've already answered that issue somewhere.

ps: Unfortunately, I can't provide any dataset/code. Let me know if you need me to provide further details.

jdb78 commented 3 years ago

The issue with time series models is that they have to be retrained often. So if you have new IDs, retrain. The model cannot know what a new category means if it has never seen it before. Alternatively, do not use the IDs as features at all; instead, use metadata (such as author, etc.) to describe each time series and use the ID as a group ID only (there, new IDs are allowed).
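To illustrate the point about unseen categories, here is a minimal pure-Python sketch. ToyLabelEncoder is a hypothetical stand-in written for this example, not the library's NaNLabelEncoder, and the "genre" metadata values and the second new channel ID are made up. It assumes that allowing unknown categories means mapping every unseen value to one reserved index, which is why a brand-new ID carries no information as a feature, while metadata values already seen in training still encode meaningfully.

```python
from typing import Dict, Iterable, List


class ToyLabelEncoder:
    """Toy label encoder with an optional reserved slot for unknown values.

    Hypothetical illustration only; not pytorch-forecasting's NaNLabelEncoder.
    """

    def __init__(self, add_nan: bool = False) -> None:
        self.add_nan = add_nan
        self.classes_: Dict[str, int] = {}

    def fit(self, values: Iterable[str]) -> "ToyLabelEncoder":
        # When add_nan=True, index 0 is reserved for unknown categories.
        offset = 1 if self.add_nan else 0
        self.classes_ = {v: i + offset for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values: Iterable[str]) -> List[int]:
        encoded = []
        for v in values:
            if v in self.classes_:
                encoded.append(self.classes_[v])
            elif self.add_nan:
                encoded.append(0)  # all unseen values share this one index
            else:
                raise KeyError(f"Unknown category '{v}' encountered.")
        return encoded


train_ids = ["UC3qOWAkHB6LYoEv-xVnnnag", "UCN8v8tNOCmaZaN-t4ynDdEA"]
# The second ID below is invented for the example.
new_ids = ["UC-7Un7ZJ9Z_ZOmTE_Vc0UwQ", "UC-hypothetical-new-channel"]

# Without the unknown slot, a new channel ID cannot be encoded at all.
strict = ToyLabelEncoder(add_nan=False).fit(train_ids)
try:
    strict.transform(new_ids)
    strict_failed = False
except KeyError:
    strict_failed = True

# With the unknown slot, encoding succeeds, but the new IDs are indistinguishable.
lenient = ToyLabelEncoder(add_nan=True).fit(train_ids)
new_id_codes = lenient.transform(new_ids)

# Metadata seen during training (hypothetical "genre" values) still encodes
# to distinct indices for the new channels.
genre_encoder = ToyLabelEncoder(add_nan=True).fit(["gaming", "music", "news"])
new_genre_codes = genre_encoder.transform(["music", "gaming"])

print(strict_failed, new_id_codes, new_genre_codes)
```

In this sketch both new channel IDs collapse to the same index, so as model inputs they are interchangeable; the metadata column is what actually differentiates the new channels.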

jjacquessimeoni commented 3 years ago

Sounds good, thank you for your quick answer.