Closed: jjacquessimeoni closed this issue 3 years ago
The issue with time series models is that they need to be retrained frequently. So if you have new IDs, retrain: the model cannot know what a new category means if it has never been seen before. Alternatively, do not use the IDs as learned features at all; instead, describe each time series with metadata (such as the author) and use the ID only as a group ID (there, new IDs are allowed).
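As a sketch of the fallback behaviour described above, here is a minimal, self-contained stand-in for what an "add NaN"-style label encoder does with unseen group IDs. This is illustrative only (pytorch-forecasting's `NaNLabelEncoder` is the real implementation); the class and channel IDs below are just examples:

```python
class UnknownTolerantEncoder:
    """Toy encoder: known categories map to indices >= 1; index 0 is
    reserved for NaN/unknown categories (the effect of add_nan=True)."""

    def fit(self, values):
        # dict.fromkeys deduplicates while preserving first-seen order.
        self.mapping = {v: i + 1 for i, v in enumerate(dict.fromkeys(values))}
        return self

    def transform(self, values):
        # Unseen IDs fall back to the reserved index 0 instead of
        # raising a KeyError.
        return [self.mapping.get(v, 0) for v in values]


encoder = UnknownTolerantEncoder().fit(
    ["UC3qOWAkHB6LYoEv-xVnnnag", "UCN8v8tNOCmaZaN-t4ynDdEA"]
)
print(encoder.transform(["UC3qOWAkHB6LYoEv-xVnnnag", "UC-brand-new-id"]))
# -> [1, 0]  (the brand-new channel ID encodes to 0 rather than crashing)
```

The price of this tolerance is that all unseen channels share one "unknown" embedding, which is why the advice above is to lean on metadata features rather than the ID itself.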
Sounds good, thank you for your quick answer.
Expected behavior
Hello! First thank you so much for releasing that library, so helpful!
Here is my issue: I am predicting the future behavior of YouTubers in terms of viewership. Basically, I have 600k time series (1 time series = the monthly historical data of one YouTube channel), so my group_ids are unique YouTube IDs like this one: UC3qOWAkHB6LYoEv-xVnnnag. I am training my model on my available YouTube channels, and every month I would have to add more than 10k new channel IDs that the model won't have seen before. I have found that a possible way to include them is to use categorical_encoders like this: `categorical_encoders={"UC3qOWAkHB6LYoEv-xVnnnag": NaNLabelEncoder(add_nan=True), "UCN8v8tNOCmaZaN-t4ynDdEA": NaNLabelEncoder(add_nan=True)}` — right? I do see two limitations here if it works that way:
Then, even if I do know that list in advance, I fear it might skew my model's performance. Maybe you can help me here: does the model learn on these new, empty groups, or are they left aside, serving only as a way for us to specify that in the future the model should be able to predict these groups?
Anyway, my expected behavior would be to change nothing in my training pipeline and to be able to feed these new, unseen YouTube IDs to my model so that I can predict their future trajectories.
As of now, I get the unknown-category error:

```
KeyError: "Unknown category 'UC-7Un7ZJ9Z_ZOmTE_Vc0UwQ' encountered. Set add_nan=True to allow unknown categories"
```

Thank you for your help, and sorry if you've already answered this somewhere.
ps: Unfortunately, I can't provide any dataset/code. Let me know if you need me to provide further details.
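For anyone hitting the same `KeyError` later: in pytorch-forecasting, the `categorical_encoders` dict is keyed by *column name*, not by individual category values, so a single encoder on the group-ID column covers every channel at once. A minimal configuration sketch (the dataframe `data` and the column names `month_idx`, `views`, and `channel_id` are placeholder assumptions, not from the original post):

```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder

# Sketch only: `data` is the poster's own long-format dataframe with
# one row per (channel_id, month_idx) pair.
dataset = TimeSeriesDataSet(
    data,
    time_idx="month_idx",
    target="views",
    group_ids=["channel_id"],
    max_encoder_length=24,
    max_prediction_length=6,
    # One encoder for the whole group-ID column: add_nan=True maps any
    # channel ID unseen at fit time to a reserved "unknown" label
    # instead of raising the KeyError above.
    categorical_encoders={"channel_id": NaNLabelEncoder(add_nan=True)},
)
```

This matches the answer at the top of the thread: the ID column then only groups rows into series, and any predictive signal for brand-new channels has to come from metadata features.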