Closed: jjacquessimeoni closed this issue 3 years ago
The issue with time series models is that they need to be retrained frequently. So if you have new IDs, retrain: the model cannot know what a new category means if it has never been seen before. Alternatively, do not use the IDs as learned features at all; instead, describe each time series with metadata (such as the author) and use the ID only as a group ID (there, new IDs are allowed).
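As a sketch of the fallback behaviour described above, here is a minimal, self-contained stand-in for what an "add NaN"-style label encoder does with unseen group IDs. This is illustrative only (pytorch-forecasting's `NaNLabelEncoder` is the real implementation); the class and channel IDs below are just examples:

```python
class UnknownTolerantEncoder:
    """Toy encoder: known categories map to indices >= 1; index 0 is
    reserved for NaN/unknown categories (the effect of add_nan=True)."""

    def fit(self, values):
        # dict.fromkeys deduplicates while preserving first-seen order.
        self.mapping = {v: i + 1 for i, v in enumerate(dict.fromkeys(values))}
        return self

    def transform(self, values):
        # Unseen IDs fall back to the reserved index 0 instead of
        # raising a KeyError.
        return [self.mapping.get(v, 0) for v in values]


encoder = UnknownTolerantEncoder().fit(
    ["UC3qOWAkHB6LYoEv-xVnnnag", "UCN8v8tNOCmaZaN-t4ynDdEA"]
)
print(encoder.transform(["UC3qOWAkHB6LYoEv-xVnnnag", "UC-brand-new-id"]))
# -> [1, 0]  (the brand-new channel ID encodes to 0 rather than crashing)
```

The price of this tolerance is that all unseen channels share one "unknown" embedding, which is why the advice above is to lean on metadata features rather than the ID itself.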
Sounds good, thank you for your quick answer.
Expected behavior
Hello! First thank you so much for releasing that library, so helpful!
Here is my issue: I am predicting the future behavior of YouTubers in terms of viewership. Basically, I have 600k time series (1 time series = the monthly historical data of one YouTube channel), so my group_ids are unique YouTube IDs like this one: UC3qOWAkHB6LYoEv-xVnnnag. I am training my model on my available YouTube channels, and every month I would have to add more than 10k new channel IDs that the model won't have seen before. I have found that a possible way to include them is to use categorical_encoders like this: `categorical_encoders={"UC3qOWAkHB6LYoEv-xVnnnag": NaNLabelEncoder(add_nan=True), "UCN8v8tNOCmaZaN-t4ynDdEA": NaNLabelEncoder(add_nan=True)}` — right? I do see two limitations here if it works that way:
Then, even if I do know that list in advance, I fear it might skew my model's performance. Maybe you can help me here: does the model learn on these new, empty groups, or are they left aside, serving only as a way for us to specify that in the future the model should be able to predict these groups?
Anyway, my expected behavior would be to change nothing in my training pipeline and to be able to feed these new, unseen YouTube IDs to my model so that I can predict their future trajectories.
As of now, I get the unknown-category error:

```
KeyError: "Unknown category 'UC-7Un7ZJ9Z_ZOmTE_Vc0UwQ' encountered. Set add_nan=True to allow unknown categories"
```

Thank you for your help, and sorry if you've already answered this somewhere.
ps: Unfortunately, I can't provide any dataset/code. Let me know if you need me to provide further details.
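For anyone hitting the same `KeyError` later: in pytorch-forecasting, the `categorical_encoders` dict is keyed by *column name*, not by individual category values, so a single encoder on the group-ID column covers every channel at once. A minimal configuration sketch (the dataframe `data` and the column names `month_idx`, `views`, and `channel_id` are placeholder assumptions, not from the original post):

```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder

# Sketch only: `data` is the poster's own long-format dataframe with
# one row per (channel_id, month_idx) pair.
dataset = TimeSeriesDataSet(
    data,
    time_idx="month_idx",
    target="views",
    group_ids=["channel_id"],
    max_encoder_length=24,
    max_prediction_length=6,
    # One encoder for the whole group-ID column: add_nan=True maps any
    # channel ID unseen at fit time to a reserved "unknown" label
    # instead of raising the KeyError above.
    categorical_encoders={"channel_id": NaNLabelEncoder(add_nan=True)},
)
```

This matches the answer at the top of the thread: the ID column then only groups rows into series, and any predictive signal for brand-new channels has to come from metadata features.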