massquantity / LibRecommender

Versatile End-to-End Recommender System
https://librecommender.readthedocs.io/
MIT License

Assertion error when retraining TwoTower on new data with softmax loss #491

Closed: budbuddy closed this issue 3 weeks ago

budbuddy commented 1 month ago

I tried retraining a TwoTower model on new data, and got this error:

Traceback (most recent call last):
  File "experiments/two_tower_retraining.py", line 192, in <module>
    main()
  File "experiments/two_tower_retraining.py", line 188, in main
    model, data_info = retrain_twotower(train_data, eval_data, data_info, '/data/user-events-interractions/model_twotower', 'twotower')
  File "experiments/two_tower_retraining.py", line 152, in retrain_twotower
    model.fit(
  File "/usr/local/lib/python3.8/dist-packages/libreco/algorithms/two_tower.py", line 427, in fit
    assert len(item_counts) == self.n_items
AssertionError

As you can see, the error comes from an assertion in model.fit that only applies when softmax is chosen as the loss. I've been racking my brains over this one, but can't find a fix. len(item_counts) is the number of distinct items in the new training set, while self.n_items is the number of item embeddings in the model, so realistically these can never be equal when retraining, unless the new data covers the entire item catalog. I also thought about changing the loss, but I don't get very good metrics with the other losses.
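To make the size mismatch concrete, here is a minimal standalone sketch in plain numpy (the names item_counts and n_items are just placeholders mirroring the values inside TwoTower.fit, not the library's actual code):

```python
import numpy as np

# Placeholder numbers for a retraining scenario: the model already has
# embeddings for 10,000 items, but the new batch of interactions only
# touches items 0..7999.
n_items = 10_000
new_item_ids = np.random.randint(0, 8_000, size=50_000)

# Counting only the items observed in the new data produces a vector that
# is shorter than the full catalog whenever some items are absent from it.
_, item_counts = np.unique(new_item_ids, return_counts=True)
print(len(item_counts), n_items)     # roughly 8000 vs 10000
assert len(item_counts) == n_items   # AssertionError, same mismatch as above
```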

massquantity commented 1 month ago

Yeah, this is a bug. During retraining, self.n_items is the total number of items, which is likely to be bigger than the number of distinct items in the new training set: the total includes any new items, while the new data typically doesn't cover every old item.
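Roughly speaking, the counts need to be computed over the whole item catalog instead of only the items that appear in the new data, so that items missing from the new training set get a count of zero. A simplified numpy sketch of that idea (just an illustration, not necessarily the exact code that will end up in the release):

```python
import numpy as np

def full_item_counts(item_indices, n_items):
    # Count occurrences over the whole catalog: items missing from the new
    # training data get a count of 0, so the vector always has length
    # n_items and the `len(item_counts) == self.n_items` check holds.
    return np.bincount(item_indices, minlength=n_items)

# Toy example: a 6-item catalog where the new data only touches items 0-3.
counts = full_item_counts(np.array([0, 1, 1, 3, 2]), n_items=6)
print(counts)              # [1 2 1 1 0 0]
assert len(counts) == 6
```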

I will release a new version to fix it. Do you have any other problems?

budbuddy commented 1 month ago

No, everything else I've tried pretty much works, although I haven't tested every model. I've been mainly using TwoTower and DIN, with some experimentation with DeepFM and YoutubeRanking.

budbuddy commented 3 weeks ago

I've checked with my model and data, and the issue has been fixed.