massquantity / LibRecommender

Versatile End-to-End Recommender System
https://librecommender.readthedocs.io/
MIT License

Wrong number of embeddings when using Save/load on TwoTower #487

Closed · budbuddy closed this issue 1 month ago

budbuddy commented 1 month ago

Hello,

I ran into a bit of a strange problem, and I'm not sure if I'm using save/load wrong or if I missed a parameter in the load function. Here's my issue and how to reproduce it:

I first trained a TwoTower model and saved it into an ad hoc `two_tower` folder using the following commands:

```python
data_info.save(path=model_path, model_name=model_name)
model.save(path=model_path, model_name=model_name, manual=True, inference_only=True)
```

I load this model and calculate predictions with `model.predict(user_list, item_list)` with no problems.
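For reference, the load step looks roughly like this (a sketch following the documented save/load pattern; `model_path`, `model_name`, `user_list` and `item_list` come from my own setup):

```python
from libreco.data import DataInfo
from libreco.algorithms import TwoTower

# Load the saved data_info first, then the model itself,
# with manual=True to match how it was saved above.
data_info = DataInfo.load(model_path, model_name=model_name)
model = TwoTower.load(
    path=model_path, model_name=model_name, data_info=data_info, manual=True
)
predictions = model.predict(user=user_list, item=item_list)
```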

I then wanted to play around with retraining the model on updated data, so in the same folder, without deleting any of the old files, I retrained a new model, this time with the parameter `inference_only=False`:

```python
data_info.save(path=model_path, model_name=model_name)
model.save(path=model_path, model_name=model_name, manual=True, inference_only=False)
```

Now when I try to load the model and run `model.predict(user_list, item_list)`, I get an error:

```
IndexError: index 59979 is out of bounds for axis 0 with size 59966
```

I checked and realized that while my `len(data_info.item2id)` is 60152, my `len(model.item_embeds_np)` is only 59966. Weird.

After going through my files I noticed that `twotower.npz` hadn't been updated since my first training. Looking through the code, I saw that indeed, if you set `inference_only=False`, that file doesn't get updated. My guess is that when I loaded the model, it pulled embeddings from that stale file, while `data_info` was up to date with the latest data from the second training, causing the mismatch.
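A quick check makes the mismatch visible (numbers from my run):

```python
# data_info reflects the second training, but the embeddings
# were still loaded from the stale twotower.npz file.
print(len(data_info.item2id))      # 60152, after retraining
print(len(model.item_embeds_np))   # 59966, from the first training
```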

My question is: did I load wrong? Did I save wrong? Is this an oversight?

I can work around this by saving my model twice each time, so it's not a big issue at all.
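Concretely, the workaround is just saving both ways (not an official pattern, as far as I know):

```python
# Save the full variables needed for a future retrain...
model.save(path=model_path, model_name=model_name, manual=True, inference_only=False)
# ...and save again so the inference embeddings (twotower.npz) are refreshed too.
model.save(path=model_path, model_name=model_name, manual=True, inference_only=True)
```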

budbuddy commented 1 month ago

After reading this issue: https://github.com/massquantity/LibRecommender/issues/455

I'm thinking maybe I should do what you suggested there and try `model.set_embeddings()`.
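If I understand that issue correctly, the idea would be something like this after retraining (untested on my side, and the `fit` arguments are just from my own setup):

```python
model.fit(train_data, neg_sampling=True)  # retrain on the updated data
model.set_embeddings()                    # recompute user_embeds_np / item_embeds_np
```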

I'll try this tomorrow but will leave the issue open, since there's no mention of this in the documentation or the examples, so it's not very intuitive to me.

massquantity commented 1 month ago

Yes, you've done something wrong :). I knew this API design could be misleading, but I haven't figured out a better way. The Save/Load Model doc illustrates the difference between save/load and retraining.

`save` with `inference_only=True` will save the final user and item embeddings (`user_embeds_np` and `item_embeds_np`) for inference, and `load` will load these embeddings.

`save` with `inference_only=False` is used before retraining, so it won't save these inference embeddings. That's why when you `load` the model, you are loading the old embeddings, hence the size mismatch. The correct way is to construct a new model and use `rebuild_model` to load it; this is a deliberate choice to avoid confusing users about loading vs. retraining.
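Roughly, the retraining flow looks like this (a sketch, not exact code; `DatasetPure`, the new training DataFrame, and the hyperparameters are placeholders here, and the precise signatures are in the Model Retrain doc):

```python
from libreco.data import DataInfo, DatasetPure
from libreco.algorithms import TwoTower

# Load the old data_info and merge the new data into it,
# so that new users/items are appended to the existing id mappings.
data_info = DataInfo.load(model_path, model_name=model_name)
train_data = DatasetPure.merge_trainset(new_train_df, data_info, merge_behavior=True)

# Construct a fresh model with the merged data_info,
# then assign the previously saved variables into it.
model = TwoTower("ranking", data_info, embed_size=16, n_epochs=2, lr=1e-4)
model.rebuild_model(path=model_path, model_name=model_name, full_assign=True)
```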

In your case, you should also save with `inference_only=True` after retraining, so that the `load` API can load the correct embeddings.
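So after retraining, the save step becomes (sketch; the `fit` arguments depend on your setup):

```python
model.fit(train_data, neg_sampling=True)

data_info.save(path=model_path, model_name=model_name)
# inference_only=True refreshes twotower.npz, so that load()
# returns embeddings matching the updated data_info.
model.save(path=model_path, model_name=model_name, manual=True, inference_only=True)
```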