lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0
4.77k stars 691 forks source link

Update to ensure user-item type handling parity in predict method #588

Closed markdouthwaite closed 3 years ago

markdouthwaite commented 3 years ago

Hey!

I've submitted this PR primarily to resolve a minor issue that (I feel) violates 'the principle of least surprise'.

The predict method accepts user_ids and item_ids parameters. For item_ids, a couple of coercion steps are used to massage the input into a ndarray of type np.int32. This includes casting item_ids of type list and tuple to ndarray as needed. In contrast, if user_ids are provided as a type other than ndarray, a TypeError is raised. In other words, if I call lfm.predict(user_ids=[0, 1, 2, 3], item_ids=[7, 6, 5, 4]) would currently throw a TypeError for the type of the user_ids parameter, but not the item_ids parameter. This PR ensures parity between the two.

I've also switched out the assertion used to check user_ids and item_ids in favour of a conditional that raises a ValueError instead. I've updated tests accordingly.

This PR depends on #587.

SimonCW commented 3 years ago

Thanks @markdouthwaite. Looking at the len(user_ids) != len(item_ids) ValueError, I wonder whether it could use more information about a common scenario that produces the error. I think often users expect lfm.predict(user_ids=[0, 1], item_ids=[7, 6, 5]) to predict items 7, 6, 5 to users 0, 1 but to achieve that one would need to call lfm.predict(user_ids=[0, 0, 0, 1, 1, 1], item_ids=[7, 6, 5, 7, 6, 5].

Now that I write it out, this should be made explicit with an example in the method Docstring, not in an error message. Do you agree?

markdouthwaite commented 3 years ago

Hey @SimonCW,

Thanks for the response.

this should be made explicit with an example in the method Docstring, not in an error message. Do you agree?

I agree, I'll add a few words to the docstring to clarify that.