maciejkula / spotlight

Deep recommender models using PyTorch.
MIT License

Getting multiple predictions seems broken #92

Open elanmart opened 6 years ago

elanmart commented 6 years ago

I'm playing with implicit models using default BilinearNet as representation.

Given an `Interactions` object `test` and some model `model`, one would expect

model.predict(test.user_ids)

to work, but it raises

RuntimeError: The expanded size of the tensor (<num_users>) must match the existing size (<...>) at non-singleton dimension 0

I think fixing this would require changing the way Spotlight generates predictions. Currently, when we want predictions for user 7 and items [1, 2, 3], we actually call

model._net([7, 7, 7], [1, 2, 3])
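To make that pairing concrete, here is a minimal NumPy sketch of per-pair scoring. It assumes a bilinear model scores each (user, item) pair as the dot product of their embeddings plus a per-user and a per-item bias; the random tables are hypothetical stand-ins for `BilinearNet`'s parameters, not Spotlight code.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, dim = 10, 5, 4

# Hypothetical stand-ins for a bilinear model's parameters
user_emb = rng.normal(size=(num_users, dim))
item_emb = rng.normal(size=(num_items, dim))
user_bias = rng.normal(size=num_users)
item_bias = rng.normal(size=num_items)


def score_pairs(user_ids, item_ids):
    """Score aligned pairs: element k scores user_ids[k] against item_ids[k]."""
    user_ids = np.asarray(user_ids)
    item_ids = np.asarray(item_ids)
    dot = (user_emb[user_ids] * item_emb[item_ids]).sum(axis=-1)
    return dot + user_bias[user_ids] + item_bias[item_ids]


# One user, three items: the user ID is repeated once per item
scores = score_pairs([7, 7, 7], [1, 2, 3])
print(scores.shape)  # (3,)
```

Because the two ID arrays must line up element by element, a length-2 user array cannot simply be paired with a length-3 item array.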

To scale this to multiple users, e.g. users [7, 8], we could:

  1. Generate a tensor [[7, 7, 7], [8, 8, 8]] for user_ids and call _net() as usual (some unsqueeze on item_embeddings would be needed for broadcasting).
  2. Have a BilinearNet.predict_all method that would compute

    x = th.LongTensor([7, 8])
    y = th.LongTensor([1, 2, 3])
    self.user_embeddings(x) @ self.item_embeddings(y).t()

  3. Use `torch.bmm()`, which, depending on the shape of `user_ids`, computes the equivalent of option 1 or option 2.
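Options 1 and 2 can be compared in a small NumPy sketch; NumPy arrays stand in for the PyTorch embedding layers, and the bias terms are an assumption about the bilinear model. Option 2 is a single matrix product between the selected user and item embeddings, while option 1 builds the repeated ID grid and scores it pair by pair. Both should yield the same `(n_users, n_items)` score matrix. `predict_all` and `predict_grid` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, dim = 10, 5, 4

# Hypothetical stand-ins for the embedding tables and biases
user_emb = rng.normal(size=(num_users, dim))
item_emb = rng.normal(size=(num_items, dim))
user_bias = rng.normal(size=num_users)
item_bias = rng.normal(size=num_items)

users = np.array([7, 8])
items = np.array([1, 2, 3])


def predict_all(user_ids, item_ids):
    """Option 2: score every user against every item with one matrix product."""
    scores = user_emb[user_ids] @ item_emb[item_ids].T  # (n_users, n_items)
    return scores + user_bias[user_ids][:, None] + item_bias[item_ids][None, :]


def predict_grid(user_ids, item_ids):
    """Option 1: expand IDs into a grid like [[7, 7, 7], [8, 8, 8]] and score pairwise."""
    user_grid = np.repeat(user_ids[:, None], len(item_ids), axis=1)
    item_grid = np.broadcast_to(item_ids, user_grid.shape)
    dot = (user_emb[user_grid] * item_emb[item_grid]).sum(axis=-1)
    return dot + user_bias[user_grid] + item_bias[item_grid]


print(np.allclose(predict_all(users, items), predict_grid(users, items)))  # True
```

Option 2 never materializes the repeated ID grid, which is one reason to expect it to be the cheaper of the two.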

I believe option 2 is the cleanest and should also be the fastest.
@maciejkula what do you think?
maciejkula commented 6 years ago

I think this is basically a feature request, something that the library doesn't do at the moment (rather than a bug).

I'd be happy to have a think about doing this, provided that:

  1. You'd be happy to apply it consistently to all models.
  2. The result doesn't complicate the code much.

One pointer for opening issues: it may be nicer to the maintainer if you don't start by assuming something is broken when it doesn't behave exactly as you want.

elanmart commented 6 years ago

I'm sorry, I didn't intend this to sound offensive. It seemed broken because doing something quite intuitive resulted in an obscure PyTorch error, but I agree I'm at fault here.

I can try to add this feature in the near future, but for now, perhaps it would be nice to add a helpful error message when `user_ids` is an array but `item_ids` is None?
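A guard along these lines could produce such a message. `check_predict_args` is a hypothetical helper written against plain NumPy; it is not part of Spotlight, and a real fix would sit wherever `predict` prepares its ID arrays.

```python
import numpy as np


def check_predict_args(user_ids, item_ids=None):
    """Hypothetical guard for a predict(user_ids, item_ids=None) call."""
    user_ids = np.atleast_1d(np.asarray(user_ids))
    if item_ids is None and user_ids.size > 1:
        # Scoring all items is only defined here for a single user
        raise ValueError(
            "When item_ids is None, user_ids must be a single user ID; "
            "got an array of {} user IDs. Pass matching item_ids or "
            "call predict once per user.".format(user_ids.size)
        )
    return user_ids
```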

yueguoguo commented 6 years ago

Hi @maciejkula and @elanmart, I would like to know if there is an update on this issue.

I tried the same thing and got the same error.

For example, I trained a factorization model using the following code:

from spotlight.cross_validation import user_based_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import rmse_score
from spotlight.factorization.explicit import ExplicitFactorizationModel

dataset = get_movielens_dataset(variant='1M')

train, test = user_based_train_test_split(dataset, test_percentage=0.25)

model = ExplicitFactorizationModel(n_iter=1)
model.fit(train)

Then, when I predict recommendation scores for the test data with

model.predict(test.user_ids)

I got the error

RuntimeError: The expanded size of the tensor (3707) must match the existing size (258400) at non-singleton dimension 0. at /opt/conda/conda-bld/pytorch_1518241081361/work/torch/lib/TH/generic/THTensor.c:309
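The numbers in this error fit the following reading of how `predict` pairs IDs, sketched in NumPy as an assumption rather than copied from Spotlight's source: when `item_ids` is omitted, every item ID from `0` to `num_items - 1` is scored, and the user IDs are stretched to match the number of items. A single user ID can be stretched; an array of 258400 user IDs (one per test interaction) cannot be stretched to 3707 items. `pair_ids` is a hypothetical name, and `np.broadcast_to` stands in for PyTorch's `Tensor.expand`.

```python
import numpy as np


def pair_ids(user_ids, num_items, item_ids=None):
    """Hypothetical NumPy re-creation of how user and item IDs get paired."""
    if item_ids is None:
        item_ids = np.arange(num_items)  # score every item
    user_ids = np.atleast_1d(np.asarray(user_ids, dtype=np.int64)).reshape(-1, 1)
    item_ids = np.asarray(item_ids, dtype=np.int64).reshape(-1, 1)
    if user_ids.shape[0] != item_ids.shape[0]:
        # Like Tensor.expand: only a length-1 user array can be stretched
        user_ids = np.broadcast_to(user_ids, item_ids.shape)
    return user_ids.ravel(), item_ids.ravel()


users, items = pair_ids(7, num_items=5)  # one user against all 5 items

mismatch = None
try:
    pair_ids(np.array([7, 8, 9]), num_items=5)
except ValueError as err:
    mismatch = err  # three user IDs cannot be stretched to five items
print(mismatch is not None)  # True
```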

So I used a single user ID instead (e.g., the first one in the array):

model.predict(test.user_ids[0])

This worked and returned an array of item recommendation scores.

Another question: if I predict for a single user ID, which item IDs do the returned scores correspond to?
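Assuming that, with `item_ids` omitted, `predict` scores items in ID order (so position `i` of the returned array is the score for item ID `i`), the highest-scoring item IDs can be read off with `np.argsort`. `top_k_items` is a hypothetical helper and the scores below are made up.

```python
import numpy as np


def top_k_items(scores, k):
    """Return the item IDs of the k highest scores, best first.

    Assumes scores[i] is the score for item ID i.
    """
    scores = np.asarray(scores)
    return np.argsort(-scores)[:k]  # sort descending by negating


scores = np.array([0.1, 2.5, -0.3, 1.7, 0.9])  # made-up scores for 5 items
top = top_k_items(scores, 3)
print(top)  # [1 3 4]
```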

Any advice will be very much appreciated! :)

Best, Le

yueguoguo commented 6 years ago

I think I found some useful information in #30.

maciejkula commented 6 years ago

@yueguoguo @elanmart To document the current implementation better, how does the following sound? https://github.com/maciejkula/spotlight/pull/109

yueguoguo commented 6 years ago

Much better! Thanks @maciejkula