maciejkula / spotlight

Deep recommender models using PyTorch.
MIT License

Error when using Precision/Recall (ImplicitSequenceModel) #149

Closed: pugantsov closed this issue 5 years ago

pugantsov commented 5 years ago
Traceback (most recent call last):
  File "model_spotlight.py", line 536, in <module>
    sim.run(defaults=True)
  File "model_spotlight.py", line 528, in run
    evaluation = self.evaluation(model, (train, test))
  File "model_spotlight.py", line 439, in evaluation
    train_prec, train_rec = sequence_precision_recall_score(model, train)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/evaluation.py", line 128, in sequence_precision_recall_score
    predictions = -model.predict(sequences[i])
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/sequence/implicit.py", line 318, in predict
    self._check_input(sequences)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/sequence/implicit.py", line 187, in _check_input
    item_id_max = item_ids.max()
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/numpy/core/_methods.py", line 28, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation maximum which has no identity

I convert my Interactions objects to SequenceInteractions instances during cross-validation. Could this be because SequenceInteractions has no item_ids member?

pugantsov commented 5 years ago

I've found where it breaks. It calls max() on an array with no values in it:

2019-03-12 17:41:30,000 [MainThread  ] [INFO ]  Beginning model evaluation...
[[    0]
 [    1]
 [    2]
 ...
 [11436]
 [11437]
 [11438]]
[]
Traceback (most recent call last):
  File "model_spotlight.py", line 542, in <module>
    sim.run(defaults=True)
  File "model_spotlight.py", line 534, in run
    evaluation = self.evaluation(model, (train, test), (train_ids, item_ids))
  File "model_spotlight.py", line 444, in evaluation
    prec = sequence_precision_recall_score(model, test)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/evaluation.py", line 128, in sequence_precision_recall_score
    predictions = -model.predict(sequences[i])
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/sequence/implicit.py", line 303, in predict
    self._check_input(sequences)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/sequence/implicit.py", line 179, in _check_input
    item_id_max = item_ids.max()
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/numpy/core/_methods.py", line 28, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation maximum which has no identity

Could the item_ids be formatted incorrectly, so that one of the sequences ends up with no item IDs at all?

EDIT: Found the issue. Printing test.sequences (followed by the number of sequences) just before sequences is declared gives:

[[3200 7408 6290 ... 3379 2562 7778]
 [7529 4331 3728 ... 7872 3519 6102]
 [4191 1439 7711 ... 3083 8747 7610]
 ...
 [4899 6165 2563 ... 4812 6666 4481]
 [   0    0    0 ... 4361 4407 4602]
 [   0    0    0 ...  461 4331 3279]] 702

However, the line sequences = test.sequences[:, :-k] produces an array with no columns, so every sequences[i] passed to predict() is empty. This happens because the default sequence length in to_sequence() is 10 and the default k in sequence_precision_recall_score is also 10, so the slice removes every item from each sequence.
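
For anyone hitting the same error, here is a minimal NumPy sketch of the slicing problem. It does not use Spotlight itself: test_sequences is a made-up stand-in for test.sequences, and split_for_evaluation is a hypothetical helper that only mirrors the two slices discussed above.

```python
import numpy as np

# Made-up stand-in for test.sequences: 3 sequences of length 10
# (10 being the default sequence length mentioned above).
max_sequence_length = 10
test_sequences = np.arange(1, 31).reshape(3, max_sequence_length)


def split_for_evaluation(sequences, k):
    """Hypothetical helper: inputs are all but the last k items, targets are the last k."""
    inputs = sequences[:, :-k]
    targets = sequences[:, -k:]
    return inputs, targets


# With k equal to the sequence length, no input columns are left.
inputs, targets = split_for_evaluation(test_sequences, k=10)
print(inputs.shape)

try:
    inputs[0].max()  # same reduction that fails inside _check_input
except ValueError as err:
    print("ValueError:", err)

# With k smaller than the sequence length, each row keeps some input items.
inputs, targets = split_for_evaluation(test_sequences, k=3)
print(inputs.shape, targets.shape)
```

So the workaround is presumably to make k smaller than the sequence length, either by passing a smaller k to the precision/recall function or by building longer sequences. I have only checked the slicing logic here, not the full Spotlight evaluation.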