recommenders-team / recommenders

Best Practices on Recommendation Systems
https://recommenders-team.github.io/recommenders/intro.html
MIT License

[BUG] sar_movielens.ipynb - top_k = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True) error #2015

Closed lordaouy closed 4 months ago

lordaouy commented 11 months ago

Description

I got an error when running the sar_movielens.ipynb notebook on an Azure Machine Learning compute instance:

2023-10-09 20:08:29,498 INFO     Calculating recommendation scores
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 2
      1 with Timer() as test_time:
----> 2     top_k = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)
      4 print("Took {} seconds for prediction.".format(test_time.interval))

File /anaconda/envs/msftrecsys2/lib/python3.9/site-packages/recommenders/models/sar/sar_singlenode.py:533, in SARSingleNode.recommend_k_items(self, test, top_k, sort_top_k, remove_seen)
    520 def recommend_k_items(self, test, top_k=10, sort_top_k=True, remove_seen=False):
    521     """Recommend top K items for all users which are in the test set
    522 
    523     Args:
   (...)
    530         pandas.DataFrame: top k recommendation items for each user
    531     """
--> 533     test_scores = self.score(test, remove_seen=remove_seen)
    535     top_items, top_scores = get_top_k_scored_items(
    536         scores=test_scores, top_k=top_k, sort_top_k=sort_top_k
    537     )
    539     df = pd.DataFrame(
    540         {
    541             self.col_user: np.repeat(
   (...)
    546         }
    547     )

File /anaconda/envs/msftrecsys2/lib/python3.9/site-packages/recommenders/models/sar/sar_singlenode.py:346, in SARSingleNode.score(self, test, remove_seen)
    344 # calculate raw scores with a matrix multiplication
    345 logger.info("Calculating recommendation scores")
--> 346 test_scores = self.user_affinity[user_ids, :].dot(self.item_similarity)
    348 # ensure we're working with a dense ndarray
    349 if isinstance(test_scores, sparse.spmatrix):

File /anaconda/envs/msftrecsys2/lib/python3.9/site-packages/scipy/sparse/_base.py:411, in _spbase.dot(self, other)
    409     return self * other
    410 else:
--> 411     return self @ other

File /anaconda/envs/msftrecsys2/lib/python3.9/site-packages/scipy/sparse/_base.py:622, in _spbase.__matmul__(self, other)
    620 def __matmul__(self, other):
    621     if isscalarlike(other):
--> 622         raise ValueError("Scalar operands are not allowed, "
    623                          "use '*' instead")
    624     return self._mul_dispatch(other)

ValueError: Scalar operands are not allowed, use '*' instead

In which platform does it happen?

It happens on an Azure ML compute instance.

How do we replicate the issue?

This is the setup script I use in Azure ML:

# 1. Install gcc if it is not installed already. On Ubuntu, this can be done by using the command
# sudo apt install gcc

# 2. Create and activate a new conda environment
conda create -n msftrecsys2 python=3.9.16
conda activate msftrecsys2

# 3. Install the core recommenders package. It can run all the CPU notebooks.
pip install recommenders
pip install recommenders[gpu]

# 4. Create a Jupyter kernel
python -m ipykernel install --user --name msftrecsys2 --display-name msftrecsys2

# 5. Clone this repo within VSCode or using command line:
git clone https://github.com/recommenders-team/recommenders.git

These are my compute instance settings:

[image: compute instance settings]

Expected behavior (i.e. solution)

The cell below should run without throwing an error:

with Timer() as test_time:
    top_k = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)

print("Took {} seconds for prediction.".format(test_time.interval))

Other Comments

Petkomat commented 11 months ago

I faced the same problem. I think it's SciPy-related: the return np.array(result) lines at the end of the similarity functions in recommenders/utils/python_utils.py should be changed to return result.toarray().

For example, the new version of jaccard should read as follows:

def jaccard(cooccurrence):
    """<docstring>"""

    diag_rows, diag_cols = _get_row_and_column_matrix(cooccurrence.diagonal())

    with np.errstate(invalid="ignore", divide="ignore"):
        result = cooccurrence / (diag_rows + diag_cols - cooccurrence)

    # return np.array(result) # not this
    return result.toarray()
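
The suggestion above can be tried in isolation with a small sketch. It assumes the co-occurrence matrix is a SciPy sparse matrix and uses a hypothetical helper, to_dense, that is not part of the recommenders package. np.expand_dims stands in for the library's _get_row_and_column_matrix helper, whose exact behavior is assumed here, and the function name jaccard_sketch is illustrative only.

```python
import numpy as np
from scipy import sparse


def to_dense(result):
    """Hypothetical helper: return a 2-D ndarray for sparse or dense input."""
    if sparse.issparse(result):
        # Sparse results must be converted explicitly.
        return result.toarray()
    # Dense results (ndarray or np.matrix) are converted to a plain ndarray.
    return np.asarray(result)


def jaccard_sketch(cooccurrence):
    """Illustrative Jaccard similarity from an item co-occurrence matrix."""
    diag = cooccurrence.diagonal()
    diag_rows = np.expand_dims(diag, axis=0)  # shape (1, n)
    diag_cols = np.expand_dims(diag, axis=1)  # shape (n, 1)

    with np.errstate(invalid="ignore", divide="ignore"):
        result = cooccurrence / (diag_rows + diag_cols - cooccurrence)

    return to_dense(result)


# Tiny co-occurrence matrix: item 0 seen twice, item 1 three times, once together.
cooccurrence = sparse.csr_matrix(np.array([[2, 1], [1, 3]]))

# Wrapping a sparse matrix with np.array does not densify it.
wrapped = np.array(cooccurrence)
similarity = jaccard_sketch(cooccurrence)

print("np.array(sparse) ndim:", wrapped.ndim)
print("jaccard_sketch ndim:", similarity.ndim)
print(similarity)
```

Checking for sparsity before converting keeps the function usable whether the division returns a sparse or a dense result, which appears to differ between SciPy releases.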

moflotas commented 9 months ago

@lordaouy I am not sure, but it might be a problem with SciPy itself. I solved it by downgrading from the latest version to scipy==1.10.1, and it worked perfectly.
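
To check whether an environment is running a SciPy release newer than the one reported to work here, a minimal sketch like the following may help. The helper names parse_release and newer_than_known_good are hypothetical, and 1.10.1 is only the version mentioned in this comment; the thread does not state the exact release in which the behavior changed.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported to work in this thread (an observation, not an official bound).
KNOWN_GOOD = (1, 10, 1)


def parse_release(text):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def newer_than_known_good(installed):
    """Return True if the installed version is newer than KNOWN_GOOD."""
    return parse_release(installed) > KNOWN_GOOD


try:
    installed = version("scipy")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("SciPy is not installed in this environment.")
elif newer_than_known_good(installed):
    print(f"SciPy {installed} is newer than 1.10.1; the SAR error may occur.")
else:
    print(f"SciPy {installed} is not newer than 1.10.1.")
```

Pinning with pip install scipy==1.10.1, as described above, is a workaround rather than a fix.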

SimonYansenZhao commented 6 months ago

See https://github.com/scipy/scipy/issues/18796 and https://github.com/recommenders-team/recommenders/issues/1954

SimonYansenZhao commented 4 months ago

Resolved in PR #2083