muricoca / crab

Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).
http://muricoca.github.com/crab
Other
1.18k stars 376 forks source link

Update classes.py in knn #103

Open JoyceYuen opened 9 years ago

JoyceYuen commented 9 years ago

Hi, I like your code. It's concise and efficient. But when i read the recommenders part, that's the "class UserBasedRecommender(UserRecommender)", i found the code in the method named estimated_preference can not guarantee that one neighbor's preference will multiple the his similarity rather than others.

It is the previous code: prefs = prefs[~np.isnan(prefs)] similarities = similarities[~np.isnan(prefs)]

    prefs_sim = np.sum(prefs[~np.isnan(similarities)] *
                         similarities[~np.isnan(similarities)])
    total_similarity = np.sum(similarities)

I take a simple example:

import numpy as np p = np.array([np.nan, 3,4,5,np.nan,5,6,np.nan,9,10]) p array([ nan, 3., 4., 5., nan, 5., 6., nan, 9., 10.]) s = np.array([1,np.nan,4,6,np.nan,6,7,8,9,10]) s array([ 1., nan, 4., 6., nan, 6., 7., 8., 9., 10.]) p = p[~np.isnan(p)] p array([ 3., 4., 5., 5., 6., 9., 10.]) s = s[~np.isnan(p)] s array([ 1., nan, 4., 6., nan, 6., 7.]) p[~np.isnan(s)] array([ 3., 5., 5., 9., 10.]) s[~np.isnan(s)] array([ 1., 4., 6., 6., 7.]) p[~np.isnan(s)]*s[~np.isnan(s)] array([ 3., 20., 30., 54., 70.])

it follows the steps as the code. as you can see, it gets a wrong result.

my code is like this: temp_prefs = [~np.isnan(prefs)] temp_similarities = [~np.isnan(similarities)] noNaN_indices = np.logical_and(temp_prefs, temp_similarities)

    prefs_sim = np.sum(prefs[noNaN_indices[0] == True] *
                         similarities[noNaN_indices[0] == True])

    similarities = similarities[~np.isnan(similarities)]
    total_similarity = np.sum(similarities)

with the same example:

pp = np.array([np.nan,3,4,5,np.nan,5,6,np.nan,9,10]) pp array([ nan, 3., 4., 5., nan, 5., 6., nan, 9., 10.]) ss = np.array([1,np.nan,4,6,np.nan,6,7,8,9,10]) ss array([ 1., nan, 4., 6., nan, 6., 7., 8., 9., 10.]) tss = [~np.isnan(ss)] tss [array([ True, False, True, True, False, True, True, True, True, True], dtype=bool)] tpp = [~np.isnan(pp)] tpp [array([False, True, True, True, False, True, True, False, True, True], dtype=bool)] nonNaN = np.logical_and(tss,tpp) nonNaN array([[False, False, True, True, False, True, True, False, True, True]], dtype=bool) ss[nonNaN[0] == True] * pp[nonNaN[0] == True] array([ 16., 30., 30., 42., 81., 100.])

as you can see, it gets the right answer.

if i misunderstood, please let me know. Thank you in advance.

Best Wishes