prmr / Creco

Recommendation System for Consumer Products
Apache License 2.0

Feature-sensitive ranking algorithm #47

Closed (prmr closed this issue 10 years ago)

prmr commented 10 years ago

Implement the algorithm that takes as input a list of features and a category, and outputs a ranked list of products. Autodetect the attribute type: more is better, less is better, or discrete.

forgues commented 10 years ago

I've been working on automatically detecting a numerical attribute's direction (whether more is better or less is better).

I first tried correlating a product's price with that attribute's value, but that didn't work very well. I then tried correlating attributes with the product's overall score, and this seems to work a bit better. It identifies weight as negatively correlated with the score, and most attributes as positively correlated. But there are issues: for example, storage capacity comes out negatively correlated when in fact more is better.

One thing I noticed is that attributes which are strongly correlated with the product's overall score are typically the most important attributes. For example, in humidifiers, the "output" has 0.95 correlation with the score, and it's probably one of the main things people care about. In laptops, performance, display, and battery (rated 1-5) are all highly correlated, whereas tech support length has practically zero correlation with the overall score. I think we should use the attributes' correlation with the overall score to identify the "default" weighting of the most important attributes.
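As a rough illustration of the "default weighting" idea, the sketch below turns per-attribute correlations into weights by taking absolute values and scaling them to sum to 1. The class name `DefaultWeights`, the method `fromCorrelations`, and the normalization scheme are all assumptions made for this example; the thread does not say how the weights would actually be derived.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the project: derives default weights
// from attribute-to-score correlations.
final class DefaultWeights {
    /**
     * Scales |correlation| values so that they sum to 1.
     * Assumption: the sign only encodes direction, so importance is |r|.
     * If every correlation is 0, every weight is 0.
     */
    static Map<String, Double> fromCorrelations(Map<String, Double> correlations) {
        double total = 0.0;
        for (double r : correlations.values()) {
            total += Math.abs(r);
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : correlations.entrySet()) {
            double w = (total == 0.0) ? 0.0 : Math.abs(e.getValue()) / total;
            weights.put(e.getKey(), w);
        }
        return weights;
    }
}
```

With this scheme, an attribute with near-zero correlation (such as tech support length in the laptop example) would receive a near-zero default weight.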

prmr commented 10 years ago

This is an extremely useful finding. This issue should be coordinated with #45 and #47. We just decided to move the AttributeStat functionality into the logic component.

MariamN commented 10 years ago

I just committed the ranking algorithm to branch issue47. For now, the algorithm assumes default weights for all features. It ranks products that satisfy more than one feature first, taking the weights into consideration.
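A minimal sketch of what such a ranking could look like, assuming each product is reduced to the set of requested features it satisfies and the score is the sum of the weights of those features. The names `FeatureRanker`, `score`, and `rank` are hypothetical, and the committed code on branch issue47 may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the committed algorithm.
final class FeatureRanker {
    /** Sum of the weights of the requested features that the product satisfies. */
    static double score(Set<String> satisfied, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (satisfied.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    /**
     * Product names ordered from highest to lowest score.
     * List.sort is stable, so ties keep the iteration order of the input map.
     */
    static List<String> rank(Map<String, Set<String>> products, Map<String, Double> weights) {
        List<String> names = new ArrayList<>(products.keySet());
        names.sort((a, b) -> Double.compare(
                score(products.get(b), weights),
                score(products.get(a), weights)));
        return names;
    }
}
```

With equal default weights, a product that satisfies two requested features always ranks above one that satisfies only one.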

A design question: do we want to have a grouping interface? That is, do we want to group products by which of the user-selected features they satisfy, e.g., (features A and B, but not C), (features A and C, but not B), etc.? If so, I will modify the algorithm to return multiple ranked lists.
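To make the grouping question concrete, here is one possible shape for it: products are bucketed by the exact subset of selected features they satisfy, and each bucket could then be ranked on its own. `FeatureGrouper` and `group` are names invented for this sketch, and representing a product as a set of feature names is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the proposed grouping, not project code.
final class FeatureGrouper {
    /**
     * Maps each combination of satisfied selected features to the products
     * that satisfy exactly that combination. Features the user did not
     * select are ignored.
     */
    static Map<Set<String>, List<String>> group(
            Map<String, Set<String>> products, Set<String> selected) {
        Map<Set<String>, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : products.entrySet()) {
            // Intersect the product's features with the user's selection.
            Set<String> matched = new TreeSet<>(e.getValue());
            matched.retainAll(selected);
            groups.computeIfAbsent(matched, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

Returning a map keyed by feature subset is only one option; a list of ranked lists, as suggested in the comment, would carry the same information.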

asutcl commented 10 years ago

I merged your branch into the Issue0045 branch and resolved two small conflicts. You can either merge Issue45 back into your branch to make use of the new ScoredAttribute and develop further, or, if your code is done, merge Issue0045 into master.

asutcl commented 10 years ago

I merged this into master this morning.

forgues commented 10 years ago

Here's what I think is the cleanest way to integrate my correlation code into the ranking algorithm:

I will create a class, AttributeCorrelator, whose constructor takes a Category. It will expose the methods:

  1. computeCorrelation(Attribute attribute): This will compute the correlation between the attribute given as a parameter, and the overall score (default), over all the products in the category. The resulting correlation score can be used to weigh the importance of each attribute.
  2. I can also include a method to get the "good direction" of an attribute (if more is better or less is better), which will basically first compute the correlation, and then check if it's positive or negative. We should have an enum in ScoredAttribute for the two possible attribute directions.

Note that for now, I will only consider numerical attributes. It might be possible to do something with other attribute types as well, but that will require some experimentation first.
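The proposal above could be sketched roughly as follows. This is not the merged AttributeCorrelator: the real `Category` and `Attribute` types are replaced by hypothetical stand-ins (a list of `Product` objects and an attribute name), Pearson correlation is assumed because the thread does not name the coefficient used, and treating a zero correlation as "more is better" is an arbitrary choice made for the example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real class takes a Category and an Attribute.
final class AttributeCorrelatorSketch {
    enum Direction { MORE_IS_BETTER, LESS_IS_BETTER }

    /** Stand-in for a product: numerical attribute values plus an overall score. */
    static final class Product {
        final Map<String, Double> attributes;
        final double overallScore;

        Product(Map<String, Double> attributes, double overallScore) {
            this.attributes = attributes;
            this.overallScore = overallScore;
        }
    }

    private final List<Product> products; // stand-in for the Category

    AttributeCorrelatorSketch(List<Product> products) {
        this.products = products;
    }

    /**
     * Pearson correlation between the attribute and the overall score,
     * over the products that have the attribute. Returns 0 when it is
     * undefined (fewer than two products, or no variance).
     */
    double computeCorrelation(String attribute) {
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (Product p : products) {
            Double x = p.attributes.get(attribute);
            if (x == null) continue;
            n++;
            sumX += x;
            sumY += p.overallScore;
        }
        if (n < 2) return 0.0;
        double meanX = sumX / n;
        double meanY = sumY / n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (Product p : products) {
            Double x = p.attributes.get(attribute);
            if (x == null) continue;
            double dx = x - meanX;
            double dy = p.overallScore - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) return 0.0;
        return cov / Math.sqrt(varX * varY);
    }

    /** Negative correlation means less is better; otherwise more is better. */
    Direction direction(String attribute) {
        return computeCorrelation(attribute) < 0
                ? Direction.LESS_IS_BETTER
                : Direction.MORE_IS_BETTER;
    }
}
```

As the storage capacity example earlier in the thread shows, the sign of the correlation can be misleading, so the detected direction is best treated as a default that can be overridden.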

forgues commented 10 years ago

The AttributeCorrelator is now merged with master.

MariamN commented 10 years ago

The ranking algorithm is merged with master.