Closed by prmr 10 years ago
I've been working on automatically detecting a numerical attribute's direction (whether more is better or less is better).
I first tried correlating a product's price with that attribute's value, but that didn't work very well. I then tried correlating attributes with the product's overall score, and this seems to work a bit better. It identifies weight as negatively correlated, and most attributes as positively correlated. But there are issues: e.g., storage capacity shows a negative correlation when in fact more is better.
One thing I noticed is that attributes which are strongly correlated with the product's overall score are typically the most important attributes. For example, in humidifiers, the "output" has 0.95 correlation with the score, and it's probably one of the main things people care about. In laptops, performance, display, and battery (rated 1-5) are all highly correlated, whereas tech support length has practically zero correlation with the overall score. I think we should use the attributes' correlation with the overall score to identify the "default" weighting of the most important attributes.
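As a rough illustration of the idea in this comment, here is a minimal Python sketch that computes the Pearson correlation between an attribute's values and the overall scores, reads a direction off the sign, and normalizes the absolute correlations into default weights. The function names, the 0.1 threshold, and the normalization scheme are all assumptions made for this example; none of them come from the project's code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def direction(r, threshold=0.1):
    """Guess an attribute's direction from its correlation with the score.

    The threshold is a hypothetical cut-off, not a value from the project.
    """
    if r > threshold:
        return "more is better"
    if r < -threshold:
        return "less is better"
    return "undetermined"


def default_weights(correlations):
    """Turn {attribute: correlation} into weights that sum to 1.

    Assumption: importance is proportional to the absolute correlation.
    """
    total = sum(abs(r) for r in correlations.values())
    if total == 0:
        return {name: 1.0 / len(correlations) for name in correlations}
    return {name: abs(r) / total for name, r in correlations.items()}
```

A sign-based guess like this would still get storage capacity wrong in the case described above, so the detected direction is probably best treated as a default that can be overridden.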
This is an extremely useful finding. This issue should be coordinated with #45 and #47. We just decided to move the AttributeStat functionality into the logic component.
I just committed the ranking algorithm to branch issue47. For now the algorithm assumes default weights for all features. It ranks products that satisfy more than one feature first, taking the weights into consideration.
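To make the described behaviour concrete, here is a small Python sketch of one way such a ranking could work: each product gets the sum of the weights of the selected features it satisfies, and products are sorted by that sum. This is an approximation written for this thread, not the code committed to issue47; the data layout (a dict mapping product name to its set of satisfied features) and the names `rank_products` and `default_weight` are hypothetical.

```python
def rank_products(products, selected_features, weights=None, default_weight=1.0):
    """Rank product names by the total weight of the selected features they satisfy.

    products: dict mapping product name -> set of features it satisfies
    selected_features: features chosen by the user
    weights: optional dict feature -> weight; missing features get default_weight
    """
    weights = weights or {}
    selected = set(selected_features)

    def score(satisfied):
        return sum(weights.get(f, default_weight) for f in selected if f in satisfied)

    scored = [(score(satisfied), name) for name, satisfied in products.items()]
    # Highest score first; ties are broken alphabetically to keep the order stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]
```

With equal default weights, a product that satisfies two selected features always outranks one that satisfies only one, which matches the behaviour described above.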
A design question: do we want a grouping interface, meaning grouping products that satisfy some of the user-selected features, e.g., (feature A and B but not C), (feature A and C but not B), etc.? If so, I will modify the algorithm to return multiple ranked lists.
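For discussion, the grouping idea could look roughly like the following Python sketch: products are bucketed by the exact subset of selected features they satisfy, and the buckets are returned with the largest subsets first. The function name and data layout are hypothetical, and ordering within each group is plain alphabetical here rather than a real ranking.

```python
from collections import defaultdict


def group_by_satisfied(products, selected_features):
    """Group product names by which of the selected features they satisfy.

    products: dict mapping product name -> set of features it satisfies
    Returns a list of (sorted feature subset, sorted product names) pairs,
    ordered so that groups satisfying more features come first.
    """
    selected = set(selected_features)
    groups = defaultdict(list)
    for name, satisfied in products.items():
        key = frozenset(set(satisfied) & selected)
        groups[key].append(name)
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0])))
    return [(sorted(key), sorted(names)) for key, names in ordered]
```

Each group could then be passed through the ranking step separately to produce the multiple ranked lists mentioned above.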
I merged your branch into the Issue0045 branch and resolved two little conflicts. You can either merge Issue45 back into your branch to make use of the new ScoredAttribute and develop more, or, if your code is ready, merge Issue0045 into master.
I merged this into master this morning.
Here's what I think is the cleanest way to integrate my correlation code into the ranking algorithm:
I will create a class, AttributeCorrelator, whose constructor takes a Category. It will expose the methods:
Note that for now, I will only consider numerical attributes. It might be possible to do something with other attribute types as well, but that will require some experimentation first.
The AttributeCorrelator is now merged with master.
The ranking algorithm is merged with master.
Implement the algorithm that takes as input a list of features and a category and outputs a ranked list of products. Autodetect the attribute type: more is better, less is better, or discrete.