prmr / Creco

Recommendation System for Consumer Products
Apache License 2.0

Include price in specification options #66

Closed: forgues closed this issue 10 years ago

forgues commented 10 years ago

Products have a set of attributes, which we can get using getAttributes(). However, price is a separate attribute, which we can get using getPrice(). Because it is not in the main set of attributes, the correlation and entropy algorithms do not consider it, and it is not used to rank the products.

We should either

enewe101 commented 10 years ago

Actually, I think it's implemented the way you want... Currently Price is implemented as a normal attribute in the set -- so you get it in the set returned by getAttributes(); getPrice() is just meant to be a convenience method. So from what I understand it meets both points already. But post back if you find otherwise -- maybe I didn't get you right!

enewe101 commented 10 years ago

Oh, by the way, only about 25% of products have a price attribute -- I bet that's why you don't see price in the attributes sometimes. You will see it though if you look around enough!

But this in itself needs some investigation. Why do only 25% of products have prices? I have yet to dig into the JSON to make sure it's not a parse issue.

enewe101 commented 10 years ago

It's actually quite hard to find a product where price displays! So here's a test case:

search "Duracell Coppertop" > choose "AA Battery" > observe price in attributes list

forgues commented 10 years ago

Just to be clear then, the price attribute is stored in two places for each product? First in private HashMap<String, Attribute> aAttributes = new HashMap<String, Attribute>(); and also in private Attribute aPrice = null;?

If the price is already in the attributes, then you're probably right that it's rarely in the specs because it's quite sparse in the products. I think the specs are listed according to their entropy, and that at least 80% of products in a category must have the attribute for the entropy to be considered. We could try listing the specs according to correlation, and maybe price would appear more important in that case.
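To make the support argument concrete, here is a minimal sketch of an entropy score that is only computed when enough products carry the attribute. The 80% threshold is the figure recalled above; the class name EntropySketch, the method entropyIfSupported, and the choice of Shannon entropy over discrete values are assumptions for illustration, not the project's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical helper, not part of Creco.
final class EntropySketch {
    // Minimum fraction of products that must carry the attribute (figure recalled in the thread).
    static final double MIN_SUPPORT = 0.80;

    // values holds one entry per product in the category; null means the attribute is missing.
    // Returns empty when the list is empty or support is below MIN_SUPPORT.
    static OptionalDouble entropyIfSupported(List<String> values) {
        if (values.isEmpty()) {
            return OptionalDouble.empty();
        }
        Map<String, Integer> counts = new HashMap<>();
        int present = 0;
        for (String value : values) {
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
                present++;
            }
        }
        if ((double) present / values.size() < MIN_SUPPORT) {
            return OptionalDouble.empty(); // too sparse, e.g. price at roughly 25% support
        }
        // Shannon entropy in bits over the observed values.
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / present;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return OptionalDouble.of(entropy);
    }
}
```

Under this assumption, an attribute present on only one product in four is never scored at all, which would explain why price rarely shows up in the specs.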

enewe101 commented 10 years ago

Yep you're correct -- price is stored in both aAttributes and aPrice.
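For reference, a minimal sketch of what this dual storage might look like. The field declarations aAttributes and aPrice and the accessors getAttributes() and getPrice() come from this thread; the Attribute stand-in, the addAttribute method, and the "price" key are assumptions for illustration only.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;

// Hypothetical stand-in for the project's Attribute type.
final class Attribute {
    private final String name;
    private final double value;

    Attribute(String name, double value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }
    double getValue() { return value; }
}

// Sketch of a product that keeps price in the attribute map and in a dedicated field.
final class Product {
    private HashMap<String, Attribute> aAttributes = new HashMap<String, Attribute>();
    private Attribute aPrice = null;

    // Hypothetical mutator; matching on the name "price" is an assumption.
    void addAttribute(Attribute attribute) {
        aAttributes.put(attribute.getName(), attribute);
        if ("price".equalsIgnoreCase(attribute.getName())) {
            aPrice = attribute; // second reference to the same object
        }
    }

    // Price is included here like any other attribute.
    Collection<Attribute> getAttributes() {
        return Collections.unmodifiableCollection(aAttributes.values());
    }

    // Convenience accessor; returns null when the product has no price.
    Attribute getPrice() {
        return aPrice;
    }
}
```

With this layout, any algorithm that iterates over getAttributes() sees price whenever the product has one, and getPrice() only saves a lookup.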

enewe101 commented 10 years ago

Ok I dug out my data exploration that contained some details on the product JSON files:

Based on the above I'm fairly confident we can rule out the possibility that there is a problem before or at data loading.

One thing that could be helpful is to produce a breakdown of price-support per category. I'll post back with that in a moment...

enewe101 commented 10 years ago

Have a look at this wiki page which shows the fraction of products that have positive prices by category.
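A breakdown like this could be produced along the following lines. This is only a sketch: the Item class, the method positivePriceFraction, and the convention that a missing price is null are assumptions, and it is not the code used to build the wiki page.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Creco.
final class PriceSupportSketch {
    // Minimal product record: a category name and a price that may be missing (null).
    static final class Item {
        final String category;
        final Double price;

        Item(String category, Double price) {
            this.category = category;
            this.price = price;
        }
    }

    // Returns, per category, the fraction of products whose price is present and positive.
    static Map<String, Double> positivePriceFraction(List<Item> items) {
        Map<String, int[]> tallies = new TreeMap<>(); // value = {priced, total}
        for (Item item : items) {
            int[] tally = tallies.computeIfAbsent(item.category, k -> new int[2]);
            if (item.price != null && item.price > 0) {
                tally[0]++;
            }
            tally[1]++;
        }
        Map<String, Double> fractions = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : tallies.entrySet()) {
            int[] tally = entry.getValue();
            fractions.put(entry.getKey(), (double) tally[0] / tally[1]);
        }
        return fractions;
    }
}
```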

forgues commented 10 years ago

Thanks for the data, that was extremely helpful. Unfortunately it seems like both entropy and correlation don't work well with price. Price often doesn't appear in the top attributes of a category when I sort attributes by entropy. When I sort by correlation, price is usually ranked higher than with entropy. The problem is that the correlation is often strongly positive! In many categories, more expensive products tend to have higher overall scores, which leads the algorithm to believe that having a high price is good.
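To illustrate the effect, here is a minimal sketch using the Pearson correlation coefficient between price and overall score. Whether the project uses Pearson correlation specifically is an assumption, and the class name CorrelationSketch and method pearson are hypothetical.

```java
// Hypothetical helper, not part of Creco.
final class CorrelationSketch {
    // Pearson correlation between two equally sized samples.
    // Returns NaN when either sample has zero variance.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        return covariance / Math.sqrt(varianceX * varianceY);
    }
}
```

If pricier products in a category also tend to score higher, pearson(prices, scores) comes out positive, and an algorithm that reads the sign as "more is better" will conclude that a high price is desirable.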

Since price is such an important attribute, we might consider hardcoding its direction, i.e. LESS_IS_BETTER (and possibly hardcode its position at the top of the list of attributes, like you mentioned before).

enewe101 commented 10 years ago

That makes sense to me!

On Wed, Mar 26, 2014 at 3:24 PM, Gabriel Forgues wrote: (quote of the previous comment omitted)

forgues commented 10 years ago

I hardcoded price's direction so it will always be correct. I'll close this issue and we can use issue #73 to decide if we also want to hardcode price's position at the top of the attribute list.
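A minimal sketch of what such a hardcoded direction could look like, assuming the direction of other attributes is inferred from the sign of their correlation with the overall score. Only LESS_IS_BETTER appears in this thread; the MORE_IS_BETTER constant, the class DirectionSketch, the method directionFor, and the match on the name "price" are assumptions, not the actual change.

```java
// Hypothetical direction enum; only LESS_IS_BETTER is named in the thread.
enum Direction { MORE_IS_BETTER, LESS_IS_BETTER }

// Hypothetical helper, not part of Creco.
final class DirectionSketch {
    // Picks a ranking direction for an attribute.
    // Price is pinned to LESS_IS_BETTER regardless of what the data suggests.
    static Direction directionFor(String attributeName, double correlationWithScore) {
        if ("price".equalsIgnoreCase(attributeName)) {
            return Direction.LESS_IS_BETTER;
        }
        // Assumed fallback: infer the direction from the sign of the correlation.
        return correlationWithScore >= 0 ? Direction.MORE_IS_BETTER : Direction.LESS_IS_BETTER;
    }
}
```

The override sits in front of the inference, so a strongly positive price correlation no longer flips the direction.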