thegooglecodearchive / allforgood

Automatically exported from code.google.com/p/allforgood

Confirm whether distance boosting is not working properly or is not factored into the algo sufficiently #602

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Joe, what exactly is not working properly in the distance boosting?  You’d 
said before that it is not working, but I know that changing the distance 
boosting values does have an effect.  (Sample queries below.)  I’m not 
interested in making changes to the algo tweaks right now, but I want to know 
what is not working properly so I can get it addressed later.

When you say that it does not work, do you mean that it is inaccurate, or that 
the bq factors that are applied override the distance boosting?

http://li169-139.members.linode.com:8983/solr/select?q={!spatial%20lat=33.9019949%20long=-84.4296446%20radius=20}*:*&fl=location_string

http://li169-139.members.linode.com:8983/solr/select?q={!spatial%20lat=33.9019949%20long=-84.4296446%20radius=20+boost=recip(dist(geo_distance),1,1000,1000)^100000}*:*&fl=location_string
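
For reference, here is a minimal sketch of what the boost function in the second query evaluates to. Solr's recip(x,m,a,b) function query computes a/(m*x+b). The distance unit is an assumption (whatever geo_distance reports, taken as miles here), and the sample distances are illustrative only.

```python
def recip(x, m, a, b):
    """Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


# Values of recip(dist, 1, 1000, 1000) across a 20-unit radius.
# Assumption: dist is in the same unit as the radius parameter.
for d in (0, 1, 5, 10, 20):
    print(f"distance={d:>2}  recip={recip(d, 1, 1000, 1000):.4f}")
```

With m=1, a=1000, b=1000 the value only moves from 1.0 at distance 0 to about 0.98 at distance 20, before the ^N multiplier is applied. How the spatial plugin combines this with the bq terms is not shown here, so treat this only as a hint about the relative size of the distance term.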

Original issue reported on code.google.com by danstryk...@gmail.com on 21 Mar 2011 at 4:31

GoogleCodeExporter commented 9 years ago
I think the issue comes into effect when you actually apply all of the factors 
we take into consideration. For example, here is a full Solr query we would 
make with boosting:
http://li67-22.members.linode.com:8983/solr/select/?&sort=Relevance%20desc&fq=((eventrangeend:[2011-03-23T00:00:00.000Z+TO+*]+AND+eventrangestart:[*+TO+2011-03-23T23:59:59.999Z])+OR+(eventrangeend:+%221971-01-01T00:00:000Z%22+AND+eventrangestart:%221971-01-01T00:00:000Z%22))&rows=100&start=0&q={%21spatial+lat%3D35.9176387+long%3D-78.7744522+radius%3D25.0+boost%3Drecip%28dist%28geo_distance%29%2C1%2C1000%2C1000%29^1}%2A%3A%2A&fq=self_directed:false+AND+virtual:false+AND+micro:false&fl=title&bq=categories:vetted^15+eventrangestart:[*+TO+NOW%2B6MONTHS]^15+eventrangestart:[NOW+TO+NOW%2B1MONTHS]^10+eventrangestart:[NOW+TO+*]^5+eventrangestart:[NOW-6MONTHS+TO+*]^7+eventrangeend:[*+TO+NOW%2B6MONTHS]^7+eventrangeend:[NOW+TO+NOW%2B1MONTHS]^10+-feed_providername:meetup^2+eventduration:[1+TO+10]^10

Here is one without boosting:
http://li67-22.members.linode.com:8983/solr/select/?&sort=Relevance%20desc&fq=((eventrangeend:[2011-03-23T00:00:00.000Z+TO+*]+AND+eventrangestart:[*+TO+2011-03-23T23:59:59.999Z])+OR+(eventrangeend:+%221971-01-01T00:00:000Z%22+AND+eventrangestart:%221971-01-01T00:00:000Z%22))&rows=100&start=0&q={%21spatial+lat%3D35.9176387+long%3D-78.7744522+radius%3D25.0}%2A%3A%2A&fq=self_directed:false+AND+virtual:false+AND+micro:false&fl=title&bq=categories:vetted^15+eventrangestart:[*+TO+NOW%2B6MONTHS]^15+eventrangestart:[NOW+TO+NOW%2B1MONTHS]^10+eventrangestart:[NOW+TO+*]^5+eventrangestart:[NOW-6MONTHS+TO+*]^7+eventrangeend:[*+TO+NOW%2B6MONTHS]^7+eventrangeend:[NOW+TO+NOW%2B1MONTHS]^10+-feed_providername:meetup^2+eventduration:[1+TO+10]^10

There are no differences between the results of the two queries.
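
One way to double-check that the two requests differ only in the boost clause is to diff their decoded parameters. This is a rough sketch using shortened, hypothetical stand-ins for the URLs above (the example.invalid host and the trimmed parameter list are placeholders, not the real queries).

```python
from urllib.parse import urlsplit, parse_qsl


def param_diff(url_a, url_b):
    """Return (only_in_a, only_in_b) as sorted lists of decoded (key, value) pairs."""
    a = set(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = set(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    return sorted(a - b), sorted(b - a)


# Shortened, hypothetical versions of the boosted and unboosted queries.
with_boost = (
    "http://example.invalid/solr/select/?sort=Relevance%20desc&rows=100"
    "&q={%21spatial+lat%3D35.9+long%3D-78.7+radius%3D25.0"
    "+boost%3Drecip%28dist%28geo_distance%29%2C1%2C1000%2C1000%29^1}%2A%3A%2A"
)
no_boost = (
    "http://example.invalid/solr/select/?sort=Relevance%20desc&rows=100"
    "&q={%21spatial+lat%3D35.9+long%3D-78.7+radius%3D25.0}%2A%3A%2A"
)

only_a, only_b = param_diff(with_boost, no_boost)
print("only in boosted query:  ", only_a)
print("only in unboosted query:", only_b)
```

If the only differing parameter is q, and the only difference inside q is the boost clause, then any difference (or lack of one) in the result order can be attributed to that clause and to how the results are sorted.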

Original comment by jwdemp...@gmail.com on 23 Mar 2011 at 6:57

GoogleCodeExporter commented 9 years ago

Original comment by danstryk...@gmail.com on 23 Mar 2011 at 7:52

GoogleCodeExporter commented 9 years ago

Original comment by danstryk...@gmail.com on 23 Mar 2011 at 7:52

GoogleCodeExporter commented 9 years ago
I tested what would happen to the ranking of the opps if we just used the 
recommended default of sort=score desc, which Tien suggests.  Sorting by score 
instead of Relevance had a massive impact on the ordering of the results, so 
clearly the distance boosting does have an impact.  See the attached 
spreadsheet.  The first opp we currently return came back as the last opp in 
the default score-based sort order.  We’ll want to tweak the algo at some 
point, but clearly that will need to be done carefully.  It’s not something 
to tackle for the immediate release, but take a look and let me know what you 
think.

Current param set
http://li67-22.members.linode.com:8983/solr/select/?&sort=Relevance%20desc&fq=%28%28eventrangeend:[2011-03-23T00:00:00.000Z+TO+*]+AND+eventrangestart:[*+TO+2011-03-23T23:59:59.999Z]%29+OR+%28eventrangeend:+%221971-01-01T00:00:000Z%22+AND+eventrangestart:%221971-01-01T00:00:000Z%22%29%29&rows=500&start=0&q={!spatial+lat%3D35.9176387+long%3D-78.7744522+radius%3D25.0+boost%3Drecip%28dist%28geo_distance%29%2C1%2C1000%2C1000%29^1}*%3A*&fq=self_directed:false+AND+virtual:false+AND+micro:false&fl=id,latitude,longitude&bq=categories:vetted^15+eventrangestart:[*+TO+NOW%2B6MONTHS]^15+eventrangestart:[NOW+TO+NOW%2B1MONTHS]^10+eventrangestart:[NOW+TO+*]^5+eventrangestart:[NOW-6MONTHS+TO+*]^7+eventrangeend:[*+TO+NOW%2B6MONTHS]^7+eventrangeend:[NOW+TO+NOW%2B1MONTHS]^10+-feed_providername:meetup^2+eventduration:[1+TO+10]^10

Same param set without the sort param (default sort by score desc)
http://li67-22.members.linode.com:8983/solr/select/?&fq=%28%28eventrangeend:[2011-03-23T00:00:00.000Z+TO+*]+AND+eventrangestart:[*+TO+2011-03-23T23:59:59.999Z]%29+OR+%28eventrangeend:+%221971-01-01T00:00:000Z%22+AND+eventrangestart:%221971-01-01T00:00:000Z%22%29%29&rows=500&start=0&q={!spatial+lat%3D35.9176387+long%3D-78.7744522+radius%3D25.0+boost%3Drecip%28dist%28geo_distance%29%2C1%2C1000%2C1000%29^1}*%3A*&fq=self_directed:false+AND+virtual:false+AND+micro:false&fl=id,latitude,longitude&bq=categories:vetted^15+eventrangestart:[*+TO+NOW%2B6MONTHS]^15+eventrangestart:[NOW+TO+NOW%2B1MONTHS]^10+eventrangestart:[NOW+TO+*]^5+eventrangestart:[NOW-6MONTHS+TO+*]^7+eventrangeend:[*+TO+NOW%2B6MONTHS]^7+eventrangeend:[NOW+TO+NOW%2B1MONTHS]^10+-feed_providername:meetup^2+eventduration:[1+TO+10]^10
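
The ranking comparison described above could be reproduced roughly like this, given the ordered id lists from the two responses. The ids below are made up for illustration; real ones would come from the fl=id field of each result set, and rank_shifts is a hypothetical helper name.

```python
def rank_shifts(order_a, order_b):
    """Map each id present in both orderings to (rank_a, rank_b), 0-based."""
    pos_b = {doc_id: i for i, doc_id in enumerate(order_b)}
    return {
        doc_id: (i, pos_b[doc_id])
        for i, doc_id in enumerate(order_a)
        if doc_id in pos_b
    }


# Hypothetical ids standing in for the two result orderings.
current_order = ["opp1", "opp2", "opp3", "opp4"]  # sort=Relevance desc
score_order = ["opp3", "opp2", "opp4", "opp1"]    # default score desc

shifts = rank_shifts(current_order, score_order)
for doc_id, (a, b) in shifts.items():
    print(f"{doc_id}: rank {a} -> {b} (moved {b - a:+d})")
```

Large positive or negative moves for many ids would show the same effect as the spreadsheet: the two sort orders disagree heavily.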

Dan Stryker
All for Good Director of Product Management

From: Ca Dic [mailto:cadicvnn@gmail.com] 
Sent: Saturday, March 26, 2011 1:02 AM
To: Dan Stryker
Cc: Kelvin Tan
Subject: Re: All for Good distance boosting

Hi Dan,

The two queries in Joe’s comment use an incorrect sort parameter, 
sort=Relevance desc, so the results come back with no ordering applied. That 
is why the two queries return the same order.
To sort by relevance, the correct parameter is sort=score desc, or remove the 
sort parameter entirely, because Solr sorts by score desc by default.

Regards,
Tien
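
A small sketch of the two request forms Tien describes, built with Python's standard urllib. The host, the solr_select_url helper, and the trimmed parameter set are assumptions for illustration, not the application's actual query builder.

```python
from urllib.parse import urlencode


def solr_select_url(base, q, sort=None, **params):
    """Build a /select URL; leave sort out to get Solr's default (score desc)."""
    query = {"q": q, **params}
    if sort is not None:
        query["sort"] = sort
    return f"{base}/select?{urlencode(query)}"


base = "http://localhost:8983/solr"  # hypothetical host
q = "{!spatial lat=35.9176387 long=-78.7744522 radius=25.0}*:*"

explicit = solr_select_url(base, q, sort="score desc", rows=100)
default = solr_select_url(base, q, rows=100)
print(explicit)
print(default)
```

Both URLs should return results in the same order, since the second simply relies on the default sort.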

Original comment by danstryk...@gmail.com on 27 Mar 2011 at 7:58

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by danstryk...@gmail.com on 30 Mar 2011 at 10:17