hunt-framework / hunt

A flexible, lightweight search platform

Order of results depends on the search-result limit (icMaxSR) and on the offset (icOffsetSR) #87

Closed sebastian-philipp closed 10 years ago

sebastian-philipp commented 10 years ago
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/300' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   174  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/301' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
    78  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/302' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   254  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/303' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   182  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/304' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   232  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/305' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
    72  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/0/300' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   174  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/1/300' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
    77  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/2/300' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   252  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/3/300' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   179  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"
administrator@holumbus:~/haskell/hunt$ curl -s 'http://localhost:3000/search/map/4/300' | jq '.["msg"]["result"][]["uri"]' | nl | grep 'http://hackage.haskell.org/package/base/docs/Prelude.html#v:map'
   228  "http://hackage.haskell.org/package/base/docs/Prelude.html#v:map"

This makes paging completely useless.

I discovered this while researching https://github.com/hunt-framework/hayoo/issues/18

UweSchmidt commented 10 years ago

Looks like paging is done before sorting? Looks weird.

UweSchmidt commented 10 years ago

That is of course a bug, but not a total showstopper. The effect occurs when many results have the same score: for the query map, 45 documents share the score 3.225. The sorting algorithm (a priority queue with limited capacity, which is rather efficient) is currently nondeterministic for equal scores, and that is what breaks paging, so it has to be corrected.
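To illustrate why equal scores are the problem, here is a minimal sketch. It is not the hunt code: Uri, Score, rankByScoreOnly, hitsA and hitsB are hypothetical names, and a stable sort stands in for the limited-capacity priority queue. It only shows that ordering by score alone leaves the order of tied documents open.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Hypothetical stand-ins for the real types: a hit is a URI with a score.
type Uri = String
type Score = Double

-- Rank by score only, best first. Hits with equal scores compare as EQ,
-- so their relative order is decided by something other than the score
-- (here: the input order, because sortBy is stable).
rankByScoreOnly :: [(Uri, Score)] -> [Uri]
rankByScoreOnly = map fst . sortBy (comparing (Down . snd))

-- The same three hits, delivered in two different orders.
hitsA, hitsB :: [(Uri, Score)]
hitsA = [("a", 3.225), ("b", 3.225), ("c", 5.0)]
hitsB = [("b", 3.225), ("a", 3.225), ("c", 5.0)]
```

rankByScoreOnly hitsA gives ["c","a","b"], while rankByScoreOnly hitsB gives ["c","b","a"]. Both are correctly sorted by score, yet "a" and "b" swap places. If the order among ties can change from one request to the next, a document can appear on a different page each time.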

sebastian-philipp commented 10 years ago

That is OK, as long as the most important results stay at the top of the results.

UweSchmidt commented 10 years ago

Ranking of search results is now done with a LimitedPriorityQueue implementation that is independent of Score. Sorting is done with wrapper types for Document and (Word, Score) that define an appropriate total ordering of elements. There is no longer any nondeterminism in the ranking algorithm, even with equal scores, so paging should work as if the complete result list were sorted and the requested interval of results were then extracted with drop and take.
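The idea can be sketched roughly as follows. This is not the actual LimitedPriorityQueue; Ranked, topK, page and pageBySort are hypothetical names, and breaking ties by URI is an assumption made for illustration. The wrapper type gives every pair of distinct hits a definite order, so a bounded selection and a full sort agree.

```haskell
import Data.List (foldl', insert, sort)
import Data.Ord (Down (..))

-- Hypothetical stand-ins for the real types.
type Uri = String
type Score = Double

-- Wrapper with a total order: higher score first, URI as tie-breaker
-- (the tie-breaker is an assumption, any fixed total order would do).
newtype Ranked = Ranked (Uri, Score) deriving (Eq, Show)

instance Ord Ranked where
  compare (Ranked (u1, s1)) (Ranked (u2, s2)) =
    compare (Down s1, u1) (Down s2, u2)

-- Bounded selection: keep only the k smallest elements seen so far,
-- held as a sorted list. A simple stand-in for a limited priority queue.
topK :: Ord a => Int -> [a] -> [a]
topK k = foldl' step []
  where
    step best x = take k (insert x best)

-- Paging via the bounded selection: collect offset + limit hits, skip offset.
page :: Int -> Int -> [(Uri, Score)] -> [Uri]
page offset limit hits =
  [u | Ranked (u, _) <- drop offset (topK (offset + limit) (map Ranked hits))]

-- Reference behaviour: sort the complete list, then drop and take.
pageBySort :: Int -> Int -> [(Uri, Score)] -> [Uri]
pageBySort offset limit hits =
  [u | Ranked (u, _) <- take limit (drop offset (sort (map Ranked hits)))]
```

With the hits [("b", 3.225), ("c", 5.0), ("a", 3.225), ("d", 1.0)], page 0 2 yields ["c","a"] and page 1 2 yields ["a","b"], the same as pageBySort, and the result no longer depends on the order in which the hits arrive.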