kartevonmorgen / openfairdb

Open Fair DB is the Creative Commons backend of Kartevonmorgen.org
http://www.openfairdb.org
GNU Affero General Public License v3.0

Bug: missing entries in search results #183

Closed by regenduft 5 years ago

regenduft commented 5 years ago

When the map is moved by a small amount, entries in the middle of the map appear and disappear. Example: https://kartevonmorgen.org/#/?center=49.428,8.682&zoom=17.00 vs. https://kartevonmorgen.org/#/?center=49.427,8.682&zoom=17.00 (reproducible on my PC, but it surely depends on the window size; 2-3 entries disappear in the second link).

It's not caused by the client; it's the API. Example:

https://api.ofdb.io/v0/search?text=&categories=2cd00bebec0c48ba9db761da48678134,c2dc278a2d6a4b9b8a50cb606fc017ed,77b3c33a92554bcf8e8c2c86cedd6f6f&bbox=49.423704,8.679135,49.434310,8.687782

=> 5 entries

https://api.ofdb.io/v0/search?text=&categories=2cd00bebec0c48ba9db761da48678134,c2dc278a2d6a4b9b8a50cb606fc017ed,77b3c33a92554bcf8e8c2c86cedd6f6f&bbox=49.424304,8.679242,49.434910,8.687890

=> 6 entries

The two bounding boxes are nearly identical. 1st: http://bboxfinder.com/#49.423704,8.679135,49.434310,8.687782

2nd: http://bboxfinder.com/#49.424304,8.679242,49.434910,8.687890

The missing entry is located in the middle of both bounding boxes, and the first query does not return it among the invisible results either.

Missing entry: {"id":"0eaadfbf6b5041f29dc836de2cefa2d7","lat":49.42701667986206,"lng":8.682615435068783,"title":"Two Wheels Schemenauer","description":"Fahrradverkauf und Reparatur","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bundhshd","mobilität","mobilität&werkstätten","werkstatt"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}}
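To double-check that claim, a minimal Rust sketch can test the missing entry's coordinates against both `bbox` values. It assumes the parameter order is south latitude, west longitude, north latitude, east longitude, which is read off the values in the URLs above rather than taken from API documentation. The `Bbox` type and its methods are hypothetical helpers, not openfairdb code.

```rust
/// Axis-aligned bounding box in degrees.
/// Assumption: the field order mirrors the `bbox` query parameter as
/// south latitude, west longitude, north latitude, east longitude.
#[derive(Debug, Clone, Copy)]
pub struct Bbox {
    pub south: f64,
    pub west: f64,
    pub north: f64,
    pub east: f64,
}

impl Bbox {
    /// Parses "south,west,north,east"; returns None on malformed input.
    pub fn parse(s: &str) -> Option<Bbox> {
        let values = s
            .split(',')
            .map(|part| part.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, _>>()
            .ok()?;
        match values.as_slice() {
            [s, w, n, e] => Some(Bbox { south: *s, west: *w, north: *n, east: *e }),
            _ => None,
        }
    }

    /// Inclusive containment test. Boxes crossing the antimeridian are
    /// not handled in this sketch.
    pub fn contains(&self, lat: f64, lng: f64) -> bool {
        lat >= self.south && lat <= self.north && lng >= self.west && lng <= self.east
    }
}
```

With the two `bbox` strings from the requests above, `contains(49.42701667986206, 8.682615435068783)` is true for both boxes, so the geometry alone does not explain why the first response omits the entry.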

Visible results:

1st:

{"visible":[{"id":"91c868499f144f76a34bd562329a39dc","lat":49.42831474329732,"lng":8.687704507581753,"title":"Mahlzahn Vollkornbäckerei","description":"Alles leckere aus Vollkornmehl, Bio-Vollkorn-Bäckerei","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","bundhshd","mobile"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"e626964b1ef24f8db9fdd7cfb9167ce5","lat":49.43147258340915,"lng":8.682691961844775,"title":"Eva's Lädchen","description":"Naturkostladen","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","bundhshd","naturkost"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"a31d23e8958c4125b70cb0870f7d8642","lat":49.424387863569144,"lng":8.686553839913827,"title":"Solidarische Landwirtschaft Rhein-Neckar e.V.","description":"Teilen der Ernte und Herausforderungen einer bäuerlichen Biolandwirtschaft","categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bundhshd","solawi"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"b45bbf02b265481b84cf458e793fd871","lat":49.42988081808662,"lng":8.679778579939054,"title":"Taekwon-Do Center Heidelberg","description":"Traditionelle Kampfkunst mit dem Ziel Menschen ganzheitlich Stark zu machen (Körper, Psyche & Soziales)","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["gesundheit","kampfkunst","körper","psyche","selbstsicherheit","taekwon-do"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"c5e82d4091c744ad8b50eaf96d62a4fd","lat":49.42760475418885,"lng":8.680235393662116,"title":"Heidelberger Partnerschaftskaffee e.V.","description":"biologisch angebaute Arabica-Kaffees von Kleinbauern aus Lateinamerika","categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bundhshd","fair"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"54e5318897d741ba9a868dc2644ebdae","lat":49.4331562563,"lng":8.682753652652146,"title":"Biker's Paradise","description":"Fahrradverleih, Verkauf und Reparatur, Mietstation der Pedelec-Vermietung Rückenwind","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bundhshd","mobilität","mobilität&werkstätten","werkstatt"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}}],"invisible":[]}

2nd:

{"visible":[{"id":"54e5318897d741ba9a868dc2644ebdae","lat":49.4331562563,"lng":8.682753652652146,"title":"Biker's Paradise","description":"Fahrradverleih, Verkauf und Reparatur, Mietstation der Pedelec-Vermietung Rückenwind","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bundhshd","mobilität","mobilität&werkstätten","werkstatt"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"91c868499f144f76a34bd562329a39dc","lat":49.42831474329732,"lng":8.687704507581753,"title":"Mahlzahn Vollkornbäckerei","description":"Alles leckere aus Vollkornmehl, Bio-Vollkorn-Bäckerei","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","bundhshd","mobile"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"a31d23e8958c4125b70cb0870f7d8642","lat":49.424387863569144,"lng":8.686553839913827,"title":"Solidarische Landwirtschaft Rhein-Neckar e.V.","description":"Teilen der Ernte und Herausforderungen einer bäuerlichen Biolandwirtschaft","categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bundhshd","solawi"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"e626964b1ef24f8db9fdd7cfb9167ce5","lat":49.43147258340915,"lng":8.682691961844775,"title":"Eva's Lädchen","description":"Naturkostladen","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bio","bundhshd","naturkost"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"b45bbf02b265481b84cf458e793fd871","lat":49.42988081808662,"lng":8.679778579939054,"title":"Taekwon-Do Center Heidelberg","description":"Traditionelle Kampfkunst mit dem Ziel Menschen ganzheitlich Stark zu machen (Körper, Psyche & Soziales)","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["gesundheit","kampfkunst","körper","psyche","selbstsicherheit","taekwon-do"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"c5e82d4091c744ad8b50eaf96d62a4fd","lat":49.42760475418885,"lng":8.680235393662116,"title":"Heidelberger Partnerschaftskaffee e.V.","description":"biologisch angebaute Arabica-Kaffees von Kleinbauern aus Lateinamerika","categories":["2cd00bebec0c48ba9db761da48678134"],"tags":["bio","bundhshd","fair"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}},{"id":"0eaadfbf6b5041f29dc836de2cefa2d7","lat":49.42701667986206,"lng":8.682615435068783,"title":"Two Wheels Schemenauer","description":"Fahrradverkauf und Reparatur","categories":["77b3c33a92554bcf8e8c2c86cedd6f6f"],"tags":["bundhshd","mobilität","mobilität&werkstätten","werkstatt"],"ratings":{"total":0.0,"diversity":0.0,"fairness":0.0,"humanity":0.0,"renewable":0.0,"solidarity":0.0,"transparency":0.0}}],"invisible":[]}

uklotzde commented 5 years ago

Confirmed. The lead developer of Tantivy mentioned that range queries are currently not optimized, and maybe they are also not extensively tested. All existing unit tests for bbox queries in openfairdb work as expected.

Next steps:

  1. Restart the service and check if the issue persists after all entries have been re-indexed
  2. Add a regression test with all entries and their coordinates from this report and try to reproduce and isolate the error

fulmicoton commented 5 years ago

If your index is small and you really suspect Tantivy, can you send me the entire index and the terms for the faulty range query? I'll see what I can do...

uklotzde commented 5 years ago

@fulmicoton Thanks for offering your help, Paul! I didn't want to bother you until we have sorted this out on our side.

Unfortunately, we are not able to dump the current state of the index, because we are operating Tantivy with the in-memory configuration, which is not recommended for production. It is still sufficient and was the fastest way to get started without planning for schema upgrades or reusing an existing index on disk.

uklotzde commented 5 years ago

I guess our current search strategy is inappropriate. Searching only once with an extended bounding box is not guaranteed to return all visible results when limiting the number of search results.
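A minimal sketch of that failure mode, in plain Rust with no Tantivy involved. Here `search` is a hypothetical stand-in for the index: it returns the best-scored entries inside a box and cuts off at `limit`. When the extended box holds enough higher-scored entries, a visible entry never survives the cut-off. All names and the scoring are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Bbox {
    pub south: f64,
    pub west: f64,
    pub north: f64,
    pub east: f64,
}

impl Bbox {
    pub fn contains(&self, lat: f64, lng: f64) -> bool {
        lat >= self.south && lat <= self.north && lng >= self.west && lng <= self.east
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: &'static str,
    pub lat: f64,
    pub lng: f64,
    pub score: f32,
}

/// Hypothetical stand-in for the index: entries inside `bbox`,
/// best score first, cut off at `limit`.
pub fn search(index: &[Entry], bbox: Bbox, limit: usize) -> Vec<Entry> {
    let mut hits: Vec<Entry> = index
        .iter()
        .filter(|e| bbox.contains(e.lat, e.lng))
        .cloned()
        .collect();
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(limit);
    hits
}

/// Single-query strategy: search the extended box once, then keep the
/// hits that fall inside the visible box. The cut-off has already been
/// applied, so visible entries may be missing from the result.
pub fn visible_via_single_query(
    index: &[Entry],
    visible: Bbox,
    extended: Bbox,
    limit: usize,
) -> Vec<Entry> {
    search(index, extended, limit)
        .into_iter()
        .filter(|e| visible.contains(e.lat, e.lng))
        .collect()
}
```

If the index holds one low-scored entry inside the visible box and three higher-scored entries just outside it, `visible_via_single_query` with a limit of 3 returns nothing, while `search` on the visible box alone finds the entry.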

We need to execute up to two queries:

regenduft commented 5 years ago

If the database does not sort by distance, but the query contains a "LIMIT", this is true.

(Obviously we want to sort by more criteria than distance, but I don't know whether this sorting happens in the database or whether you always fetch all visible entries from the database and then sort and limit them in the business logic.)

fulmicoton commented 5 years ago

If you want to sort by distance to a point, you will have to mark the longitude/latitude fields as fast fields and implement a custom collector that wraps the TopScoreCollector, for instance. It sounds difficult, but it is not that terrible.

Version 0.10.0 will probably make it much easier.
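The core of such a collector is a bounded heap keyed by distance. The sketch below shows only that idea in plain Rust. It does not use Tantivy's collector API, and the two coordinate slices merely stand in for per-document fast field columns. `haversine_m`, `Candidate`, and `nearest_k` are hypothetical names.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in metres between two (lat, lng) points in degrees.
pub fn haversine_m(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lng1) = (a.0.to_radians(), a.1.to_radians());
    let (lat2, lng2) = (b.0.to_radians(), b.1.to_radians());
    let h = ((lat2 - lat1) / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * ((lng2 - lng1) / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
}

/// Heap item ordered by distance, so the heap's maximum is the
/// farthest candidate currently kept.
struct Candidate {
    dist_m: f64,
    doc: usize,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist_m
            .total_cmp(&other.dist_m)
            .then(self.doc.cmp(&other.doc))
    }
}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Candidate {}

/// Returns the indices of the `k` documents nearest to `center`,
/// nearest first. `lats` and `lngs` hold one value per document.
pub fn nearest_k(lats: &[f64], lngs: &[f64], center: (f64, f64), k: usize) -> Vec<usize> {
    let mut heap = BinaryHeap::new();
    for (doc, (&lat, &lng)) in lats.iter().zip(lngs).enumerate() {
        heap.push(Candidate { dist_m: haversine_m(center, (lat, lng)), doc });
        if heap.len() > k {
            heap.pop(); // drop the farthest candidate to keep at most k
        }
    }
    heap.into_sorted_vec().into_iter().map(|c| c.doc).collect()
}
```

The heap never holds more than `k + 1` items, so memory stays bounded no matter how many documents match.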

uklotzde commented 5 years ago

Confirmed.

Before: The query returned between 5 and 9 visible entries, varying with every start of the service (= re-indexing).

After: The query always returns all 11 visible entries, even when re-indexing all entries.

I'll release and deploy a new version v0.5.3.

regenduft commented 5 years ago

nice :-)

however in the end we might need the two-queries-solution that always fetches everything for the visible area, without limit.

this bug report addresses the issue when the visible area contains only a few results, and this issue would be resolved by sorting by distance before limiting.

But we would get the wrong results in another case: when the visible area itself already contains too many results (then we need to sort them by score, not by distance).
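One way to read the two-query idea, sketched in plain Rust under assumptions: the first query covers only the visible box and ranks by score, and a second query over the surrounding area runs only if the limit leaves room. `search` and `two_query_search` are hypothetical helpers. Whether the visible query should be limited at all is left open in this discussion, and the sketch applies the limit.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Bbox {
    pub south: f64,
    pub west: f64,
    pub north: f64,
    pub east: f64,
}

impl Bbox {
    pub fn contains(&self, lat: f64, lng: f64) -> bool {
        lat >= self.south && lat <= self.north && lng >= self.west && lng <= self.east
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: &'static str,
    pub lat: f64,
    pub lng: f64,
    pub score: f32,
}

/// Hypothetical stand-in for the index: entries accepted by `keep`,
/// best score first, cut off at `limit`.
fn search<F>(index: &[Entry], limit: usize, keep: F) -> Vec<Entry>
where
    F: Fn(&Entry) -> bool,
{
    let mut hits: Vec<Entry> = index.iter().filter(|&e| keep(e)).cloned().collect();
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(limit);
    hits
}

/// Returns `(visible, invisible)` using up to two queries.
pub fn two_query_search(
    index: &[Entry],
    visible: Bbox,
    extended: Bbox,
    limit: usize,
) -> (Vec<Entry>, Vec<Entry>) {
    // Query 1: only the visible area, so off-screen entries cannot
    // crowd out visible ones before the cut-off.
    let visible_hits = search(index, limit, |e| visible.contains(e.lat, e.lng));

    // Query 2: the extended area minus the visible area, and only
    // if the limit has not been used up yet.
    let remaining = limit - visible_hits.len();
    let invisible_hits = if remaining > 0 {
        search(index, remaining, |e| {
            extended.contains(e.lat, e.lng) && !visible.contains(e.lat, e.lng)
        })
    } else {
        Vec::new()
    };
    (visible_hits, invisible_hits)
}
```

When the visible area alone exceeds the limit, the first query keeps the best-scored visible entries and the second query is skipped, which matches the case described above.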

(Reply by email to Paul Masurel's comment of 02.04.2019 14:47.)

uklotzde commented 5 years ago

The results are scored internally by Tantivy before cut off. Sorting the results by rating is done as a post-processing step. Combining the internal score with the rating to implement a custom scoring is currently not possible (as far as I know). This is a known limitation and should be tracked by a separate issue.
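The consequence of this ordering can be shown with a small sketch, assuming a two-stage pipeline in plain Rust: a cut-off by internal score followed by a re-sort by rating. `Hit`, `cut_off_by_score`, and `sort_by_rating` are hypothetical names, and the numbers are made up.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: &'static str,
    /// Internal relevance score assigned by the index.
    pub score: f32,
    /// Average rating, applied only after the cut-off.
    pub rating: f64,
}

/// Stage 1: rank by internal score and cut off at `limit`.
pub fn cut_off_by_score(mut hits: Vec<Hit>, limit: usize) -> Vec<Hit> {
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(limit);
    hits
}

/// Stage 2: reorder the survivors by rating, best first.
/// A highly rated hit that was cut off in stage 1 cannot come back.
pub fn sort_by_rating(mut hits: Vec<Hit>) -> Vec<Hit> {
    hits.sort_by(|a, b| b.rating.total_cmp(&a.rating));
    hits
}
```

A hit with the best rating but the lowest internal score is dropped whenever the limit is smaller than the number of matches, which is the limitation described above.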

Scoring by distance is a new requirement that also needs to be addressed separately. Currently there is no concept of a center point. Tantivy does not support geospatial queries, so I consider this requirement out of scope.

regenduft commented 5 years ago

Great that it's solved. It's not a requirement; it was just an idea for how to do it in one query and, as mentioned, a bad one that wouldn't even work. Thank you!

(Reply by email to Uwe Klotz's comment of 02.04.2019 15:20.)

uklotzde commented 5 years ago

Fixed and deployed.

@regenduft Thanks for the detailed report!! You may check the results again now. All entries in the visible region should remain visible when slightly moving the bbox.