openfoodfacts / smooth-app

🤳🥫 The new Open Food Facts mobile application for Android and iOS, crafted with Flutter and Dart
https://world.openfoodfacts.org/open-food-facts-mobile-app?utm_source=off&utf_medium=web&utm_campaign=github-repo
Apache License 2.0
863 stars 286 forks

Price/proof addition : location field : restrict OSM types #5568

Closed raphodn closed 2 months ago

raphodn commented 2 months ago

Some users are adding prices and linking them to undesired locations such as cities, countries, roads...

In the Open Prices web frontend we remove some OSM POI results using a blacklist, to avoid having proofs/prices linked to a non-shop location.

The list can be seen here: https://github.com/openfoodfacts/open-prices-frontend/blob/master/src/constants.js > NOMINATIM_RESULT_TYPE_EXCLUDE_LIST

Linked frontend issue: https://github.com/openfoodfacts/open-prices-frontend/issues/37

If needed, we could move this blacklist to the backend and make it available via an API endpoint?

monsieurtanuki commented 2 months ago

@raphodn I've just run a default search: https://photon.komoot.io/api?q=berlin&bbox=9.5,51.5,11.5,53.5

In that case, which "type" are you referring to: "type": "street" or "osm_key": "highway"? Neither value is part of your exclusion list...

{
    "geometry": {
        "coordinates": [
            9.7517684,
            52.3738781
        ],
        "type": "Point"
    },
    "type": "Feature",
    "properties": {
        "osm_id": 185830087,
        "extent": [
            9.7517684,
            52.3738781,
            9.7517784,
            52.3734873
        ],
        "country": "Allemagne",
        "city": "Hanovre",
        "countrycode": "DE",
        "postcode": "30159",
        "locality": "Bult",
        "county": "Région de Hanovre",
        "type": "street",
        "osm_type": "W",
        "osm_key": "highway",
        "district": "Ville-Sud-Bult",
        "osm_value": "secondary",
        "name": "Berliner Allee",
        "state": "Basse-Saxe"
    }
}

Btw wouldn't an inclusion list or a search filter be more relevant, like https://photon.komoot.io/api?q=berlin&bbox=9.5,51.5,11.5,53.5&osm_tag=shop?
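For illustration, here is a minimal Python sketch of what an inclusion list could look like if it were applied client-side to Photon results, keyed on `properties.osm_key` rather than on `type`. The function names and the second sample feature are hypothetical; only the field names come from the response excerpt above.

```python
def is_allowed(feature, allowed_keys):
    """Return True when the feature's OSM key is in the inclusion list."""
    props = feature.get("properties", {})
    return props.get("osm_key") in allowed_keys


def filter_features(features, allowed_keys):
    """Keep only the features whose osm_key is allowed, preserving order."""
    return [f for f in features if is_allowed(f, allowed_keys)]


# Trimmed copy of the "Berliner Allee" result quoted above.
street = {
    "type": "Feature",
    "properties": {
        "type": "street",
        "osm_key": "highway",
        "osm_value": "secondary",
        "name": "Berliner Allee",
    },
}

# Hypothetical shop result, made up for the example.
shop = {
    "type": "Feature",
    "properties": {
        "osm_key": "shop",
        "osm_value": "supermarket",
        "name": "Example Market",
    },
}

kept = filter_features([street, shop], {"shop"})
print([f["properties"]["name"] for f in kept])
```

Passing `osm_tag=shop` in the query, as suggested above, would let the server do the same job and avoid downloading results that are discarded anyway.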

raphodn commented 2 months ago

The exclusion list looks at :

And what is stored (and displayed) afterwards per location :

Bonus : the current top 30 location types in OP https://github.com/openfoodfacts/open-prices/wiki/Stats#top-location-osm-types

> wouldn't an inclusion list or a search filter be more relevant

there are prices everywhere, not only supermarkets, but also pharmacies, restaurants, bakeries, bookstores... is it possible to have dozens in the inclusion list url ?

monsieurtanuki commented 2 months ago

> there are prices everywhere, not only supermarkets, but also pharmacies, restaurants, bakeries, bookstores... is it possible to have dozens in the inclusion list url ?

Looks so: https://photon.komoot.io/api?q=berlin&bbox=9.5,51.5,11.5,53.5&osm_tag=amenity:pharmacy&osm_tag=shop&limit=100
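As a rough sketch of how such a URL could be assembled with several `osm_tag` filters, the Python snippet below repeats the parameter once per tag. The helper name `build_photon_url` is hypothetical; the endpoint and parameter names are the ones used in the URLs in this thread.

```python
from urllib.parse import urlencode

PHOTON_API = "https://photon.komoot.io/api"


def build_photon_url(query, osm_tags=(), bbox=None, limit=None):
    """Build a Photon search URL with one osm_tag parameter per tag."""
    params = [("q", query)]
    if bbox is not None:
        # bbox is passed as four comma-separated numbers.
        params.append(("bbox", ",".join(str(v) for v in bbox)))
    # Repeating the key is how several inclusion filters are expressed.
    params.extend(("osm_tag", tag) for tag in osm_tags)
    if limit is not None:
        params.append(("limit", str(limit)))
    # Keep "," and ":" unescaped so the result reads like the URLs above.
    return PHOTON_API + "?" + urlencode(params, safe=",:")


url = build_photon_url(
    "berlin",
    osm_tags=["amenity:pharmacy", "shop"],
    bbox=(9.5, 51.5, 11.5, 53.5),
    limit=100,
)
print(url)
```

A list of tuples is used instead of a dict because a dict cannot hold the same key twice.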

monsieurtanuki commented 2 months ago

@raphodn According to your stats, I guess that, as a first approach, we would be OK filtering on amenity and shop, right? https://photon.komoot.io/api?q=carrefour,%20paris&osm_tag=amenity&osm_tag=shop&limit=100

raphodn commented 2 months ago

The stats are just for info: there are 900+ locations, so only a subset shows up in the top 30 types; at no moment did I say we should restrict... but they DO show that places like house or city have been used as locations.

Recently I've been adding prices in restaurants, greengrocers, DIY shops, bars, pharmacies... so I'm in favor of giving the user as many choices as we can.

So we should probably stick with a blacklist rather than a long whitelist ("dozens" of entries, as stated above).

monsieurtanuki commented 2 months ago

> at no moment did I say we should restrict

You meant a blacklist that doesn't restrict?

Besides

raphodn commented 2 months ago

OK, my bad: I re-read the whole thread and I understand now what you mean by filtering on the osm_tag only, which is much less restrictive than filtering on osm_value. I just need to test a bit, but your Photon URL looks good 💯

the full list of location types : https://gist.github.com/raphodn/7c53c4a3403f0f86e89f09e9e7a7ddaf

monsieurtanuki commented 2 months ago

@raphodn Of course for better stat analysis we should see how many prices are used with "right" or "wrong" locations, but from what I could see in your list whitelisting to shop and amenity looks reasonable.

shop

```log
shop:supermarket:621
shop:convenience:77
shop:chemist:13
shop:variety_store:13
shop:bakery:7
shop:mall:7
shop:deli:7
shop:frozen_food:6
shop:greengrocer:5
shop:furniture:5
shop:department_store:4
shop:farm:4
shop:wholesale:4
shop:books:3
shop:sports:3
shop:doityourself:3
shop:newsagent:2
shop:gift:2
shop:beauty:1
shop:garden_centre:1
shop:computer:1
shop:kiosk:1
shop:cheese:1
shop:dairy:1
shop:electronics:1
shop:car_repair:1
shop:hardware:1
shop:clothes:1
shop:ticket:1
shop:interior_decoration:1
shop:travel_agency:1
shop:pasta:1
shop:toys:1
shop:health_food:1
shop:general:1
shop:outpost:1
```

amenity

```log
amenity:fuel:18
amenity:pharmacy:6
amenity:fast_food:3
amenity:cafe:3
amenity:bar:2
amenity:university:2
amenity:restaurant:2
amenity:post_office:2
amenity:charging_station:1
amenity:place_of_worship:1
amenity:bicycle_rental:1
amenity:parking:1
amenity:bus_station:1
amenity:food_court:1
amenity:community_centre:1
amenity:ice_cream:1
amenity:casino:1
amenity:veterinary:1
```

no shop and no amenity

```log
::11
boundary:administrative:25
boundary:local_authority:2
boundary:political:2
building:commercial:2
building:retail:2
building:supermarket:1
building:yes:4
highway:bus_stop:7
highway:motorway_junction:1
highway:pedestrian:1
highway:primary:1
highway:residential:2
highway:secondary:2
historic:fort:1
landuse:cemetery:1
landuse:commercial:2
landuse:construction:5
landuse:farmyard:1
landuse:greenfield:1
landuse:industrial:4
landuse:residential:1
landuse:retail:6
leisure:stadium:1
man_made:bridge:1
man_made:street_cabinet:1
man_made:surveillance:2
natural:peak:1
office:company:2
place:city:3
place:hamlet:1
place:house:7
place:plot:1
place:suburb:3
place:town:2
railway:station:6
railway:yard:1
```

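To check how much of this list a shop/amenity whitelist would cover, the counts can be summed per bucket. The Python sketch below assumes each entry has the `key:value:count` shape shown in the lists above (including the empty `::11` entry); the function name and the four-line sample are only illustrative, not the full gist.

```python
from collections import Counter


def coverage_by_key(lines, allowed_keys=("shop", "amenity")):
    """Sum the counts per allowed OSM key; everything else goes to 'other'."""
    totals = Counter()
    for line in lines:
        # Each entry looks like "key:value:count"; key and value may be empty.
        key, _value, count = line.strip().split(":")
        bucket = key if key in allowed_keys else "other"
        totals[bucket] += int(count)
    return totals


# Small sample taken from the lists above.
sample = [
    "shop:supermarket:621",
    "amenity:fuel:18",
    "place:city:3",
    "::11",
]
totals = coverage_by_key(sample)
print(dict(totals))
```

Feeding it the complete gist would give the share of existing locations that a shop/amenity filter keeps, as a complement to the per-price analysis mentioned above.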
raphodn commented 2 months ago

ok following this discussion I opened a PR in the web frontend.

I had a look at the building entries, for instance: https://www.openstreetmap.org/way/174737917

They should be tagged as shop when they are a supermarket or equivalent, so the problem is on the OSM side. But being too restrictive might discourage some contributors (as opposed to allowing more POIs and fixing the data afterwards).

Do you think we should add building to the whitelist? Or just keep it to shop & amenity for now?

monsieurtanuki commented 2 months ago

The question is: how to deal with crap OSM data, before we put it in Prices and after.

As a user, I find it very painful to find "carrefour" shops in Paris because both "carrefour" and "Paris" have different meanings:

Therefore I would really appreciate being able to enter just "carrefour paris" without ambiguity when talking about shops.

That said, "your" LIDL cannot be found as a shop. We could introduce an optional "advanced" search mode, without shop/amenity filter, for obvious cases where OSM data is slightly flawed?

In parallel, we may enhance the whitelist.

raphodn commented 2 months ago

After a few hours' thought (and some previous discussions on the subject), I would go with:

monsieurtanuki commented 2 months ago

I may have an even better solution:

2 API calls being transparent for the user that sees a single sorted list.
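One possible reading of "two calls, one list" is sketched below in Python. The details of the proposal are not preserved in this thread, so the ordering (filtered results first, then the remaining unfiltered ones) and the de-duplication on `osm_type` plus `osm_id` are assumptions, and `search` is a hypothetical callable standing in for the actual Photon request.

```python
def feature_id(feature):
    """Identify an OSM object by its type letter and numeric id."""
    props = feature["properties"]
    return (props.get("osm_type"), props.get("osm_id"))


def merged_search(query, search, allowed_tags=("shop", "amenity")):
    """Run a filtered and an unfiltered search and merge them into one list.

    `search(query, osm_tags)` is a hypothetical callable that returns a list
    of Photon-like features; an empty tag list means "no filter".
    """
    filtered = search(query, list(allowed_tags))
    unfiltered = search(query, [])

    seen = set()
    merged = []
    # Filtered (shop/amenity) hits come first; unfiltered hits fill the rest.
    for feature in filtered + unfiltered:
        fid = feature_id(feature)
        if fid not in seen:
            seen.add(fid)
            merged.append(feature)
    return merged
```

With this shape, a location that is mapped only as a building would still appear, just below the shop and amenity matches.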

raphodn commented 2 months ago

So you would combine the 2 results? It might "bloat" the results.

But the idea of opening up the search is good; it could be a user action.

monsieurtanuki commented 2 months ago

@raphodn This is what I have in mind:

With this we provide the user with a better UX (e.g. "carrefour paris" really delivers shops), while letting the OSM data be slightly crappy.

What do you think of that?

Screenshots: relevant search first (Screenshot_1726077190); unfiltered results if needed (Screenshot_1726077197)