osm-fr / osmose-backend

Part of Osmose that runs the analysis and sends the results to the frontend.
GNU General Public License v3.0

Add checks on Wikipedia tags #1684

Closed brawer closed 1 year ago

brawer commented 1 year ago

Could Osmose check if wikipedia tags are: (a) valid; (b) not a redirect; (c) consistent with wikidata tags on the same feature? Likewise for related tags such as artist:wikipedia, subject:wikipedia, species:wikipedia, name:etymology:wikipedia:en, etc.

To help with this, I’m computing a mapping table that Osmose might want to use: https://qrank-storage.wmcloud.org/data/sitelinks.mdb.zst

This is a zstd-compressed LMDB database, mapping keys like fr:musée_du_louvre to values like Q19675. The keys have undergone full language-sensitive case-folding as per the Unicode specification. Currently I’m not adding data for deleted/redirected pages on Wikipedia, only current page titles. (For redirects, I’m thinking about generating a similar table whose values would be redirection targets; this could then be used by Osmose to generate edit suggestions for stale wikipedia tags. However, that’s a separate project.)
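As a rough illustration of the key format described above, here is a minimal Python sketch that turns a wikipedia tag value into a lookup key. The function name `sitelink_key` is hypothetical. Replacing spaces with underscores and dropping a `#section` anchor are assumptions inferred from the `fr:musée_du_louvre` example. Python's `str.casefold()` applies full Unicode case folding but is not language-sensitive, so it only approximates the folding used to build the table (Turkish dotted/dotless i, for instance, may differ).

```python
from typing import Optional


def sitelink_key(tag_value: str) -> Optional[str]:
    """Turn a tag value like 'fr:Musée du Louvre' into 'fr:musée_du_louvre'.

    Returns None if the value has no '<lang>:<title>' shape.
    """
    lang, sep, title = tag_value.partition(":")
    lang = lang.strip().lower()
    # Assumption: a '#section' anchor is not part of the page title.
    title = title.split("#", 1)[0].strip()
    if not sep or not lang or not title:
        return None
    # Assumption: keys use underscores instead of spaces.
    # casefold() is language-independent, so this is only an approximation
    # of the language-sensitive folding used for the real table.
    return f"{lang}:{title.replace(' ', '_').casefold()}"
```

For example, `sitelink_key("fr:Musée du Louvre")` gives `fr:musée_du_louvre`, and a value without a language prefix gives `None`.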

To check for updates to the mapping table, consider HTTP ETags. I’m currently setting up a cronjob that will re-generate the data file about once per week.
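The ETag check could look roughly like the following sketch, which uses only the Python standard library. The helper names and the idea of storing the previous ETag somewhere between runs are assumptions; the sketch also assumes the server answers a matching `If-None-Match` header with `304 Not Modified`, which `urllib` reports as an `HTTPError`.

```python
import shutil
import urllib.error
import urllib.request
from typing import Optional, Tuple

SITELINKS_URL = "https://qrank-storage.wmcloud.org/data/sitelinks.mdb.zst"


def build_conditional_request(
    url: str, etag: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request that asks the server to skip unchanged content."""
    headers = {"If-None-Match": etag} if etag else {}
    return urllib.request.Request(url, headers=headers)


def download_if_changed(
    url: str, dest_path: str, etag: Optional[str] = None
) -> Tuple[bool, Optional[str]]:
    """Download url to dest_path unless the stored ETag still matches.

    Returns (changed, etag): on 304 the old ETag is returned unchanged.
    """
    request = build_conditional_request(url, etag)
    try:
        with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
            return True, response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the existing local file
            return False, etag
        raise
```

A weekly job could call `download_if_changed(SITELINKS_URL, "sitelinks.mdb.zst", saved_etag)` and only decompress and swap in the new database when the first element of the result is `True`.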

To use LMDB from Python, see module lmdb. Please tell me if the data format works for Osmose. Because of the large data volume, you’ll need a 64-bit machine and a few gigabytes of disk, but (thanks to how LMDB is mapping its database files into virtual memory) only very little physical RAM.
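A minimal sketch of how such a lookup and consistency check might work with the `lmdb` module follows. It is not Osmose code: the function names and result labels are hypothetical, and it assumes the file has already been decompressed (for example with `zstd -d`), that it is a single-file database (hence `subdir=False`), and that keys and values are UTF-8 encoded. The check itself takes any `bytes -> bytes` lookup callable, so it can be tried with a plain dict before wiring in LMDB.

```python
from typing import Callable, Dict, Optional

Lookup = Callable[[bytes], Optional[bytes]]


def open_sitelinks(path: str) -> Lookup:
    """Open the decompressed LMDB file read-only and return a lookup function."""
    import lmdb  # third-party module: pip install lmdb

    # Assumptions: single-file database, no writers, so no lock file needed.
    env = lmdb.open(path, readonly=True, lock=False, subdir=False)

    def lookup(key: bytes) -> Optional[bytes]:
        with env.begin() as txn:
            return txn.get(key)

    return lookup


def check_feature(tags: Dict[str, str], lookup: Lookup) -> str:
    """Classify the wikipedia/wikidata tags of a single OSM feature."""
    wikipedia = tags.get("wikipedia")
    if not wikipedia or ":" not in wikipedia:
        return "no-wikipedia-tag"
    lang, title = wikipedia.split(":", 1)
    # Approximate key normalisation; casefold() is not language-sensitive.
    key = f"{lang.lower()}:{title.replace(' ', '_').casefold()}"
    value = lookup(key.encode("utf-8"))
    if value is None:
        # Not a current page title: deleted, redirected, or misspelled.
        return "unknown-title"
    wikidata = tags.get("wikidata")
    if wikidata and wikidata != value.decode("utf-8"):
        return "mismatch"
    return "ok"
```

With the real data this would be used as `check_feature(tags, open_sitelinks("sitelinks.mdb"))`, where the file name is an assumption. Because the table holds only current page titles, `unknown-title` cannot by itself distinguish a redirect from a typo.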

frodrigo commented 1 year ago

You may be interested in this: https://www.openstreetmap.org/user/Geonick/diary/399523

Unfortunately, this analyser stopped sending reports in September.

ivanbranco commented 1 year ago

This is more recent and (manually) updated: https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/ (could it be implemented directly in Osmose?)

frodrigo commented 1 year ago

Could it be implemented directly in Osmose?

Yes. The XML has to be uploaded to the Osmose Frontend.

https://wiki.openstreetmap.org/wiki/Osmose#Issues_reporting_API

ivanbranco commented 1 year ago

Could it be implemented directly in Osmose?

Yes. The XML has to be uploaded to the Osmose Frontend.

https://wiki.openstreetmap.org/wiki/Osmose#Issues_reporting_API

CC @matkoniecz in case he's interested.

matkoniecz commented 1 year ago

Could Osmose check if wikipedia tags are: (...) consistent with wikidata tags on the same feature?

Note that in many cases this is tricky: for example, people put things like "List of artworks in museum X#Painting Foobar" in wikipedia and the Wikidata item for the painting itself in wikidata.

This is not clearly wrong, and I am not sure whether mass-removing such tags would be considered acceptable by the general community. It is definitely not blatantly wrong.

See https://www.openstreetmap.org/node/2444385778 and many other reports at https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/Deutschland.html#wikipedia%20wikidata%20mismatch (and I am already skipping many such mismatches in this list!)
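One way a checker could avoid flagging the section-link case described above is sketched below. This is a hypothetical heuristic for illustration only, not the logic used by the validator linked above: it simply treats any wikipedia value whose title contains a `#section` anchor as needing human review rather than as a plain mismatch.

```python
def has_section_anchor(tag_value: str) -> bool:
    """Return True if a wikipedia tag value points at a section of a page.

    Example: 'en:List of artworks in museum X#Painting Foobar'.
    """
    _, _, title = tag_value.partition(":")
    return "#" in title


def mismatch_severity(wikipedia_value: str) -> str:
    """Downgrade wikipedia/wikidata mismatches that involve a section link."""
    # Hypothetical labels; a real checker would define its own issue classes.
    return "needs-review" if has_section_anchor(wikipedia_value) else "mismatch"
```

The design choice here is to report less rather than more: a section link paired with a more specific Wikidata item is kept out of the list of clear errors.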


See also #1157


For implementing this myself:

Right now I am still solving some known issues reported by users, which have higher priority, but in general I would be open to this. I already tried an MR export, but that ended up going nowhere due to an MR bug that I reported.

If someone is interested in working on it and they would want to reuse my code - let me know and I can do some further work on making its setup less ridiculous and actually doable by other people.

(though I admit that Osmose is a bit low priority for me, mostly due to Osmose claiming that some things require changes when they do not require them at all, see #1159 #462 #1094 - if I had time for Osmose improvements I would start with PRs for that)

turbotimon commented 1 year ago

Unfortunately, this analyser stopped sending reports in September.

@frodrigo there was a series of failures, mainly due to incomplete or broken wikidata dumps or osmose-country-relations. yesterday there was finally a successful run, but it hit a 500 error when uploading to osmose (investigation pending, i expect to find time next week).

consistent wikidata/wikipedia ... tricky in many cases

@matkoniecz fully agree with that, we experienced the same. but it's a very interesting tool! some of the checks we also do (e.g. disambig pages), others we could add. i hope to find time for this soon, and we can possibly use synergies (i'm a maintainer of the tool).

frodrigo commented 1 year ago

Unfortunately, this analyser stopped sending reports in September.

@frodrigo there was a series of failures, mainly due to incomplete or broken wikidata dumps or osmose-country-relations. yesterday there was finally a successful run, but it hit a 500 error when uploading to osmose (investigation pending, i expect to find time next week).

The upload process was broken from last night until just now. You can try uploading again now.

frodrigo commented 1 year ago

The check is back, closing here.

turbotimon commented 1 year ago

Just a quick update here:

Unfortunately, this analyser stopped sending reports in September.

The Wikidata analyser is now running every Sunday evening (wikidata dumps come out only once a week).