Closed: brawer closed this issue 1 year ago
You may be interested in this: https://www.openstreetmap.org/user/Geonick/diary/399523
Unfortunately this analyser stopped sending reports in September.
This is more recent and (manually) updated: https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/ (could it be implemented directly in Osmose?)
Could it be implemented directly in Osmose?
Yes. An XML file has to be uploaded to the Osmose Frontend.
https://wiki.openstreetmap.org/wiki/Osmose#Issues_reporting_API
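Such an upload is essentially a multipart HTTP POST of the XML file. A minimal sketch follows; note that the endpoint URL and field names here are hypothetical placeholders, and the real issues-reporting API is the one documented on the wiki page linked above:

```python
import requests  # any HTTP client would do

# NOTE: endpoint path and field names below are made-up placeholders --
# consult the Osmose wiki page linked above for the real API.
OSMOSE_UPLOAD_URL = "https://osmose.example/api/issues/upload"

def build_upload_request(xml_path: str, source: str, password: str) -> dict:
    """Prepare the pieces of a multipart POST uploading an issues XML file.

    The returned dict can be passed as requests.post(**req) by the caller.
    """
    return {
        "url": OSMOSE_UPLOAD_URL,
        "data": {"source": source, "code": password},
        "files": {"content": open(xml_path, "rb")},
    }
```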
CC @matkoniecz in case he's interested.
Could Osmose check if `wikipedia` tags are: (...) consistent with `wikidata` tags on the same feature?
Note that in many cases this is tricky: for example, people link things like "List of artworks in museum X#Painting Foobar" while the wikidata tag points to the painting itself.
This is not clearly wrong, and I am not sure whether mass-removing it would be considered acceptable by the general community. It is definitely not blatantly wrong.
See https://www.openstreetmap.org/node/2444385778 and many other reports at https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/Deutschland.html#wikipedia%20wikidata%20mismatch (and I am already skipping many such mismatches in this list!)
See also #1157
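The basic consistency check can be sketched as below; the tiny sitelink-to-QID mapping used here is a made-up stand-in for a table derived from real Wikidata dumps:

```python
def check_wikipedia_wikidata(tags: dict, sitelink_to_qid: dict):
    """Return a problem description if wikipedia= and wikidata= disagree, else None.

    sitelink_to_qid maps case-folded "lang:Page_title" keys to QID strings.
    """
    wp = tags.get("wikipedia")
    wd = tags.get("wikidata")
    if not wp or not wd:
        return None  # nothing to cross-check
    lang, _, title = wp.partition(":")
    # Normalise the same way the lookup table is keyed:
    # underscores for spaces, Unicode case folding.
    key = f"{lang}:{title.replace(' ', '_')}".casefold()
    expected = sitelink_to_qid.get(key)
    if expected is None:
        return f"wikipedia={wp} does not match any known article"
    if expected != wd:
        return f"wikipedia={wp} belongs to {expected}, but wikidata={wd}"
    return None
```

A real implementation would also need to skip or whitelist the tricky cases described above (section links into list articles) rather than flag them all.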
For implementing this myself:
In general, right now I am still working through some known issues reported by users, which have higher priority, but I would be open to this. I already tried MR export, but that ended up going nowhere due to an MR bug that I reported.
If someone is interested in working on it and would want to reuse my code, let me know and I can do some further work on making its setup less ridiculous and actually doable by other people.
(Though I admit that Osmose is a bit low priority for me, mostly because Osmose claims that some things require changes despite not requiring them at all, see #1159 #462 #1094; if I had time for Osmose improvements, I would start with PRs for that.)
Unfortunately this analyser stopped sending reports in September.
@frodrigo There was a series of failures, mainly due to incomplete or broken Wikidata dumps or osmose-country-relations. Yesterday there was finally a successful run, but uploading to Osmose returned a 500 error (investigation pending; I expect to find time next week).
consistent wikidata/wikipedia ... tricky in many cases
@matkoniecz Fully agree with that; we experienced the same, but it is a very interesting tool! Some of the checks we also do (e.g. disambiguation pages); others we could add. I hope to find time for this soon, and we can possibly use synergies (I'm a maintainer of the tool).
Unfortunately this analyser stopped sending reports in September.
@frodrigo There was a series of failures, mainly due to incomplete or broken Wikidata dumps or osmose-country-relations. Yesterday there was finally a successful run, but uploading to Osmose returned a 500 error (investigation pending; I expect to find time next week).
The upload process was broken last night until just now. You can try to upload again now.
The check is back, closing here.
Just a quick update here:
Unfortunately this analyser stopped sending reports in September.
The Wikidata analyser is now running every Sunday evening (Wikidata dumps come out only once a week).
Could Osmose check if `wikipedia` tags are: (a) valid; (b) not a redirect; (c) consistent with `wikidata` tags on the same feature? Likewise for related tags such as `artist:wikipedia`, `subject:wikipedia`, `species:wikipedia`, `name:etymology:wikipedia:en`, etc.

To help with this, I'm computing a mapping table that Osmose might want to use: https://qrank-storage.wmcloud.org/data/sitelinks.mdb.zst
This is a zstd-compressed LMDB database, mapping keys like `fr:musée_du_louvre` to values like `Q19675`. The keys have undergone full language-sensitive case-folding as per the Unicode specification. Currently I'm not adding data for deleted/redirected pages on Wikipedia, only current page titles. (For redirects, I'm thinking about generating a similar table whose values would be redirection targets; this could then be used by Osmose to generate edit suggestions for stale `wikipedia` tags. However, that's a separate project.)

To check for updates to the mapping table, consider HTTP ETags. I'm currently setting up a cronjob that will re-generate the data file about once per week.
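The ETag-based update check is a standard HTTP conditional GET, nothing Osmose-specific; a sketch using `requests` (any HTTP client works):

```python
import requests

SITELINKS_URL = "https://qrank-storage.wmcloud.org/data/sitelinks.mdb.zst"

def fetch_if_changed(url, cached_etag=None):
    """Download the table only if its ETag differs from the cached one.

    Returns (content, etag); content is None when the server answers
    304 Not Modified, i.e. the cached copy is still current.
    """
    headers = {"If-None-Match": cached_etag} if cached_etag else {}
    resp = requests.get(url, headers=headers, timeout=60)
    if resp.status_code == 304:
        return None, cached_etag
    resp.raise_for_status()
    return resp.content, resp.headers.get("ETag")
```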
To use LMDB from Python, see the `lmdb` module. Please tell me if the data format works for Osmose. Because of the large data volume, you'll need a 64-bit machine and a few gigabytes of disk, but (thanks to how LMDB maps its database files into virtual memory) only very little physical RAM.