Closed MansMeg closed 10 months ago
This is a problem upstream https://www.wikidata.org/wiki/Q5885293
Yes. The question is how to solve this. I guess we will want to remove some things from our corpus that people might want to keep in wikidata, so there will not be a perfect alignment with wikidata. Maybe add a CSV of the entries we exclude, and have the script that updates from wikidata apply it? Or do you have another solution?
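A minimal sketch of the exclusion-list idea, assuming the fetched data can be treated as rows with `wiki_id` and `location` fields. The column names and the helpers `load_exclusions` and `apply_exclusions` are hypothetical, not taken from the existing update script.

```python
import csv
import io


def load_exclusions(handle):
    """Read the (wiki_id, location) pairs we want to keep out of the corpus."""
    return {(row["wiki_id"], row["location"]) for row in csv.DictReader(handle)}


def apply_exclusions(rows, exclusions):
    """Drop every fetched row that is listed in the exclusion set."""
    return [row for row in rows if (row["wiki_id"], row["location"]) not in exclusions]


# Toy data standing in for an exclusion CSV and a fresh wikidata fetch.
exclusions_csv = io.StringIO('wiki_id,location\nQ5885293,"Kråkered,"\n')
fetched = [
    {"wiki_id": "Q5885293", "location": "Kråkered"},
    {"wiki_id": "Q5885293", "location": "Kråkered,"},
]

exclusions = load_exclusions(exclusions_csv)
kept = apply_exclusions(fetched, exclusions)
```

With something like this, the update script could stay a plain mirror of wikidata and the corpus-specific deviations would live in one reviewable file.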
I mean those misspellings could be just fixed on wikidata?
EDIT: AFAIK those additional commas don't introduce any errors to our corpus
No. I know. My point is that sooner or later we might end up with differences. But maybe not in the next couple of months. Then fixing this in wikidata is probably easiest.
They need to be edited on wikidata:
Ping @salgo60 . Is this something you could take a pass on?
I think that one is actually a problem with us grabbing the data. Here we use the alias that is incorrect. @BobBorges , right?
All checked, not all changed, as I didn't see a problem...
My changelog Special:Contributions/Salgo60
over and out now I will go and sleep in my hammock for some days ;-)
Off topic: I mentioned your project today as a pattern for how other organizations should work with their metadata
Should be fixed now. If we find this as an issue again, we could write a unit test. Caused by trailing commas (removed on wikidata) and alias/i-ort in the format surname -iort, firstname. Fixed on wikidata.
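If it comes back, the unit test could look roughly like the sketch below. It assumes the metadata is a CSV with a `location` column; the column name, the helper `rows_with_trailing_comma`, and the in-memory sample are all hypothetical, and the real test would open the metadata file instead.

```python
import csv
import io
import unittest


def rows_with_trailing_comma(handle, column="location"):
    """Return the rows whose value in `column` ends with a comma or stray whitespace."""
    return [
        row
        for row in csv.DictReader(handle)
        if row[column] != row[column].rstrip(", \t")
    ]


class TrailingCommaTest(unittest.TestCase):
    # In-memory stand-in for the metadata CSV (assumed layout).
    SAMPLE = "wiki_id,location\nQ5885293,Kråkered\n"

    def test_no_trailing_commas(self):
        offending = rows_with_trailing_comma(io.StringIO(self.SAMPLE))
        self.assertEqual(offending, [])
```

The check is deliberately limited to commas and whitespace, since other trailing characters may be legitimate in some names.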
See for example: Q5885293,"Kråkered," (now fixed)
We should not include location specifiers with punctuation if the ordinary name exists (like Kråkered in this case).
see: https://github.com/welfare-state-analytics/riksdagen-corpus/blob/main/corpus/metadata/location_specifier.csv
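One way that rule could be sketched, assuming rows with `wiki_id` and `location` fields (hypothetical names, not checked against the actual columns of location_specifier.csv) and treating "with punctuation" as leading or trailing punctuation only:

```python
import string


def drop_punctuated_duplicates(rows):
    """Drop a punctuated specifier when the same person also has the plain name."""
    present = {(row["wiki_id"], row["location"]) for row in rows}
    kept = []
    for row in rows:
        plain = row["location"].strip(string.punctuation + " ")
        # Only drop the row if it differs from its plain form AND that plain
        # form is already present for the same wiki_id.
        if plain != row["location"] and (row["wiki_id"], plain) in present:
            continue
        kept.append(row)
    return kept


rows = [
    {"wiki_id": "Q5885293", "location": "Kråkered,"},
    {"wiki_id": "Q5885293", "location": "Kråkered"},
    {"wiki_id": "Q1", "location": "Berga,"},  # made-up row with no plain counterpart
]
cleaned = drop_punctuated_duplicates(rows)
```

A punctuated specifier without a plain counterpart is kept here, so nothing is silently lost; those cases would still need fixing on wikidata.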