MansMeg closed this issue 11 months ago.
Re 2, they're not new people, but new attributes or whatever that cause more rows to be created in the input/matching/*.csv files.
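To spot those extra rows before they cause trouble, a comparison along these lines could flag rows present in a regenerated matching CSV but not the old one. This is a hedged sketch: the file paths and key columns are hypothetical, not the project's actual schema.

```python
import csv

def row_keys(path, key_fields):
    """Set of key tuples for the rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return {tuple(row[k] for k in key_fields) for row in csv.DictReader(f)}

def new_rows(old_path, new_path, key_fields):
    """Rows whose key tuple appears in the new file but not the old one.

    Because the key includes the attribute columns, a new attribute for an
    existing person is reported as a new row, matching the situation above.
    """
    return row_keys(new_path, key_fields) - row_keys(old_path, key_fields)
```

Running this over `input/matching/*.csv` before and after a requery would make the row-count changes explicit instead of surfacing as test failures.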
I will try to outline a procedure for updates from wikidata before the next time we do it, hopefully to avoid some of the trouble we ran into this time.
Let me know if WD has errors
I saw some earlier edits done by the project in Wikidata without sources....
In WD we can have sources on individual statements, e.g. the change to Q5792849 (1940037453) on the Name property.
My personal opinion is that the alias field should be used when doing Named Entity Recognition and can contain "all kinds" of information, whereas for the Name property (P2561) we should have sources confirming the values.
In this particular case, it was two individuals in question. My previous edits (with source) were further edited. I put the changes back yesterday.
The edits in question have to do with apparent spelling variants of iort. I don't know myself which variant is correct; my edits follow the spelling in the bio books. If there are sources for the other spelling, then I guess both variants should be on Wikidata.
Yes. But the spelling in the biobooks should be the one that is used when the reference is the biobooks. Right, @salgo60 ?
That's what i was trying to say - my edits have bio book sources and spelling. If alt spellings will also be entered, they should get their own source.
> Yes. But the spelling in the biobooks should be the one that is used when the reference is the biobooks. Right, @salgo60 ?
Yes as mentioned before
the book "Tvåkammar-riksdagen 1867–1970" quite often has more than one article about the same person (> 150 persons affected), see a small check
What would be interesting is if we could confirm what is stated in the books against where it is mentioned in your corpus, and get better understanding/quality by adding Property:P4584 "first appearance" based on your corpus.
I would also like to see deprecated statements reflected in your data:
SPARQL for deprecated statements (Swedish / English)
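The original SPARQL links are lost; as a hedged reconstruction, a query for deprecated statements in the Wikibase RDF model might look like the following (the choice of P39 "position held" is an illustrative assumption, not necessarily the property the queries used).

```python
# Sketch of a SPARQL query string for deprecated statements.
# p:/ps: and wikibase:rank are standard Wikibase RDF prefixes;
# P39 ("position held") is a placeholder property for illustration.
DEPRECATED_QUERY = """
SELECT ?item ?statement ?value WHERE {
  ?item p:P39 ?statement .                          # full statement node
  ?statement wikibase:rank wikibase:DeprecatedRank . # only deprecated ones
  ?statement ps:P39 ?value .                         # the deprecated value
}
"""
```

Run against the Wikidata Query Service, this would list items whose P39 statements carry a deprecated rank.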
I like the idea of persistent identifiers. Until then, I think we can solve (close) this issue with a metadata update procedure.
1. Start a fresh branch off `dev`.
2. Requery metadata with `scripts/wikidata_query.py` and `scripts/wikidata_process.py`.
3. Run `test.db.py` locally:
   - update wiki_ids in unit test files (I will write a script to do this efficiently)
   - address edits on Wikidata
4. Repeat the requery and test steps until `test.db.py` passes.
5. Run `redetect.py` to remap speakers to intros in protocols.
6. Run `test.mp.py` locally (other tests?).
7. Save the diff to an (untracked) file.
8. Run `sample-git-dif` on protocols.
9. `git add` ONLY the sampled protocols, commit/push, open a PR, post the markdown.
   - unit tests will still fail on remote: that is OK
10. When the sampled diffs look OK, add/commit/push the rest of the protocols.
11. Unit tests should pass on remote, then merge.
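I don't know what the promised wiki_id script will look like; as a minimal sketch, assuming the unit-test files reference Wikidata items as bare QIDs in their text, the remapping step could be:

```python
import re

def update_wiki_ids(text, id_map):
    """Replace old QIDs with new ones in the text of a test file.

    id_map is {old_qid: new_qid}. Longer QIDs are tried first, and the
    \\b word boundaries keep Q12 from matching inside Q123.
    """
    alternatives = sorted(id_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    return pattern.sub(lambda m: id_map[m.group(1)], text)
```

Applied file by file (read, remap, write back), this would make step 3's wiki_id updates a one-liner per test file rather than a manual edit.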
The issues last time around would have been spotted and fixed very quickly if I were following this as a guide.
That sounds like a good solution. Maybe put this in the repo wiki for now?
FYI: we have a suspected duplicate in Wikidata; I have asked other people for a second opinion but have had no feedback yet.
I used Property:P460 "said to be the same as"
> Maybe put this in the repo wiki for now?
done.
In the discussion of Pull Request #344 we identified four different problems. A potential solution is to structure the metadata updates more than we currently do, so that problems are caught earlier and more efficiently.