hbz / nwbib

Die Nordrhein-Westfälische Bibliographie
http://nwbib.de

Batch load NWBib links to wikidata #469

Closed acka47 closed 5 years ago

acka47 commented 5 years ago

Depends on #446.

Add a link to the respective NWBib titles to every Wikidata entry used in our spatial classification. QuickStatements is the right tool for this job.

acka47 commented 5 years ago

@fsteeg We need a CSV file like the one below to import the statements into Wikidata, see https://www.wikidata.org/wiki/Help:QuickStatements#CSV_file_syntax. Can you please create an up-to-date export of the QIDs (and also the "NID"s like "N03") from https://nwbib.de/spatial?

qid,P6814
Q464473,Q464473
Q1123895,Q1123895
Q17592686,Q17592686
Q897382,Q897382

I will take care of adding the corresponding QIDs to the NIDs.

acka47 commented 5 years ago

I manually created the CSV part for the NIDs:

Q1198,N01
Q152243,N03
Q8614,N04
Q462011,N10
Q72931,N12
Q2036208,N13
Q4194,N14
Q580471,N16
Q881875,N18
Q151993,N20
Q153464,N22
Q445609,N24
Q152356,N28
Q1380992,N32
Q1381014,N33
Q1413205,N34
Q7904317,N42
Q836937,N44
Q641138,N45
Q249428,N46
Q152420,N47
Q708742,N48
Q698162,N57
Q657241,N62
Q649192,N63
Q650645,N64
Q697254,N65
Q514557,N66
Q700198,N68
Q573290,N69
Q835382,N70
Q153943,N76
Q829718,N77

(I already use N04 for Westfalen, as required for https://github.com/hbz/lobid-vocabs/issues/89. We could also adjust it in the SKOS file and the web app right away. @fsteeg, let's talk about this tomorrow.)

fsteeg commented 5 years ago

Generated entries plus the manual entries from https://github.com/hbz/nwbib/issues/469#issuecomment-505402481: qid-p6814.txt

acka47 commented 5 years ago

I am trying this as a test batch:

qid,P6814
Q152243,N03
Q445609,N24
Q641138,N45
Q697254,N65
Q829718,N77
Q1475430,Q1475430
Q1805068,Q1805068
Q1919499,Q1919499
Q2531558,Q2531558

However, this has not been processed yet. I am not sure whether the batch is just queued or whether batches submitted by me cause problems...

acka47 commented 5 years ago

Just found out that "for using QuickStatements an account needs to be autoconfirmed." From https://www.wikidata.org/wiki/Help:QuickStatements#Limitations

acka47 commented 5 years ago

To make this work, we have to encode the second QID as a string, which in the CSV file syntax means adding four double quotes before it and one after it (see the end of the CSV file syntax paragraph). Like so:

qid,P6814
Q868907,""""Q868907"
Q225794,""""Q225794"
Q3771,""""Q3771"
Q225774,""""Q225774"
Q225729,""""Q225729"
Q225120,""""Q225120"
Q225055,""""Q225055"
Q6896,""""Q6896"
Q225621,""""Q225621"
Q225432,""""Q225432"

I let it run for ten example items and it worked well. The complete batch with ~4,300 items is now being processed.
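
A minimal sketch of how such a file could be generated, assuming the (QID, NWBib ID) pairs are already available as a Python list. The function name `quickstatements_rows` is hypothetical and not part of the NWBib code base. Plain string formatting is used so that the quoting shown above is reproduced literally.

```python
def quickstatements_rows(pairs, prop="P6814"):
    """Build QuickStatements CSV text from (qid, nwbib_id) pairs.

    Hypothetical helper for illustration; the quoting mirrors the
    rows shown in this comment (four double quotes before, one after).
    """
    lines = [f"qid,{prop}"]
    for qid, nwbib_id in pairs:
        # Encode the value as a string so it is not read as an item reference
        lines.append(f'{qid},""""{nwbib_id}"')
    return "\n".join(lines) + "\n"


# Sample pairs taken from this thread: one QID-based ID and one "NID"
pairs = [("Q868907", "Q868907"), ("Q152243", "N03")]
csv_text = quickstatements_rows(pairs)
print(csv_text, end="")
```

The same function covers both kinds of NWBib IDs, since the "NID" values are strings as well.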

acka47 commented 5 years ago

I think the batch ran through by now.

Note: We should always run a QuickStatements batch "in the background" so that it gets its own URL and we can look it up from different browsers during processing and afterwards. I just chose "run", so it ran in my office computer's browser, which I cannot access from home. (via @magnusmanske on Twitter).

Result: Of the 4412 entries in the CSV file, 2896 were processed. See the SPARQL query at https://w.wiki/5Nv. We should run another batch for the missing 1516 entries. @fsteeg, can you generate a diff?

fsteeg commented 5 years ago

Original items minus those returned by the SPARQL query: qid-p6814.txt
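
A rough sketch of this kind of diff, assuming the original batch is available as CSV text and the QIDs returned by the SPARQL query have been collected into a set. The name `missing_lines` is made up for illustration; this is not the code that produced the attached file.

```python
def missing_lines(batch_text, processed_qids):
    """Return a retry batch: header plus all rows whose QID was not processed.

    Rows are kept verbatim so the string quoting of the second column
    survives unchanged.
    """
    lines = batch_text.splitlines()
    header, rows = lines[0], lines[1:]
    kept = [
        row for row in rows
        if row and row.split(",", 1)[0] not in processed_qids
    ]
    return "\n".join([header] + kept) + "\n"


# Sample rows from this thread; the processed set is an assumption
batch = (
    "qid,P6814\n"
    'Q868907,""""Q868907"\n'
    'Q225794,""""Q225794"\n'
    'Q3771,""""Q3771"\n'
)
processed = {"Q225794"}
retry = missing_lines(batch, processed)
print(retry, end="")
```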

acka47 commented 5 years ago

The new batch is running, see https://tools.wmflabs.org/quickstatements/#/batch/14856, though no edits have been made yet.

acka47 commented 5 years ago

The batch is done. Via SPARQL, I now get 4393 entries with NWBib ID, see the query (also querying for ags): https://w.wiki/5PC. So 19 are still missing. @fsteeg, can you check which ones these are?

fsteeg commented 5 years ago

I get 30 missing: qid-p6814.txt

acka47 commented 5 years ago

I ran another batch (https://tools.wmflabs.org/quickstatements/#/batch/15129). Some entries were again not processed, so I added them by hand. However, now there are even 4418 entries with NWBib ID, see https://w.wiki/5PC. So there are at least six entries with NWBib ID in Wikidata that are not part of the NWBib classification.

acka47 commented 5 years ago

"So there are at least six entries with NWBib ID in Wikidata that are not part of NWBib classification."

@fsteeg, can you provide the diff again, please, so that we know which?

fsteeg commented 5 years ago

"@fsteeg, can you provide the diff again, please, so that we know which?"

I fixed an issue in the CSV processing while doing that, and now I think everything is fine. I don't get any missing entries in Wikidata, and no missing entries in nwbib.de/spatial (there was one remaining, but that was due to a faulty additional nwbibId in https://www.wikidata.org/wiki/Q1980918, which I fixed there).
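
For reference, a two-way check of this kind could look roughly like the sketch below. It assumes the query result is in the standard SPARQL JSON results format with variables named `item` and `nwbibId`; those variable names, the helper names, and the sample values (including the ID "N99") are assumptions for illustration, not taken from the actual query or data.

```python
import json


def wikidata_pairs(sparql_json, item_var="item", id_var="nwbibId"):
    """Extract (QID, NWBib ID) pairs from a SPARQL JSON result document."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return {
        # The item value is an entity URI; keep only the trailing QID
        (b[item_var]["value"].rsplit("/", 1)[-1], b[id_var]["value"])
        for b in bindings
    }


def compare(expected, actual):
    """Report pairs missing in Wikidata and pairs unknown to the classification."""
    return {
        "missing_in_wikidata": sorted(expected - actual),
        "not_in_nwbib": sorted(actual - expected),
    }


# Invented sample data: one expected pair is absent, one extra ID is present
expected = {("Q152243", "N03"), ("Q3771", "Q3771")}
sample = json.dumps({"results": {"bindings": [
    {"item": {"value": "http://www.wikidata.org/entity/Q3771"},
     "nwbibId": {"value": "Q3771"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q3771"},
     "nwbibId": {"value": "N99"}},
]}})
report = compare(expected, wikidata_pairs(sample))
print(report)
```

Comparing pairs rather than bare QIDs also surfaces an item that carries an additional, faulty ID next to its correct one.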

acka47 commented 5 years ago

Closing.