maxlath / wikibase-cli

read and edit a Wikibase instance from the command line
MIT License

Is it supposed to be useful in scripts too? #137

Closed matkoniecz closed 3 years ago

matkoniecz commented 3 years ago

I plan to write a small-scale crawling script (between 5 000 and 25 000 pages). Is this a usable tool for that, or would you recommend an alternative?

maxlath commented 3 years ago

It is being used in scripts (e.g. here and there), but I guess it ultimately depends on what you want to do in your script. Feel welcome to describe in more detail how you're thinking of using it, or link to a first implementation of your script; maybe I can make recommendations from there.

matkoniecz commented 3 years ago

I planned to write a script that will primarily read data, but from what I see, in all of those examples reading data is either not done at all or done using SPARQL.

https://github.com/maxlath/wikidata-scripting/tree/master/youtube_links - reads no data; it will probably either fail if a claim (possibly a conflicting one) already exists, or that claim will be overridden.

https://github.com/maxlath/wikidata-scripting/tree/master/import_arbitrary_data_to_wikibase - it seems that deduplication is supposed to happen before this script is run

https://github.com/maxlath/wikidata-scripting/tree/master/import_writers_pseudonymes_from_dbpedia - reading happens via SPARQL

maxlath commented 3 years ago

yes, we currently lack some "smart patching" features; all the data control is indeed expected to be done ahead of time

matkoniecz commented 3 years ago

This is not a problem for me. My question is rather whether there is a good way to get the value of a specific claim, and so on.

maxlath commented 3 years ago

one way to get data in a script is to use wb data, and possibly manipulate the result with something like jq:

# get all Q1 data
wb data Q1

# get all Q1 data in a simplified format
wb data Q1 --simplify

# get all Q1 P1424 claims data
wb data Q1#P1424

# get all Q1 P1424 claims data in a simplified format
wb data Q1#P1424 --simplify

# get the data for the claim identified by Q1-9741c622-fc42-4646-96ec-c594933d74c0
wb data Q1-9741c622-fc42-4646-96ec-c594933d74c0

# get the data for the claim identified by Q1-9741c622-fc42-4646-96ec-c594933d74c0 in a customized simplified format 
wb data Q1-9741c622-fc42-4646-96ec-c594933d74c0 --simplify --keep ids,references,qualifiers,hashes,nontruthy
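
As a rough illustration of the jq step, here is a minimal sketch that pulls the first value of one property out of simplified entity JSON. It assumes that `wb data <id> --simplify` prints a JSON object with a `claims` key mapping property ids to arrays of values; that shape is an assumption of this sketch, so check it against real output before relying on it. The function name `first_claim_value` is hypothetical.

```shell
# Hypothetical helper: reads simplified entity JSON on stdin and prints the
# first value of the given property, or nothing if the property is absent.
# Assumes the JSON looks like {"claims": {"P31": ["Q36906466"]}} (unverified).
first_claim_value() {
  local property="$1"
  # --arg passes the property id safely; `// empty` suppresses "null" output
  jq -r --arg p "$property" '.claims[$p][0] // empty'
}
```

It could then be used as `wb data Q1 --simplify | first_claim_value P31`.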

matkoniecz commented 3 years ago

And capture stdout for processing? I guess my main worry is that I would still need to manually parse text to handle errors.

maxlath commented 3 years ago

you could either check stdout

# capture the command's stdout; an empty string means nothing was returned
Q1_P31_claims_data=$(wd data Q1#P31)
if [[ "$Q1_P31_claims_data" != "" ]] ; then
  echo "Q1 has a P31 claim"
else
  echo "Q1 doesn't have a P31 claim"
fi

or the exit code, which will be 1 if no value can be found

wd data Q1#P31 > /dev/null && {
  echo "Q1 has a P31 claim"
} || {
  echo "Q1 doesn't have a P31 claim"
}
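
For a crawl over thousands of entities, the exit-code check above could be wrapped in a small loop. This is only a sketch: `has_claim`, `report_claims` and the `FETCH_CMD` variable are hypothetical names introduced here, not part of wikibase-cli. Any non-zero exit status is treated as "no claim", so network or other failures are not distinguished from a missing value.

```shell
# Hypothetical helper: prints "yes" if the entity has a claim for the property.
# FETCH_CMD is an assumption of this sketch; it defaults to `wd data` and can
# be swapped for another command (for example a stub when testing).
has_claim() {
  local entity="$1" property="$2"
  # stdin is redirected so the fetch command cannot swallow the id list
  if ${FETCH_CMD:-wd data} "${entity}#${property}" > /dev/null 2>&1 < /dev/null; then
    echo yes
  else
    echo no
  fi
}

# Reads one entity id per line on stdin and prints "<id> yes" or "<id> no".
report_claims() {
  local property="$1" entity
  while IFS= read -r entity; do
    [ -n "$entity" ] || continue   # skip blank lines
    printf '%s %s\n' "$entity" "$(has_claim "$entity" "$property")"
  done
}
```

With a file of ids, one per line, this could be run as `report_claims P31 < ids.txt`.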

matkoniecz commented 3 years ago

OK, thanks for answering! I think that I will close it.

Thanks again!