tuukka opened this issue 3 years ago
> Is there a good way to fetch one or multiple properties for a (potentially long) list of QIDs
For one property, wd convert seems to do the job, but it currently doesn't work for multiple properties. You could write a SPARQL request extending what wd convert does, but you would need to handle splitting the list into batches (wd convert uses batches of 1000 at once).
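For reference, a minimal Python sketch of that SPARQL route: split the QIDs into batches of 1000 (mirroring the batch size wd convert uses) and build one query per batch, with one OPTIONAL block per property. The function names are made up for illustration, and the queries would still have to be sent to the Wikidata Query Service, which predefines the wd: and wdt: prefixes.

```python
from typing import Iterator, List


def batches(items: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(qids: List[str], props: List[str]) -> str:
    """Build a SPARQL query fetching several properties for one batch of QIDs.

    wdt: only returns best-rank ("truthy") values. Each property gets its own
    OPTIONAL so items lacking it are still returned; with several multi-valued
    properties, SPARQL returns one row per combination of values.
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    columns = " ".join(f"?{prop}" for prop in props)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{prop} ?{prop} . }}" for prop in props
    )
    return (
        f"SELECT ?item {columns} WHERE {{\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"{optionals}\n"
        f"}}"
    )


if __name__ == "__main__":
    qids = ["Q3572332", "Q98407233", "Q10428420"]
    for batch in batches(qids):
        print(build_query(batch, ["P6375", "P4595"]))
```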
> in CSV format
It can get tricky to go from JSON with deeply nested objects to CSV, but it could work for some basic cases.
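A sketch of such a basic case, assuming a hypothetical simplified entity shape where each property holds one plain value (real claims are lists): nested keys are flattened into dotted column names like claims.P6375 and written with Python's csv module. Lists are left untouched (they would be written as their Python repr), which is exactly where it gets tricky.

```python
import csv
import io
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"claims": {"P6375": v}}
    becomes {"claims.P6375": v}. Lists are kept as they are."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


def to_csv(entities: List[Dict[str, Any]]) -> str:
    rows = [flatten(entity) for entity in entities]
    # Union of all columns, in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row become empty cells
    return buf.getvalue()


# Hypothetical simplified entity with a single plain value per property.
print(to_csv([{"id": "Q98407233", "claims": {"P6375": "Agricolankatu 1-3"}}]), end="")
```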
> but does it fetch all item data
No, but almost: when you specify --props claims.P6375, the smallest amount of data we can request from the API is the basic info plus all the claims (by setting props=claims).
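To illustrate the props=claims point, a sketch that only builds request URLs for the MediaWiki wbgetentities module; it is not how wd data batches its requests internally, and the helper name is hypothetical. wbgetentities accepts at most 50 ids per request for regular clients, hence the batch size.

```python
from typing import Iterator, List
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def wbgetentities_urls(qids: List[str], batch_size: int = 50) -> Iterator[str]:
    """Yield wbgetentities URLs that request only claims (props=claims).

    Each response still contains every claim of each entity plus basic info,
    so keeping only one property such as P6375 has to happen client-side.
    """
    for start in range(0, len(qids), batch_size):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(qids[start:start + batch_size]),
            "props": "claims",
            "format": "json",
        }
        yield f"{API}?{urlencode(params)}"


for url in wbgetentities_urls(["Q3572332", "Q98407233", "Q10428420"]):
    print(url)
```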
> would it be more difficult to implement --format csv?
I gave it a try in this branch. The proposed syntax would be:
echo Q3572332 Q98407233 Q10428420 | wd data --props claims.P6375 --format csv
and the output would be:
id,claims.P6375
Q3572332,"Eläintarhantie 1,Siltasaarenkatu 18"
Q98407233,Agricolankatu 1-3
Q10428420,"Viides linja 11,Fleminginkatu 1,Porthaninkatu 12"
Note that the P6375 values are grouped per entity. We could generate several rows per entity as in your version, but I'm not sure how to make that work when there are several properties (generating all combinations seems unnecessarily verbose). Would that work for your use case?
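The double quotes in that output are ordinary CSV escaping: a cell containing the delimiter gets wrapped in quotes. A Python sketch reproducing the grouped output with the standard csv module, using the values from the example above (the joined_csv helper is hypothetical, not wikibase-cli's own code, which is Node.js):

```python
import csv
import io
from typing import Dict, List

# Values copied from the example output in this thread.
ADDRESSES: Dict[str, Dict[str, List[str]]] = {
    "Q3572332": {"P6375": ["Eläintarhantie 1", "Siltasaarenkatu 18"]},
    "Q98407233": {"P6375": ["Agricolankatu 1-3"]},
    "Q10428420": {"P6375": ["Viides linja 11", "Fleminginkatu 1", "Porthaninkatu 12"]},
}


def joined_csv(entities: Dict[str, Dict[str, List[str]]], props: List[str], sep: str = ",") -> str:
    """One row per entity; all values of a property share one cell."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id"] + [f"claims.{p}" for p in props])
    for qid, claims in entities.items():
        # With the default QUOTE_MINIMAL, csv.writer only quotes a cell that
        # contains the delimiter, the quote character or a line terminator.
        writer.writerow([qid] + [sep.join(claims.get(p, [])) for p in props])
    return buf.getvalue()


print(joined_csv(ADDRESSES, ["P6375"]), end="")
```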
Thank you for the quick implementation!
I was thinking this would be useful in lots of use cases, but my current one is finding matches between certain Wikidata items and another big dataset (OpenStreetMap) based on street addresses. In this case, I need a separate row for each address to see if any of them match, and if I matched on multiple properties, it would be preferable to get all the combinations, to see if any of them matches a combination present in the other dataset. Could it make sense to do that by default, and have an option like "--join-values ," to get your current output?
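"All the combinations" is the Cartesian product of each property's values, which itertools.product gives directly. A sketch using the values shown in this thread (P6375 from the outputs, P4595 from the two-property example further down); exploded_csv is a hypothetical helper:

```python
import csv
import io
from itertools import product
from typing import Dict, List

# Example values from this thread.
ENTITIES: Dict[str, Dict[str, List[str]]] = {
    "Q3572332": {"P6375": ["Eläintarhantie 1", "Siltasaarenkatu 18"], "P4595": ["Helsinki"]},
    "Q98407233": {"P6375": ["Agricolankatu 1-3"], "P4595": ["Helsinki"]},
    "Q10428420": {"P6375": ["Viides linja 11", "Fleminginkatu 1", "Porthaninkatu 12"], "P4595": ["Helsinki"]},
}


def exploded_csv(entities: Dict[str, Dict[str, List[str]]], props: List[str]) -> str:
    """One row per combination of values across the requested properties."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id"] + [f"claims.{p}" for p in props])
    for qid, claims in entities.items():
        # A property without values contributes one empty cell, so the entity
        # still gets a row instead of vanishing from the product.
        value_lists = [claims.get(p) or [""] for p in props]
        for combo in product(*value_lists):
            writer.writerow([qid, *combo])
    return buf.getvalue()


print(exploded_csv(ENTITIES, ["P6375", "P4595"]), end="")
```

With only ["P6375"] this yields one row per address, the shape needed for matching against another dataset.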
Multiple values are also the difficult part in the sense that, before today, I had no idea how to do this in jq. I can manage now, but I would not want to ask anyone to learn this. 😅 (This is what made it click in the manual: "Thus as functions as something of a foreach loop.")
I'm very grateful that you posted those jq commands: I use jq a lot but had never come across "as" used like that before. Quite powerful ^^
I should add that I'm not ruling out, as a solution for now, including examples like these in wikibase-cli's documentation so that people can use them as templates for what they need.
I pushed more commits to that branch: now
echo Q3572332 Q98407233 Q10428420 | wd data --props claims.P6375 --format csv
outputs
id,claims.P6375
Q3572332,Eläintarhantie 1
Q3572332,Siltasaarenkatu 18
Q98407233,Agricolankatu 1-3
Q10428420,Viides linja 11
Q10428420,Fleminginkatu 1
Q10428420,Porthaninkatu 12
but the previous behaviour can, as suggested, be recovered with --join. Ready to merge, or do you see any missing features?
I tested the current version briefly, and I would have wanted to pass a custom separator as an argument to --join instead of the comma, as addresses, for example, often contain commas.
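Why a comma is a risky join separator for addresses: once the separator also occurs inside a value, the joined cell can't be split back into the original values. A sketch of a join that takes a custom separator and rejects clashes; the helper and the sample addresses are hypothetical, and this says nothing about the eventual --join syntax.

```python
from typing import List


def join_values(values: List[str], sep: str) -> str:
    """Join a property's values into one cell, refusing ambiguous separators.

    If the separator also occurs inside a value, splitting the cell on it
    would no longer give back the original values.
    """
    clashes = [v for v in values if sep in v]
    if clashes:
        raise ValueError(f"separator {sep!r} occurs inside {clashes!r}")
    return sep.join(values)


# Hypothetical address values, one of which contains a comma.
values = ["Example Street 1, entrance A", "Example Street 3"]
print(join_values(values, " | "))  # Example Street 1, entrance A | Example Street 3
```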
Also, I expected adding another claim to --props to just add a column to the non-joined results, but of course it switched to the joined mode. I understand this avoids combinatorial explosions, but is that more important than consistency? For example,
echo Q3572332 Q98407233 Q10428420 | PATH=bin:$PATH wd data --props claims.P6375,claims.P4595 --format csv
outputs:
id,claims.P6375,claims.P4595
Q3572332,"Eläintarhantie 1,Siltasaarenkatu 18",Helsinki
Q98407233,Agricolankatu 1-3,Helsinki
Q10428420,"Viides linja 11,Fleminginkatu 1,Porthaninkatu 12",Helsinki
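In the one-row-per-combination mode, the number of rows per entity is the product of its value counts, so a single-valued property like P4595 in this example adds a column but no rows; only several multi-valued properties multiply them. A quick Python check with the counts from the output above (the helper name is hypothetical):

```python
from math import prod
from typing import Dict, List


def rows_per_entity(value_counts: List[int]) -> int:
    """Rows produced for one entity when every combination gets its own row.

    A property with no value still yields one row (an empty cell).
    """
    return prod(max(n, 1) for n in value_counts)


# Value counts per entity from the example above: (P6375, P4595).
counts: Dict[str, List[int]] = {"Q3572332": [2, 1], "Q98407233": [1, 1], "Q10428420": [3, 1]}
print(sum(rows_per_entity(c) for c in counts.values()))  # 6, same as P6375 alone
```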
(By the way, I also noticed that the argument to --format is not validated: I sometimes typed "CSV" instead of "csv".)
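One common way to normalise and validate such an option, sketched with Python's argparse rather than wikibase-cli's own Node.js code: lowercase the value, then check it against a fixed list. The list of formats here is a hypothetical placeholder, not wd data's actual set.

```python
import argparse

# Hypothetical list of accepted formats, for illustration only.
FORMATS = ["json", "csv"]


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="data")
    # argparse applies `type` before checking `choices`, so "CSV" is
    # normalised to "csv", and unknown values are rejected with a usage
    # error (exit status 2) instead of being silently ignored.
    parser.add_argument("--format", type=str.lower, choices=FORMATS, default="json")
    return parser


print(make_parser().parse_args(["--format", "CSV"]).format)  # csv
```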
Is there a good way to fetch one or multiple properties for a (potentially long) list of QIDs in CSV format?
Here's example code for what I have so far using wd convert:
but would it make sense for it to support --format csv and fetching more than one property at a time?
Or the same using wd data:
(but does it fetch all item data, and would it be more difficult to implement --format csv?)