Open ajinnah opened 2 years ago
It's not implemented, but it could be done fairly easily with grep, which will also be much faster than filtering on parsed JSON (see the documentation on prefiltering):
# Create a file with one pattern per id, each matching the start of that entity's line in the dump
echo "Q1
Q2
Q3" | awk '{print "^{\"type\":\"item\",\"id\":\"" $1 "\","}' > qid_filter
# Filter the dump with that shortlist of ids
gzip -dc latest-all.json.gz | grep -E -f qid_filter | sed 's/,$//' > selected_entities.ndjson
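If you want to check the patterns before running them on the full dump, here is a minimal sketch that applies the same pipeline to a tiny made-up dump. The file name `sample.json.gz` and its three entities are assumptions for illustration only, not real dump content. The opening brace is also escaped here (`\{`), on the assumption that not every grep treats a bare `{` in an extended regex the same way.

```shell
#!/bin/sh
# Sketch: try the grep prefilter on a tiny, hypothetical dump before using the real one.
set -eu

# Stand-in for latest-all.json.gz: a JSON array with one entity per line,
# every entity line except the last one ending with a comma.
printf '%s\n' \
  '[' \
  '{"type":"item","id":"Q1","labels":{}},' \
  '{"type":"item","id":"Q10","labels":{}},' \
  '{"type":"item","id":"Q2","labels":{}}' \
  ']' | gzip > sample.json.gz

# One anchored pattern per id. The closing quote after the id keeps Q1 from matching Q10.
printf '%s\n' Q1 Q2 | awk '{print "^\\{\"type\":\"item\",\"id\":\"" $1 "\","}' > qid_filter

# Same pipeline as above: decompress, keep matching lines, drop the trailing comma.
gzip -dc sample.json.gz | grep -E -f qid_filter | sed 's/,$//' > selected_entities.ndjson

cat selected_entities.ndjson
```

This should leave only the Q1 and Q2 lines in `selected_entities.ndjson`, each a standalone JSON object. Note that the patterns only match entities whose line starts with `"type":"item"`, so properties would need their own pattern.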
Hello,
I have a large list of Wikidata IDs (Q-numbers) and I'd like to filter the dump down to only these entities. Does this already exist, or would it be possible to implement?
Thank you!