Open paregorios opened 6 years ago
Here's a way to get a list of all place IDs that are in the CSV export but not in the full JSON export:
jq -r '.["@graph"][].id' pleiades-places-latest.json | sort > json-ids.txt
csvcut -c id pleiades-places-latest.csv | tail -n +2 | sort > csv-ids.txt
comm -23 csv-ids.txt json-ids.txt
Here are the results for the latest dump: https://gist.github.com/ryanfb/46a5322cc4f95d7f6d7ad87988474bce
Most or all of these seem to be errata/renamed/duplicates.
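For anyone unfamiliar with `comm -23`, here is a minimal sketch of what that last step does, run against two tiny made-up ID lists rather than the real dumps. The IDs and the temporary directory are hypothetical and only there for illustration. It uses coreutils only, so it does not need `jq` or `csvcut`.

```shell
# Hypothetical sample IDs, not real Pleiades data.
tmpdir=$(mktemp -d)
printf '%s\n' 100 200 300 400 | sort > "$tmpdir/csv-ids.txt"
printf '%s\n' 100 300 | sort > "$tmpdir/json-ids.txt"

# comm expects both inputs to be sorted the same way.
# -2 drops lines found only in the second file, -3 drops lines common to both,
# so what remains is the set of IDs present in the CSV list but not the JSON list.
only_in_csv=$(comm -23 "$tmpdir/csv-ids.txt" "$tmpdir/json-ids.txt")
echo "$only_in_csv"

rm -rf "$tmpdir"
```

With these sample lists the IDs left over are 200 and 400, which is the same kind of "in CSV but not JSON" list as the gist above.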
Ah, thanks @ryanfb. I probably need to tweak the API so it only reports published places under /places.
As a user visiting the Pleiades home page, I will see that the number of published places is the same as the number of published place objects in the /places directory (i.e., excluding errata etc.).
As a computational agent hitting the Pleiades API for the number of places, I will get the count of published places in the /places directory.
Number of places in full JSON export (35,335) is significantly less than the number of places reported by the API (35,399) and therefore displayed on the home page. Why?