isawnyu / pleiades-gazetteer

This repository provides a home for tickets and other planning documents for the Pleiades gazetteer of ancient places. Code is kept in multiple other repositories.
https://pleiades.stoa.org

place miscount #322

Open paregorios opened 6 years ago

paregorios commented 6 years ago

The number of places in the full JSON export (35,335) is 64 lower than the number of places reported by the API (35,399), which is the figure displayed on the home page. Why?

ryanfb commented 6 years ago

Here's a way to get a list of all place IDs that are in the CSV export but not in the full JSON export:

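# IDs present in the full JSON export (-r prints raw strings, so no quote stripping is needed)
jq -r '.["@graph"][].id' pleiades-places-latest.json | sort > json-ids.txt
# IDs present in the CSV export (skip the header row)
csvcut -c id pleiades-places-latest.csv | tail -n +2 | sort > csv-ids.txt
# IDs that appear only in the CSV export
comm -23 csv-ids.txt json-ids.txt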

Here are the results for the latest dump: https://gist.github.com/ryanfb/46a5322cc4f95d7f6d7ad87988474bce

Most or all of these seem to be errata/renamed/duplicates.
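
For anyone without jq and csvkit installed, the same comparison can be sketched in Python using only the standard library. This is a rough equivalent of the pipeline above, not a tested replacement: it assumes the JSON export has a top-level "@graph" list whose objects carry an "id" key, and that the CSV export has an "id" column, which is what the shell commands already rely on. The function name `ids_only_in_csv` is made up for illustration.

```python
import csv
import json


def ids_only_in_csv(json_path, csv_path):
    """Return sorted place IDs found in the CSV export but not in the JSON export."""
    with open(json_path, encoding="utf-8") as f:
        # Same structure the jq filter relies on: a top-level "@graph" list of objects with "id".
        json_ids = {str(place["id"]) for place in json.load(f)["@graph"]}
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader consumes the header row, so no equivalent of `tail -n +2` is needed.
        csv_ids = {row["id"] for row in csv.DictReader(f)}
    # Set difference plays the role of `comm -23`.
    return sorted(csv_ids - json_ids)
```

Calling `ids_only_in_csv("pleiades-places-latest.json", "pleiades-places-latest.csv")` should yield the same IDs as the `comm -23` step, and `len(...)` of the result gives the size of the gap directly.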

paregorios commented 6 years ago

Ah, thanks @ryanfb. I probably need to tweak the API so it only reports published places under /places.

paregorios commented 5 years ago

As a user visiting the Pleiades home page, I will see that the number of published places is the same as the number of published place objects in the /places directory (i.e., excluding errata etc.).

As a computational agent hitting the Pleiades API for the number of places, I will get the count of published places in the /places directory.
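
To make the two acceptance criteria above concrete, here is a minimal sketch of the intended counting rule. It is not the actual Pleiades API code: the function name `count_published_places`, the `path` and `review_state` keys, and the sample records are all assumptions invented for illustration.

```python
def count_published_places(objects):
    """Count objects that are published and live directly under /places."""
    return sum(
        1
        for obj in objects
        if obj.get("review_state") == "published"       # hypothetical field name
        and obj.get("path", "").startswith("/places/")  # hypothetical field name
    )


# Invented sample data: two published places, one unpublished place, one erratum.
sample = [
    {"path": "/places/1", "review_state": "published"},
    {"path": "/places/2", "review_state": "published"},
    {"path": "/places/3", "review_state": "drafting"},
    {"path": "/errata/4", "review_state": "published"},
]
print(count_published_places(sample))  # prints 2
```

Under this rule the home page and the API would both report the same figure, because both would draw on one filtered count instead of counting every object the catalog returns.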