To test the ballotapi server, we need to create a test data set that can be loaded into the database.
Brainstorm on data repo structure
We want people to be able to browse, comment on, fork, branch, and open pull requests against our core ballot data set, so I think we need to treat it as a git repo. Also, within the data/ folder, I want people to be able to find the big elections easily, so I think we should split elections into national, state, and local levels.
/ballotapi-data/README.md (overview for the overall data repository)
/ballotapi-data/CONTRIBUTING.md (instructions on how to contribute)
/ballotapi-data/LICENSE (public domain license)
/ballotapi-data/tests/*.py (tests for sanity checking the database)
/ballotapi-data/data/ (folder for the actual data, split by national/state/local elections)
/ballotapi-data/data/national_elections/2019-01-01_primary/README.md (notes about the particular election)
/ballotapi-data/data/national_elections/2019-01-01_primary/election.yaml (the election object)
/ballotapi-data/data/national_elections/2019-01-01_primary/precincts/*.yaml (precinct objects)
/ballotapi-data/data/national_elections/2019-01-01_primary/contests/*.yaml (contest objects)
/ballotapi-data/data/state_elections/* (same structure as national elections)
/ballotapi-data/data/local_elections/* (same structure as national elections)
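To make the proposed layout concrete, here is a minimal Python sketch of how a sanity-check script in tests/ might walk it. The function names (find_elections, count_yaml, summarize) are hypothetical and not part of ballotapi. The sketch assumes that every election folder contains an election.yaml, and it searches recursively because the nesting under state_elections/ and local_elections/ is not yet decided.

```python
from pathlib import Path

# Top-level folders under data/, taken from the proposed layout above.
LEVELS = ("national_elections", "state_elections", "local_elections")


def find_elections(repo_root):
    """Yield (level, election_dir) for each folder that holds an election.yaml."""
    data_dir = Path(repo_root) / "data"
    for level in LEVELS:
        level_dir = data_dir / level
        if not level_dir.is_dir():
            continue  # a level with no elections yet is fine
        # rglob tolerates extra nesting (for example, per-state subfolders).
        for election_file in sorted(level_dir.rglob("election.yaml")):
            yield level, election_file.parent


def count_yaml(folder):
    """Count *.yaml files directly inside folder; 0 if the folder is missing."""
    if not folder.is_dir():
        return 0
    return sum(1 for _ in folder.glob("*.yaml"))


def summarize(election_dir):
    """Return a small summary of one election folder."""
    return {
        "name": election_dir.name,
        "precincts": count_yaml(election_dir / "precincts"),
        "contests": count_yaml(election_dir / "contests"),
    }
```

A test could call find_elections on a checkout of the repo and fail if any election has zero precincts or contests.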
Thoughts on using YAML
I'm leaning towards using YAML instead of JSON for defining objects in the authoritative data set because it's more flexible, most importantly because it allows comments. I want people who are fixing edge cases to be able to add comments and annotations beyond what is shown in the API, so that others can later see why a data point is what it is (in addition to the git log information).
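As a rough illustration of the comment argument, the Python sketch below puts the same annotation into a YAML document and a JSON document. The record and its field names (id, title) are made up for the example and are not a proposed schema. The JSON check uses only the standard library. The YAML check assumes the third-party PyYAML package and is skipped if it is not installed.

```python
import json

try:
    import yaml  # PyYAML, a third-party package; optional for this sketch
except ImportError:
    yaml = None

# Hypothetical contest record with an editor's note as a YAML comment.
YAML_TEXT = """\
# Title corrected by hand; the county's published file had a typo.
id: contest-001
title: Example Measure A
"""

# The same note written as a comment inside JSON, which JSON does not allow.
JSON_TEXT = """\
{
  // Title corrected by hand; the county's published file had a typo.
  "id": "contest-001",
  "title": "Example Measure A"
}
"""


def json_accepts(text):
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def load_yaml_if_available(text):
    """Parse YAML with PyYAML when installed; otherwise return None."""
    if yaml is None:
        return None
    return yaml.safe_load(text)
```

With these definitions, json_accepts(JSON_TEXT) is False because of the comment line, while the YAML parser ignores the comment and returns only the id and title fields. The comment therefore stays in the repo for contributors without leaking into the data that gets loaded.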