todrobbins opened this issue 7 years ago (status: Open)
@todrobbins I can only really speak to how the OCD ids are implemented, if that's helpful.
OCDIDField is a custom Django field from which the id fields on Election, BallotMeasureContest and other models all inherit. There's an ocd_type kwarg for setting the prefix before the UUID. The UUID itself is randomly generated via uuid.uuid4() from Python's standard library.
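To make the id format concrete, here is a minimal sketch of the "prefix plus random UUID" idea in plain Python. The helper names make_ocd_id and split_ocd_id are made up for illustration and are not part of python-opencivicdata; the real OCDIDField does this inside Django, so treat this only as an approximation of the shape of the ids.

```python
import uuid
from typing import Tuple


def make_ocd_id(ocd_type: str) -> str:
    """Return an id shaped like 'ocd-<type>/<uuid4>'.

    Hypothetical helper for illustration only.
    """
    return f"ocd-{ocd_type}/{uuid.uuid4()}"


def split_ocd_id(ocd_id: str) -> Tuple[str, uuid.UUID]:
    """Split an id into its type and parsed UUID.

    Raises ValueError if the prefix or the UUID part is malformed.
    """
    prefix, _, raw_uuid = ocd_id.partition("/")
    if not prefix.startswith("ocd-") or not raw_uuid:
        raise ValueError(f"not an OCD-style id: {ocd_id!r}")
    return prefix[len("ocd-"):], uuid.UUID(raw_uuid)


if __name__ == "__main__":
    print(make_ocd_id("election"))
    print(split_ocd_id("ocd-contest/0ba0ecc5-5fb5-47a4-9750-4d6187b54f29"))
```

Because the UUID is random, the id carries no information about the record beyond its type, which is why keeping ids stable between loads has to be handled separately.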
python-opencivicdata had all this set up for us before we came along and implemented the election module. The bigger challenge for us was ensuring that our daily ETL process preserves the previously generated ids without inserting duplicate records.
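I won't go into how our ETL actually does this here, but the general idea of preserving ids can be sketched roughly as below. This assumes each source record has some stable natural key (the source_keys here are hypothetical), and known_ids stands in for whatever lookup you keep of ids assigned by earlier runs. It is one possible approach, not a description of the calaccess code.

```python
import uuid
from typing import Dict, Iterable, Tuple


def assign_ids(
    source_keys: Iterable[str],
    known_ids: Dict[str, str],
    ocd_type: str,
) -> Tuple[Dict[str, str], int]:
    """Map each source key to an id, minting a new id only for unseen keys.

    Returns the updated mapping and the number of ids newly created.
    """
    ids = dict(known_ids)  # copy so the caller's mapping is not mutated
    created = 0
    for key in source_keys:
        if key not in ids:
            # Only records not seen in a previous run get a fresh random id.
            ids[key] = f"ocd-{ocd_type}/{uuid.uuid4()}"
            created += 1
    return ids, created


if __name__ == "__main__":
    first_run, n_first = assign_ids(["prop-1", "prop-2"], {}, "contest")
    second_run, n_second = assign_ids(["prop-1", "prop-2", "prop-3"], first_run, "contest")
    print(n_first, n_second)
```

The point is that a second run over the same source data reuses the earlier ids and creates ids only for records it hasn't seen, so nothing is duplicated.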
If you're working on something outside the OCD ecosystem, but still in Django, you might consider just using Django's UUIDField.
Also, if you're storing your data in Postgres, pgcrypto and uuid-ossp are both useful extensions.
Over in django-calaccess-processed-data, we're using pgcrypto's gen_random_uuid() function to create the OCD ids in bulk, for example, when creating hundreds of thousands of filings at once.
Hope that's helpful. If you're looking for more general guidance about assigning ids for data intended for public consumption, I think this is something @fgregg has been researching recently.
I've seen UUIDs within California Civic Data datasets (e.g. https://calaccess.californiacivicdata.org/documentation/processed-files/ballot-measures/) and wondered if there are best practices for ID generation. Thanks!
Examples:
ocd-contest/0ba0ecc5-5fb5-47a4-9750-4d6187b54f29
ocd-election/7c01ac66-b870-4c02-b705-18d83fd7c233