topcoderinc / va-data-scraper

Pulls Veteran Data from data.gov for

Data sync #3

Open gondzo opened 6 years ago

gondzo commented 6 years ago

Since we need to support data sync with every new release of the data sets, we have trouble tracking veteran records: there is no ID field for veterans or cemeteries, so we are using a combination of fields as an ID. Right now the veteran ID is {firstName-lastName-birthDate-deathDate-cemeteryID}, and the cemetery ID is {cemeteryName-address-zip}.
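A minimal sketch of how such composite keys could be built, assuming each row arrives as a plain dict with the field names used above (the record shape, helper names, and normalization are illustrative assumptions, not the scraper's actual code). It uses `|` as the delimiter because the dates themselves contain hyphens, and it collapses whitespace and case so trivial formatting changes between releases do not produce a different key:

```python
# Hypothetical field lists mirroring the composite keys described in the issue.
VETERAN_KEY_FIELDS = ("firstName", "lastName", "birthDate", "deathDate")
CEMETERY_KEY_FIELDS = ("cemeteryName", "address", "zip")


def _join(values):
    # Collapse runs of whitespace and lowercase each value (an assumption),
    # then join with "|" since dates like 1920-01-01 already contain "-".
    return "|".join(" ".join(str(v).split()).lower() for v in values)


def cemetery_id(record):
    """Composite cemetery key: cemeteryName | address | zip."""
    return _join(record[f] for f in CEMETERY_KEY_FIELDS)


def veteran_id(record):
    """Composite veteran key: name, birth/death dates, plus the cemetery key."""
    values = [record[f] for f in VETERAN_KEY_FIELDS]
    values.append(cemetery_id(record))
    return _join(values)
```

For a row with `firstName="John"`, `lastName="Doe"`, dates `1920-01-01`/`1990-05-05` and cemetery `Green Hill, 1 Main St, 12345`, this sketch yields `john|doe|1920-01-01|1990-05-05|green hill|1 main st|12345`. Any missing field raises `KeyError`, which is exactly why incomplete rows cannot be keyed this way.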

This ensures we populate the database only with complete data. If we don't use birth/death date or cemetery ID as keys, we get a lot of duplicate records. On the other hand, it means we will miss a lot of records that have partial data. Is there anything that can be done about the data source (like having record IDs)? @kbowerma
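To make the trade-off concrete, here is a rough sketch (hypothetical helper names, assuming dict-shaped rows) that splits a batch into complete rows, deduplicated by the composite key, and partial rows that would be skipped. Counting the skipped list per release is one way to measure how much data the strict key is costing:

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical field list; mirrors the veteran + cemetery key fields in the issue.
KEY_FIELDS = ("firstName", "lastName", "birthDate", "deathDate",
              "cemeteryName", "address", "zip")


def is_complete(record: dict) -> bool:
    """True if every key field is present and not blank."""
    return all(str(record.get(f) or "").strip() for f in KEY_FIELDS)


def partition_and_dedupe(records: Iterable[dict]) -> Tuple[Dict[str, dict], List[dict]]:
    """Return (complete records keyed by composite ID, skipped partial records)."""
    kept: Dict[str, dict] = {}
    skipped: List[dict] = []
    for rec in records:
        if not is_complete(rec):
            skipped.append(rec)  # partial data: no reliable key can be built
            continue
        key = "|".join(str(rec[f]).strip().lower() for f in KEY_FIELDS)
        kept.setdefault(key, rec)  # first occurrence wins; duplicates are dropped
    return kept, skipped
```

Loosening the key (for example, dropping the dates from `KEY_FIELDS`) would shrink `skipped` but let distinct people with the same name and cemetery collapse into one entry, which is the duplicate problem described above viewed from the other side.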

kbowerma commented 6 years ago

Yeah, I think we need to figure out a way to create a key from the row to ensure uniqueness as we insert. The problem with that is that if a record is edited at the source, it will get a new key.
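A small sketch of the row-derived key idea and its limitation, assuming rows are dicts (the helper names are illustrative, not part of the project). Hashing a canonical JSON form of the whole row gives a key that is independent of column order, but any edit to any column changes it. One possible mitigation is to store the row hash alongside the composite ID, so an edit to a non-key column shows up as "updated" rather than as a new record:

```python
import hashlib
import json


def row_key(row: dict) -> str:
    """Hash every column of a row into a stable key (hypothetical helper)."""
    # sort_keys makes the key independent of column order in the source file.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(stored_hashes: dict, record_id: str, row: dict) -> str:
    """Compare a row with the hash saved for its composite ID in the last sync."""
    old_hash = stored_hashes.get(record_id)
    if old_hash is None:
        return "new"
    return "unchanged" if old_hash == row_key(row) else "updated"
```

This only helps when the edit leaves the composite ID fields alone; if the source corrects a name or a date, the composite ID itself changes and the row is still classified as `"new"`, which is the limitation described in the comment above.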