lshtm-gis / WHO_PHSM_Cleaning

Cleaning PHSM provider data for WHO
https://lshtm-gis.github.io/WHO_PHSM_Cleaning/html/
MIT License

Row hashes on ingestion #123

Closed hamishgibbs closed 3 years ago

hamishgibbs commented 3 years ago

Already-processed records should be filtered out using row hashes that persist between cleaning runs.

On each ingestion, write a hash of every row, together with the date of processing, to the config folder.

To filter recognised records, check incoming rows against the stored hashes whose processing date != the current date.
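A minimal sketch of how this could work, using only the standard library. Everything named here is an assumption made for illustration, not the repository's actual code: the function names, the JSON log format (hash mapped to ISO date of processing), the log path passed by the caller, and the example column names `who_id` and `measure`.

```python
import hashlib
import json
from pathlib import Path


def row_hash(row: dict) -> str:
    """SHA-256 digest of one record; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_hash_log(path: Path) -> dict:
    """Read the persisted {hash: date_processed} log, or start empty (hypothetical format)."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_hash_log(path: Path, log: dict) -> None:
    """Persist the log so it survives between cleaning runs."""
    path.write_text(json.dumps(log, indent=2))


def filter_new_records(records: list, log: dict, today: str) -> list:
    """Drop records whose hash was logged on a date other than `today`."""
    recognised = {h for h, processed in log.items() if processed != today}
    return [r for r in records if row_hash(r) not in recognised]


def update_hash_log(records: list, log: dict, today: str) -> dict:
    """Return a copy of the log with any unseen hashes stamped with `today`."""
    updated = dict(log)
    for r in records:
        updated.setdefault(row_hash(r), today)
    return updated


# Illustrative records; the column names are made up for this example.
old = {"who_id": "A1", "measure": "School closure"}
new = {"who_id": "A2", "measure": "Mask mandate"}

log = update_hash_log([old], {}, "2020-06-01")
to_process = filter_new_records([old, new], log, "2020-06-02")  # only `new` remains
```

Because the hash covers the whole row, any edit to a previously seen record yields a new hash, so that record is picked up again. Hashes stamped with the current date are not filtered, which means a rerun on the same day reprocesses that day's records.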

This removes the dependence on prop_ids and forces reprocessing whenever the dataset changes, both improvements on the current routine.