Open mattmathis opened 4 years ago
Highest priority piece: distinguish between data derived from the real-time annotator and data derived from the ETL annotator.
Better phrasing: We need to archive the date (or version) of the raw annotation databases (MaxMind, etc.) independently of the date of the row. This is necessary to study the stability of address ownership and assignments.
It must be possible to distinguish between new annotations applied to old, unannotated data in the ETL pipeline and annotations made at about the time of data collection.
Consider making the annotation include the SHA of the MaxMind DB and the timestamp when it was applied to the data (e.g., approximately measurement time or approximately parse time).
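
To make the proposal concrete, here is a rough sketch of what such a provenance record could look like. Nothing here reflects the existing annotator code: the names `AnnotationProvenance`, `AnnotationSource`, `make_provenance`, and `is_retroactive` are hypothetical, SHA-256 is assumed as the hash (the issue only says "SHA"), and the explicit `source` field is one possible way to mark real-time vs. ETL annotations.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class AnnotationSource(Enum):
    """Which annotator produced the annotation (hypothetical labels)."""
    REAL_TIME = "real_time"  # applied at about measurement time
    ETL = "etl"              # applied later, at about parse time


@dataclass(frozen=True)
class AnnotationProvenance:
    """Metadata stored next to each annotation, independent of the row date."""
    db_sha256: str           # hash of the raw annotation DB file (assumed SHA-256)
    applied_at: datetime     # when the annotation was applied to the data
    source: AnnotationSource


def make_provenance(db_bytes, source, applied_at=None):
    """Build a provenance record from the raw DB contents."""
    if applied_at is None:
        applied_at = datetime.now(timezone.utc)
    digest = hashlib.sha256(db_bytes).hexdigest()
    return AnnotationProvenance(digest, applied_at, source)


def is_retroactive(prov, measured_at, tolerance):
    """True if the annotation was applied well after the measurement."""
    return prov.applied_at - measured_at > tolerance
```

With a record like this, two rows annotated from different DB snapshots carry different `db_sha256` values, and comparing `applied_at` against the measurement time separates annotations made near collection from ones added later in the ETL pipeline, even if the `source` field were missing.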