sharedstreets / mobility-metrics

Tools for collecting, processing, and interpreting mobility data using SharedStreets
MIT License

trip & status hashing #84

Closed by morganherlocker 5 years ago

morganherlocker commented 5 years ago

This issue tracks the creation of a log of SHA-256 hashes while processing trip and event data. The log is used to diff query results when reports are rerun, verifying data integrity without storing any PII on disk. Each hash will incorporate the previous event's hash so that consistent ordering of the data can be verified. These hashes will be used to generate diff reports, which the report UI will expose alongside daily summaries.
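As a rough illustration of the chaining idea described above, here is a minimal Python sketch. It is not the project's implementation: the function name `chain_hashes`, the empty-string seed, the JSON canonicalization, and the record fields are all assumptions made for the example.

```python
import hashlib
import json


def chain_hashes(records, seed=""):
    """Return one SHA-256 hex digest per record, each chained to the previous one.

    Assumptions (not from the issue): records are JSON-serializable dicts,
    they are canonicalized with sorted keys, and the chain starts from an
    empty-string seed.
    """
    previous = seed
    digests = []
    for record in records:
        # Canonical serialization so the same record always hashes the same way.
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        # Mixing in the previous digest makes each hash depend on the ordering.
        digest = hashlib.sha256((previous + payload).encode("utf-8")).hexdigest()
        digests.append(digest)
        previous = digest
    return digests


if __name__ == "__main__":
    # Hypothetical records; the field names are placeholders, not a real schema.
    events = [
        {"kind": "status_change", "id": "a", "time": 1},
        {"kind": "trip", "id": "b", "time": 2},
    ]
    for line in chain_hashes(events):
        print(line)
```

Only the digests are written out, so the log carries no record contents. Because each digest folds in its predecessor, swapping two records changes every digest from the first affected position onward.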

morganherlocker commented 5 years ago

This is implemented but not yet exposed. Report hashes use hashed metrics, which provide a lot of protection but less visibility. In the next release, this feature will allow us to describe what changed, rather than just when.

For now, the implementation appends a hash of each status_change and trip to a line-delimited text log file. Here is an example:

df9dca45c8369b5ee79c6fb5748946202aaf99f0e2eecb055a715a8e91f2fcb7
ca7be477d37cf5506a6e3cd473c7c59f24b9e760a6b5dd249979dff6aaeddca1
6bccd16a5c5f4961ecf765470d09cd70dd05c188a321630ddc86d9745dede519
8bd0b16578847e96684a98db15dbb8d2e97bd8478f704e2fef836d9c37f6348a
0d9c872c57aef3135f96b6e150318b10e47215621117ceccce5b4dd13f839cbf
076b9c210c0dc89bbdaddf38034d31f7a6102ab85204b9d3a954abf10ab8a5e9
072571d18fd9593b3ea0b6f099880b9bf856d238dea98a362348dc570406f2d1
39dae62f5b3a6d685414ae0573ff796a0b8153d68c342b7088215c53a4d4546b
dac264f5f7ee458e837a2cc768dc39c2757cb9583a01d332fe3bbbeb182f6768
d2c684618de95bf8e2df5e8fa13ad043ec8501cdde03314fa014918547440b88
c9383f224858072748826aa5a3fc6d0ce91cf75a9e5066fd506ccd2b1809e2aa
8c281887fd88c3dd8ddd8ed61736018bc7bce017412e5079eb59408bf4f17e9d
038fe38ebbc921c71085573f40e6d4fcf53a9c0a84e339b56fecdf8133e2f942
da06510163f9e9a726ce65727a3650004e1d9343338d667ee6a6998511b91f62
4f3e18e0bb5efe101d9aec2bc847ddc82f87fb5f0b880c9674665d3bfc1bcd66
81e9a6353bf252646be4e3fb0d90366f63414c5e4bfca47b523c09dcfe0bb48a
d2e2fc3f9838cc15cea0ada171ecf77b3dee5879682226024ff01561b887bdb9
6002af9cc4edd8c2c87630852b353a0ff768d7f17034c75c578d16993c24baeb
48b0189107b4ab618cc2769efd707d8d959e82e69c4001768ddcdd24537671c7

Diffing all lines between two days will describe how many events changed, were added, or were missing, and we will expose this information on the diff report pages.
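One way such a diff could be counted is sketched below in Python using the standard library's `difflib`. This is an illustration only: the function name `diff_hash_logs` and the result keys are hypothetical, and it assumes each line is an independent per-record hash. If the hashes are chained to their predecessors, every line after the first difference would also differ, so the counts would need a different interpretation.

```python
from difflib import SequenceMatcher


def diff_hash_logs(old_lines, new_lines):
    """Count changed, added, and missing lines between two hash logs.

    Assumption: a line replaced in place counts as "changed", a line only in
    the new log as "added", and a line only in the old log as "missing".
    """
    counts = {"changed": 0, "added": 0, "missing": 0}
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_n, new_n = i2 - i1, j2 - j1
        if tag == "replace":
            # Pair up replaced lines; any surplus is an addition or a removal.
            paired = min(old_n, new_n)
            counts["changed"] += paired
            counts["missing"] += old_n - paired
            counts["added"] += new_n - paired
        elif tag == "delete":
            counts["missing"] += old_n
        elif tag == "insert":
            counts["added"] += new_n
    return counts


def read_log(path):
    """Read a line-delimited hash log, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, `diff_hash_logs(read_log("old.log"), read_log("new.log"))` would return the three counts for two log files; the file names here are placeholders.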