hnl-ai / hpdstats

A data tracker for arrests made by the Honolulu Police Department
https://hpdstats.com

Unaligned Dates #6

Closed: tyliec closed this issue 2 years ago

tyliec commented 3 years ago

Context

Currently, we deal with three types of dates in the Arrest Log Reports.

  1. The date of the arrest
  2. The date the arrest log was published
  3. The date our script scraped the arrest log

We currently use date 2, the publication date, as the source of truth and store it in the database, but it does not always match the date of the arrest. For example, an arrest that happened on 7-12-2021 might appear in the report published on 7-13-2021. This can happen because of processing time, or because the report for that day had already been published, so the remaining arrests from that day are rolled into the next day's report.
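
A minimal sketch of the mismatch, assuming each arrest is stored as a record that carries all three dates. The `ArrestRecord` class and its field names are hypothetical and are not the project's actual database schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArrestRecord:
    # Hypothetical record shape; field names are illustrative only.
    arrest_date: date     # 1. when the arrest happened
    published_date: date  # 2. when the arrest log was published
    scraped_date: date    # 3. when our script scraped the log

    def is_rolled_over(self) -> bool:
        """True when the arrest was published in a later day's report."""
        return self.published_date > self.arrest_date


# The example from above: arrested 7-12-2021, published 7-13-2021.
record = ArrestRecord(
    arrest_date=date(2021, 7, 12),
    published_date=date(2021, 7, 13),
    scraped_date=date(2021, 7, 13),
)
print(record.is_rolled_over())  # True
```

Storing the publication date alone would file this arrest under 7-13-2021, one day late.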

Potential Solutions

  1. Parse the actual arrest date from the PDF
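
A rough sketch of solution 1, assuming the PDF has already been converted to plain text and that arrest dates appear as `M-D-YYYY` or `M/D/YYYY`. The regex, the `parse_arrest_dates` helper, and the sample text are hypothetical; the real report layout may differ.

```python
import re
from datetime import date

# Hypothetical pattern: month, day, four-digit year separated by "-" or "/".
DATE_PATTERN = re.compile(r"\b(\d{1,2})[-/](\d{1,2})[-/](\d{4})\b")


def parse_arrest_dates(page_text: str) -> list[date]:
    """Return every M-D-YYYY or M/D/YYYY date found in the text, in order."""
    return [
        date(int(year), int(month), int(day))
        for month, day, year in DATE_PATTERN.findall(page_text)
    ]


# Made-up sample text standing in for text extracted from one report page.
sample = "Arrest Date: 7-12-2021 Offense: ...\nArrest Date: 7/13/2021 Offense: ..."
print(parse_arrest_dates(sample))
```

In practice the match would need to be tied to the arrest-date column, since a page may contain other dates (such as the publication date) in the same format.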

tyliec commented 3 years ago

This issue is also the root cause of a duplicate-record problem: the same arrests show up as overlapping records across several days.
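
One possible way to collapse those overlaps, sketched under the assumption that an arrest can be identified by a person, the parsed arrest date, and the offense. The key fields, the `dedupe` helper, and the sample rows are all hypothetical.

```python
from datetime import date

# Hypothetical rows from two consecutive daily reports. The first arrest
# appears twice because it was rolled into the next day's report.
rows = [
    {"person": "PERSON A", "arrest_date": date(2021, 7, 12), "offense": "THEFT",
     "published_date": date(2021, 7, 12)},
    {"person": "PERSON A", "arrest_date": date(2021, 7, 12), "offense": "THEFT",
     "published_date": date(2021, 7, 13)},
    {"person": "PERSON B", "arrest_date": date(2021, 7, 13), "offense": "DUI",
     "published_date": date(2021, 7, 13)},
]


def dedupe(rows):
    """Keep the first occurrence of each arrest, keyed on the arrest date
    rather than the publication date. The key fields are an assumption."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["person"], row["arrest_date"], row["offense"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


print(len(dedupe(rows)))  # 2
```

Keying on the publication date instead would keep all three rows, which is the overlap described above.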