Open adesca opened 5 years ago
Status indicators
Tasks done in e2e:
show errors on errored transformations
AUGMENTATION:
Table of data_allegationcategory
Id   category_name
1    category1
2    category2
3    category3
Table data_allegation (pre-augment)
cr_id    …    current_category
123123        category2
123124        category3
123125        category1
augment()
    for each row in data_allegation
        look up the id of the category listed under current_category
        replace value of current_category with looked up id
Table data_allegation (post augment)
cr_id    …    current_category
123123        2
123124        3
123125        1
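A minimal Python sketch of the augment step described above, assuming both tables are already loaded in memory as lists of dicts. The function signature and the dict-based row shape are assumptions for illustration, not the project's actual code; only the table and column names come from the tables above.

```python
def augment(allegations, categories):
    """Return copies of the allegation rows with current_category
    replaced by the id of the matching data_allegationcategory row."""
    # Build the name -> id lookup once instead of scanning categories per row.
    id_by_name = {c["category_name"]: c["id"] for c in categories}

    augmented = []
    for row in allegations:
        new_row = dict(row)  # copy so the input rows stay untouched
        # Raises KeyError if the category name has no match (assumed behaviour).
        new_row["current_category"] = id_by_name[row["current_category"]]
        augmented.append(new_row)
    return augmented


# Sample data taken from the tables above.
categories = [
    {"id": 1, "category_name": "category1"},
    {"id": 2, "category_name": "category2"},
    {"id": 3, "category_name": "category3"},
]
allegations = [
    {"cr_id": 123123, "current_category": "category2"},
    {"cr_id": 123124, "current_category": "category3"},
    {"cr_id": 123125, "current_category": "category1"},
]

augmented = augment(allegations, categories)
```

An unknown category name raises a `KeyError` here, which is one possible place to surface an errored transformation. Whether to fail, skip, or flag the row is left open.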
Overarching goal: A user should be able to trigger a process in the server that pulls data from the COPA website and imports new Allegations to the database.
Things to keep in mind:
Goals:
current category column with a reference to the data_allegationcategory table for that particular category
The business need:
From Rajiv: The primary purpose of this COPA Data Portal data capture step is to create incomplete/phantom complaint records in our database (for new complaints since our last successful FOIA response) so that we can have some matching data for the new documents that are being picked up by our crawlers/scrapers (https://cpdp.co/crawlers and https://cpdp.co/documents).
The second purpose is to compare against the data that we have received via FOIA responses to see whether we are missing any records (i.e., were any responsive complaint records omitted from our original dataset and if so which ones).
The third purpose is to compare different versions/snapshots of it over time and see what’s changing (is it just new records being added on to the end, or are older records being added, or removed, or altered).
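One way the snapshot comparison could be sketched in Python, assuming each snapshot is a list of dicts keyed by `cr_id`. The function name, the `status` field, and the sample rows are hypothetical and only there to illustrate the added / removed / altered split.

```python
def diff_snapshots(old_rows, new_rows, key="cr_id"):
    """Classify records as added, removed, or altered between two snapshots."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}

    added = sorted(new.keys() - old.keys())      # only in the newer snapshot
    removed = sorted(old.keys() - new.keys())    # only in the older snapshot
    altered = sorted(                            # in both, but fields differ
        k for k in old.keys() & new.keys() if old[k] != new[k]
    )
    return {"added": added, "removed": removed, "altered": altered}


# Hypothetical snapshots, for illustration only.
snapshot_a = [
    {"cr_id": 1, "status": "open"},
    {"cr_id": 2, "status": "open"},
    {"cr_id": 3, "status": "closed"},
]
snapshot_b = [
    {"cr_id": 2, "status": "closed"},
    {"cr_id": 3, "status": "closed"},
    {"cr_id": 4, "status": "open"},
]

changes = diff_snapshots(snapshot_a, snapshot_b)
```

The same comparison could in principle serve the second purpose: passing the FOIA dataset as `old_rows` and the portal data as `new_rows` would list ids present in one source and absent from the other, assuming both share the same `cr_id` values.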
From Basecamp: The Civilian Office of Police Accountability (COPA) has just posted a new live data feed to the City's Open Data Portal that goes back 10 years. Here are a few early questions to investigate.