Data wrangling

npscience commented 6 years ago

We have datasets arranged in folders by month, e.g. 2017_01. Each folder contains four .csv files:

Crime reports, e.g. '2017-01-cambridgeshire-street.csv'
Outcomes, e.g. '2017-01-cambridgeshire-outcomes.csv'

There is one of each for both the Cambridgeshire and City of London constabularies.

In crime reports, the columns of interest are:

Focus on Cambs data for now.

Checks:

[x] Check that longitude and latitude map to sensible locations (Location is another column). Result: yes, so we can just use longitude and latitude.
[x] Are there differences between "Last outcome category" in the crime reports and "Outcome type" in the outcomes file? If not, we don't need the outcomes file. Result: only four crime IDs differ, so we decided not to use ...outcomes.csv.
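A rough sketch of how the first check could be automated, using only the Python standard library. The bounding box is an approximate guess for Cambridgeshire and the capitalised column headers "Longitude" and "Latitude" are assumptions, not values taken from this thread. The sample rows are made up.

```python
import csv
import io

# Approximate bounding box for Cambridgeshire. These limits are an
# illustrative assumption; adjust them before relying on the result.
LON_MIN, LON_MAX = -0.6, 0.6
LAT_MIN, LAT_MAX = 51.9, 52.8


def rows_outside_box(rows):
    """Return rows whose coordinates are missing, non-numeric, or outside the box."""
    suspect = []
    for row in rows:
        try:
            lon = float(row["Longitude"])
            lat = float(row["Latitude"])
        except (KeyError, ValueError, TypeError):
            # Missing column, empty string, or None all count as suspect.
            suspect.append(row)
            continue
        if not (LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX):
            suspect.append(row)
    return suspect


# Made-up sample rows standing in for a ...street.csv file.
sample = io.StringIO(
    "Crime ID,Longitude,Latitude,Location\n"
    "a1,0.1218,52.2053,On or near Example Street\n"
    "b2,-0.0900,51.5100,On or near Other Street\n"
    "c3,,,No location\n"
)
suspect = rows_outside_box(csv.DictReader(sample))
suspect_ids = [row["Crime ID"] for row in suspect]
print(suspect_ids)
```

Rows flagged this way still need a manual look against the Location column; the box only catches coordinates that are clearly outside the county.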

Wrangling:

npscience commented 6 years ago

@raspicer I will check whether the outcomes data is valuable to us (see check 2 above).

npscience commented 6 years ago

For outcomes:

There are four "bicycle theft" crime IDs whose status differs between the crime data and the outcomes:

Crime ID	Last outcome category (in ...street.csv)	Outcome type (in ...outcomes.csv)
636ceaae36b00bb700676f99e17fed443f49835f116c2b2ff9d6cbcae9903143	Court result unavailable	Investigation complete; no suspect identified
583c8b5ff7906af8d265259d77167fabc712addd7bc842959701e2aa08f4115c	Defendant found not guilty	Investigation complete; no suspect identified
8dda5d87822619923b3abb206f843819cad28b3814ab808fa952373737b378e8	Awaiting court outcome	Investigation complete; no suspect identified
4da3f701da29313e1d692f81eb243e75a2c67824f8c8fc4cc548ce596440a868	Unable to prosecute suspect	Investigation complete; no suspect identified

The ...street.csv file often contains an outcome status where ...outcomes.csv does not.

@RASpicer Would you say it's worth just using the base crime data (...street.csv) for this project?
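For reference, a comparison like the one above could be sketched as follows. It joins the two files on "Crime ID" and reports IDs whose "Last outcome category" differs from "Outcome type", plus IDs that appear only in the crime data. The function name `compare_outcomes` and the sample rows are hypothetical, and this is not necessarily how the four IDs above were found.

```python
import csv
import io


def compare_outcomes(street_rows, outcome_rows):
    """Compare outcome status per Crime ID between the two files.

    Returns (mismatches, street_only):
      mismatches  - list of (crime_id, last_outcome_category, outcome_type)
      street_only - crime IDs present in the crime data but not in outcomes
    """
    # If an ID appears more than once in outcomes, the last row wins.
    outcomes = {}
    for row in outcome_rows:
        if row["Crime ID"]:
            outcomes[row["Crime ID"]] = row["Outcome type"]

    mismatches, street_only = [], []
    for row in street_rows:
        crime_id = row["Crime ID"]
        if not crime_id:
            continue  # rows without an ID cannot be joined
        if crime_id not in outcomes:
            street_only.append(crime_id)
        elif row["Last outcome category"] != outcomes[crime_id]:
            mismatches.append(
                (crime_id, row["Last outcome category"], outcomes[crime_id])
            )
    return mismatches, street_only


# Made-up sample data with placeholder IDs.
street = io.StringIO(
    "Crime ID,Last outcome category\n"
    "id-1,Awaiting court outcome\n"
    "id-2,Investigation complete; no suspect identified\n"
    "id-3,Under investigation\n"
)
outcomes_file = io.StringIO(
    "Crime ID,Outcome type\n"
    "id-1,Investigation complete; no suspect identified\n"
    "id-2,Investigation complete; no suspect identified\n"
)
mismatches, street_only = compare_outcomes(
    csv.DictReader(street), csv.DictReader(outcomes_file)
)
print(mismatches)
print(street_only)
```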

RASpicer commented 6 years ago

Extracted all of the bike theft data into a single csv file.
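A minimal sketch of one way to do this kind of extraction with the standard library, not necessarily the script that was actually used. It walks the month folders, keeps rows whose crime type is bicycle theft, and writes them to one file. The "Crime type" column name, the function name `collect_bike_thefts`, and the output file name are assumptions for illustration. The demo at the bottom builds a tiny temporary folder so the block runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def collect_bike_thefts(root, out_path, pattern="*/*-cambridgeshire-street.csv"):
    """Copy bicycle-theft rows from every matching monthly file into out_path.

    Returns the number of rows written. The header is taken from the first
    file that contributes a row; extra columns in later files are ignored.
    """
    writer = None
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in sorted(Path(root).glob(pattern)):
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.DictReader(src)
                for row in reader:
                    # "Crime type" is an assumed column name.
                    crime_type = (row.get("Crime type") or "").strip().lower()
                    if crime_type != "bicycle theft":
                        continue
                    if writer is None:
                        writer = csv.DictWriter(
                            out,
                            fieldnames=reader.fieldnames,
                            extrasaction="ignore",
                        )
                        writer.writeheader()
                    writer.writerow(row)
                    count += 1
    return count


# Demo on a made-up folder laid out like the monthly data (e.g. 2017_01).
with tempfile.TemporaryDirectory() as tmp:
    month_dir = Path(tmp) / "2017_01"
    month_dir.mkdir()
    (month_dir / "2017-01-cambridgeshire-street.csv").write_text(
        "Crime ID,Crime type,Last outcome category\n"
        "a1,Bicycle theft,Under investigation\n"
        "b2,Burglary,Under investigation\n",
        encoding="utf-8",
    )
    out_file = Path(tmp) / "bicycle-theft.csv"  # hypothetical output name
    n_rows = collect_bike_thefts(tmp, out_file)
    combined = out_file.read_text(encoding="utf-8").splitlines()

print(n_rows)
print(combined)
```

Sorting the matched paths keeps the combined file in month order, since the folder names start with year and month.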

open-data-day-cambridge-2018 / bicycle-theft-cambridgeshire