open-data-day-cambridge-2018 / bicycle-theft-cambridgeshire

MIT License

Data wrangling #1

Closed by npscience 6 years ago

npscience commented 6 years ago

We have datasets arranged in folders by month, e.g. 2017_01. Each folder contains four .csv files:

In crime reports, the columns of interest are:

Focus on Cambs data for now.

Checks:

Wrangling:

npscience commented 6 years ago

@raspicer I will check whether the outcomes data is valuable to us (see check mark 2 above)

npscience commented 6 years ago

For outcomes:

There are four "bicycle theft" crime IDs with different status between the 'crime data' and the 'outcomes':

| Crime ID | Last outcome category (in ...streets.csv) | Outcome type (in ...outcomes.csv) |
| --- | --- | --- |
| 636ceaae36b00bb700676f99e17fed443f49835f116c2b2ff9d6cbcae9903143 | Court result unavailable | Investigation complete; no suspect identified |
| 583c8b5ff7906af8d265259d77167fabc712addd7bc842959701e2aa08f4115c | Defendant found not guilty | Investigation complete; no suspect identified |
| 8dda5d87822619923b3abb206f843819cad28b3814ab808fa952373737b378e8 | Awaiting court outcome | Investigation complete; no suspect identified |
| 4da3f701da29313e1d692f81eb243e75a2c67824f8c8fc4cc548ce596440a868 | Unable to prosecute suspect | Investigation complete; no suspect identified |

The ...streets.csv file often contains an outcome status for a crime where ...outcomes.csv has none.
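For anyone wanting to repeat this comparison, here is a rough sketch of how the two files could be cross-checked by Crime ID. It uses the column names quoted above ("Crime ID", "Last outcome category", "Outcome type"); the function names and the file paths passed in are hypothetical, and it assumes each Crime ID appears at most once per file (if not, the last row wins).

```python
import csv


def load_column(path, key_col, value_col):
    """Return {key_col: value_col} for every row with a non-empty key.

    Assumption: keys are unique per file; a repeated key keeps the last row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {
            row[key_col]: row[value_col]
            for row in csv.DictReader(f)
            if row.get(key_col)
        }


def outcome_mismatches(streets_csv, outcomes_csv):
    """List (crime_id, streets_status, outcomes_status) where the two differ."""
    streets = load_column(streets_csv, "Crime ID", "Last outcome category")
    outcomes = load_column(outcomes_csv, "Crime ID", "Outcome type")
    shared = sorted(streets.keys() & outcomes.keys())
    return [
        (cid, streets[cid], outcomes[cid])
        for cid in shared
        if streets[cid] != outcomes[cid]
    ]


def missing_from_outcomes(streets_csv, outcomes_csv):
    """Crime IDs that have a status in the streets file but no outcomes row."""
    streets = load_column(streets_csv, "Crime ID", "Last outcome category")
    outcomes = load_column(outcomes_csv, "Crime ID", "Outcome type")
    return sorted(cid for cid, status in streets.items()
                  if status and cid not in outcomes)
```

Running `outcome_mismatches` on one month's pair of files should surface rows like the four listed above, and `missing_from_outcomes` gives a quick count of how often the streets file is the only source of a status.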

Data

@RASpicer Would you say it's worth just using the base crime data (...streets.csv) for this project?

RASpicer commented 6 years ago

Extracted all of the bike theft data into a single csv file.
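In case it helps others reproduce the extraction, below is a minimal sketch of one way to do it, not necessarily how the file above was produced. It assumes the month folders (e.g. 2017_01) sit under one root directory, that the street-level files match the glob `*street*.csv`, and that they have a "Crime type" column whose value for these rows is "Bicycle theft". The function name and output path are hypothetical.

```python
import csv
from pathlib import Path


def collect_bicycle_thefts(root, out_path, crime_type="Bicycle theft"):
    """Copy matching rows from every monthly street-level CSV into one file.

    Returns the number of rows written. The header is taken from the first
    file that contains a matching row (assumes all files share the columns).
    """
    root = Path(root)
    writer = None
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        # Folder names like 2017_01 sort chronologically as plain strings.
        for csv_path in sorted(root.glob("*/*street*.csv")):
            with open(csv_path, newline="", encoding="utf-8") as src:
                reader = csv.DictReader(src)
                for row in reader:
                    if row.get("Crime type") != crime_type:
                        continue
                    if writer is None:
                        writer = csv.DictWriter(
                            out,
                            fieldnames=reader.fieldnames,
                            extrasaction="ignore",
                        )
                        writer.writeheader()
                    writer.writerow(row)
                    count += 1
    return count
```

Keep the output file outside the month folders so it is not picked up by the glob on a later run.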