openjusticeok / ojoregex

A separate package for maintaining the regex patterns that we use in our data normalization pipeline.
https://openjusticeok.github.io/ojoregex/
GNU General Public License v3.0

Property crimes #9

Closed andrewjbe closed 6 months ago

andrewjbe commented 6 months ago

Started working on this to answer Cole's question on shoplifting rates. This is all to counter the whole "shoplifting / larceny crimes are skyrocketing because of SQ780!!!" thing.

Right now a lot of specific types of larceny are being grouped into "Larceny (Unspecified)", including "larceny of a vehicle", "larceny of domestic animals", etc. That can maybe be fleshed out more; how far into the weeds do we want to go? Do we need specific variants for "aiding / abetting", "attempted", etc.?

andrewjbe commented 6 months ago

Check inst/cole-larceny-request.R for a summary showing how things got classified. The regex itself is still in the Google Sheet, and the flags get applied in apply_regex.R. That function might not work for you right now because, for development purposes, I'm hooking it directly up to the Google Sheet via my own login, so you might need to change the email in there.

andrewjbe commented 6 months ago

Additional note: Cole asked if we could also look at repeat offenders / unique defendants here.
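For reference, one way the unique-defendant / repeat-offender count could be sketched, assuming each flagged larceny charge can be reduced to a (defendant ID, case number) pair. The function name and field names here are hypothetical, not anything that exists in the package yet:

```python
from collections import Counter


def defendant_summary(charges):
    """Count unique defendants and repeat offenders.

    `charges` is an iterable of (defendant_id, case_number) pairs, one per
    flagged charge. Both field names are assumptions for illustration.
    Returns (unique_defendants, repeat_offenders), where a repeat offender
    is a defendant who appears in more than one distinct case.
    """
    seen = set()
    cases_per_defendant = Counter()
    for defendant_id, case_number in charges:
        key = (defendant_id, case_number)
        if key in seen:
            # Several charges in the same case count only once.
            continue
        seen.add(key)
        cases_per_defendant[defendant_id] += 1

    unique_defendants = len(cases_per_defendant)
    repeat_offenders = sum(1 for n in cases_per_defendant.values() if n > 1)
    return unique_defendants, repeat_offenders
```

Counting distinct cases rather than raw charges keeps a single case with several larceny counts from looking like repeat offending; whether that is the definition Cole wants is still an open question.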

brancengregory commented 6 months ago

So far I've focused on comparing your regex with Ashley's, keeping in mind that hers excludes larceny of CDS, since that is covered under Title 63 rather than Title 21, where most larceny/theft is covered, and we were narrowly focused. So I don't expect the outcomes to be equivalent, but disparities between what you classify as larceny and what she does are still helpful for identifying errors and missing charges.
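A rough sketch of that kind of comparison, assuming each regex can be wrapped as a function that takes a charge description and returns true or false. The `disagreements` helper and both classifier arguments are hypothetical stand-ins, not the actual regexes:

```python
def disagreements(charges, classify_a, classify_b):
    """Split charges by which classifier alone flags them as larceny.

    Returns (only_a, only_b): charges flagged by A but not B, and by B
    but not A. Charges where both agree are dropped.
    """
    only_a, only_b = [], []
    for charge in charges:
        a = bool(classify_a(charge))
        b = bool(classify_b(charge))
        if a and not b:
            only_a.append(charge)
        elif b and not a:
            only_b.append(charge)
    return only_a, only_b
```

With one classifier that includes larceny of CDS and one that excludes it, the CDS charges should land in the first list; anything else that shows up in either list is a candidate error or missing charge.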

brancengregory commented 6 months ago

Including a keyword for 'theft' also matches 'identity theft', so I created a regex for identity in the spreadsheet and excluded those charges. We may need to do the same for other charges. One to check is theft of a credit card; we need to verify whether that falls under a type of larceny.

Run the block on line 118 to see other charges you may have missed.
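To illustrate the exclusion, here is a minimal sketch. The two patterns below are simplified stand-ins made up for this example; the real ones live in the spreadsheet and are likely broader:

```python
import re

# Hypothetical, simplified patterns for illustration only.
THEFT = re.compile(r"\b(theft|larceny)\b", re.IGNORECASE)
IDENTITY = re.compile(r"\bidentity\b", re.IGNORECASE)


def is_larceny(charge: str) -> bool:
    """Flag a charge as larceny unless the identity pattern also matches."""
    return bool(THEFT.search(charge)) and not IDENTITY.search(charge)
```

Under these assumed patterns, "GRAND LARCENY" is flagged and "IDENTITY THEFT" is not. The same shape (keyword match, then a separate exclusion flag) would extend to other charge types that turn out not to be larceny.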

brancengregory commented 6 months ago

https://docs.google.com/document/d/19K6KrBgljBP_E9TnEYL6hsD2e0GrNOVNuNZsuo5fhFY/edit#heading=h.eobk14ygvpvm

andrewjbe commented 6 months ago

I added flags for "enter / entering", "false report", "credit / debit card", "intent", and "copper" so that we could make sure that anything like "entering w/ intent to steal copper" would get included, but stuff like identity theft, stealing a credit card, etc. would be excluded. I think the results look pretty good now.
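Roughly how those flags could combine, as a sketch only. The patterns and the inclusion rule below are assumptions written for this example; the actual regex is in the Google Sheet and the actual logic is in apply_regex.R:

```python
import re

# Hypothetical patterns, one per flag named above.
FLAG_PATTERNS = {
    "larceny": r"larceny|theft|steal",
    "enter": r"\benter(ing)?\b",
    "intent": r"\bintent\b",
    "copper": r"\bcopper\b",
    "identity": r"\bidentity\b",
    "card": r"\b(credit|debit)\s+card\b",
    "false_report": r"\bfalse\s+report",
}
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in FLAG_PATTERNS.items()
}


def flag_charge(charge: str) -> dict:
    """Return a dict of flag name -> whether that pattern matched."""
    return {name: bool(rx.search(charge)) for name, rx in COMPILED.items()}


def is_property_larceny(charge: str) -> bool:
    """Assumed rule: exclusion flags win, then look for larceny signals."""
    flags = flag_charge(charge)
    if flags["identity"] or flags["card"] or flags["false_report"]:
        return False
    entering_for_copper = flags["enter"] and flags["intent"] and flags["copper"]
    return flags["larceny"] or entering_for_copper
```

Keeping each flag as its own boolean and combining them afterwards makes it easier to see why a given charge was included or excluded than folding everything into a single pattern.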