michplunkett / ucpd-incident-scraper

Scrapes the UCPD Daily Incident page at a predetermined frequency and stores the incidents in a generic JSON data store.
MIT License

Lemmatize incident type #41

Closed by michplunkett 9 months ago

michplunkett commented 9 months ago

Describe your changes

Added functionality to lemmatize the incident types.
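
For readers unfamiliar with the technique, here is a minimal sketch of how an incident type could be lemmatized word by word. It is not the code from this PR: the names `lemmatize_incident_type` and `build_wordnet_lemmatizer` are hypothetical, and treating every word as a noun and lemmatizing each word (rather than only the last) are assumptions. The only grounded details are that the logs below show NLTK's `wordnet` package being downloaded to `./data` and plural types being reduced to singular forms.

```python
from typing import Callable


def lemmatize_incident_type(
    incident_type: str, lemmatize_word: Callable[[str], str]
) -> str:
    """Lemmatize each alphabetic word, keeping separators and capitalization.

    `lemmatize_word` receives a lowercase word and returns its lemma.
    """
    result = []
    for token in incident_type.split():
        if not token.isalpha():
            # Separators such as "/" pass through unchanged.
            result.append(token)
            continue
        lemma = lemmatize_word(token.lower())
        if token.isupper():
            # Keep acronyms in upper case.
            lemma = lemma.upper()
        elif token[0].isupper():
            # Restore the leading capital letter.
            lemma = lemma[:1].upper() + lemma[1:]
        result.append(lemma)
    return " ".join(result)


def build_wordnet_lemmatizer(data_dir: str = "./data") -> Callable[[str], str]:
    """Return a noun lemmatizer backed by NLTK's WordNet corpus.

    Assumption: WordNet data lives in `data_dir`, mirroring the log output.
    NLTK is imported lazily so the function above works without it.
    """
    import nltk
    from nltk.stem import WordNetLemmatizer

    nltk.download("wordnet", download_dir=data_dir)
    nltk.data.path.append(data_dir)
    lemmatizer = WordNetLemmatizer()
    return lambda word: lemmatizer.lemmatize(word, pos="n")
```

With NLTK installed, `lemmatize_incident_type("Irregular Conditions", build_wordnet_lemmatizer())` would apply the WordNet lemmatizer to each word. Passing the lemmatizer in as a callable keeps the string handling testable without downloading the corpus.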

Checklist before requesting a review

(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper % make lemmatize-categories
python -m incident_scraper lemmatize-categories
[nltk_data] Downloading package wordnet to ./data...
[nltk_data]   Package wordnet is already up-to-date!
16707 incidents fetched.
...
Incident type changed from Irregular Conditions to Irregular Condition.
358 of 16707 were incidents lemmatized.
358 types were updated.
Program shutting down, attempting to send 297 queued log entries to Cloud Logging...
Waiting up to 5 seconds.
Sent all pending logs.
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper % make three_days 
python -m incident_scraper days-back 3
[nltk_data] Downloading package wordnet to ./data...
[nltk_data]   Package wordnet is already up-to-date!
Beginning the UCPD Incident scraping process.
Finished with the UCPD Incident scraping process.
23 total incidents were scraped from the UCPD Incidents' site.
API queries_quota: 60
This incident has an insufficient number of keys: {}
Incident type changed from Deceptive Practice / Fraudulent Checks to Deceptive Practice / Fraudulent Check.
1 of 23 contained malformed or voided information.
0 of 23 could not be processed by the GoogleMaps' Geocoder.
22 of 23 incidents were successfully processed.
Adding 22 of 23 incidents to the GCP Datastore.
Completed adding 22 of 23 incidents to the GCP Datastore.
1 of 4 'Information' incidents predicted into other categories.
1 of 23 incidents could NOT be added to the GCP Datastore.
Program shutting down, attempting to send 2 queued log entries to Cloud Logging...
Waiting up to 5 seconds.
Sent all pending logs.
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper %