This code will be used to scrape the UCPD Daily Incident page at a predetermined frequency and store the incidents in a generic JSON data store.
Added functionality to lemmatize the incident types.
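For reviewers, here is a minimal sketch of the kind of per-word lemmatization the logs below reflect (e.g. "Irregular Conditions" becoming "Irregular Condition"). It is not the code in this PR: `lemmatize_incident_type` and `naive_singular` are hypothetical names, and `naive_singular` is a toy stand-in that only drops a trailing "s". The logs suggest the real command relies on NLTK's WordNet data, so in practice a WordNet-based callable would presumably be passed in instead.

```python
from typing import Callable


def naive_singular(word: str) -> str:
    """Toy stand-in for a real lemmatizer (assumption): drop one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def lemmatize_incident_type(
    incident_type: str,
    lemmatize_word: Callable[[str], str] = naive_singular,
) -> str:
    """Lemmatize each alphabetic word, leaving separators such as '/' untouched."""
    result = []
    for token in incident_type.split(" "):
        # Skip separators, numbers, and empty tokens.
        if not token.isalpha():
            result.append(token)
            continue
        lemma = lemmatize_word(token.lower())
        if lemma == token.lower():
            # Unchanged words keep their original casing (e.g. acronyms).
            result.append(token)
        else:
            # Restore the leading capital on words that were changed.
            result.append(lemma.capitalize() if token[0].isupper() else lemma)
    return " ".join(result)


if __name__ == "__main__":
    for raw in ["Irregular Conditions", "Deceptive Practice / Fraudulent Checks"]:
        updated = lemmatize_incident_type(raw)
        if updated != raw:
            print(f"Incident type changed from {raw} to {updated}.")
```

Taking the word-level lemmatizer as a parameter keeps the string handling testable without downloading any corpus; a wrapper around NLTK's `WordNetLemmatizer().lemmatize` could be supplied as `lemmatize_word` where WordNet is available.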
Checklist before requesting a review
[x] The code runs successfully.
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper % make lemmatize-categories
python -m incident_scraper lemmatize-categories
[nltk_data] Downloading package wordnet to ./data...
[nltk_data] Package wordnet is already up-to-date!
16707 incidents fetched.
...
Incident type changed from Irregular Conditions to Irregular Condition.
358 of 16707 were incidents lemmatized.
358 types were updated.
Program shutting down, attempting to send 297 queued log entries to Cloud Logging...
Waiting up to 5 seconds.
Sent all pending logs.
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper % make three_days
python -m incident_scraper days-back 3
[nltk_data] Downloading package wordnet to ./data...
[nltk_data] Package wordnet is already up-to-date!
Beginning the UCPD Incident scraping process.
Finished with the UCPD Incident scraping process.
23 total incidents were scraped from the UCPD Incidents' site.
API queries_quota: 60
This incident has an insufficient number of keys: {}
Incident type changed from Deceptive Practice / Fraudulent Checks to Deceptive Practice / Fraudulent Check.
1 of 23 contained malformed or voided information.
0 of 23 could not be processed by the GoogleMaps' Geocoder.
22 of 23 incidents were successfully processed.
Adding 22 of 23 incidents to the GCP Datastore.
Completed adding 22 of 23 incidents to the GCP Datastore.
1 of 4 'Information' incidents predicted into other categories.
1 of 23 incidents could NOT be added to the GCP Datastore.
Program shutting down, attempting to send 2 queued log entries to Cloud Logging...
Waiting up to 5 seconds.
Sent all pending logs.
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper %
[x] Model update:
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper % make build_model
python -m incident_scraper download
[nltk_data] Downloading package wordnet to ./data...
[nltk_data] Package wordnet is already up-to-date!
Downloaded 16714 incident records.
Saved 16714 incident records to a CSV.
Program shutting down, attempting to send 1 queued log entries to Cloud Logging...
Waiting up to 5 seconds.
Sent all pending logs.
python -m incident_scraper build-model
[nltk_data] Downloading package wordnet to ./data...
[nltk_data] Package wordnet is already up-to-date!
Accuracy Score: 0.7209085831316329
Precision Score: 0.8889086701085989
Recall Score: 0.7821115288220551
Program shutting down, attempting to send 2 queued log entries to Cloud Logging...
Waiting up to 5 seconds.
Sent all pending logs.
(ucpd-incident-scraper-py3.11) michaelp@MacBook-Air-18 ucpd-incident-scraper %