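Description
Changes to the study room scraper:
Update the database schema to enforce uniqueness of each (study room ID, start, end) tuple. This stops the study_room_slot table from ballooning to many millions of rows when the scraper re-inserts slots it has already seen.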
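A minimal sketch of how such a constraint keeps repeated scrapes idempotent. It uses Python's built-in sqlite3 as a stand-in for the real database, and the column names and the `insert_slots` and `count_slots` helpers are assumptions made for illustration; only the table name study_room_slot comes from this PR.

```python
import sqlite3

# In-memory stand-in database; the real schema lives in the project's migrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE study_room_slot (
        study_room_id TEXT NOT NULL,
        "start" TEXT NOT NULL,
        "end" TEXT NOT NULL,
        UNIQUE (study_room_id, "start", "end")
    )
    """
)


def insert_slots(conn, slots):
    """Insert slots, silently skipping any tuple that already exists (hypothetical helper)."""
    conn.executemany(
        """
        INSERT INTO study_room_slot (study_room_id, "start", "end")
        VALUES (?, ?, ?)
        ON CONFLICT (study_room_id, "start", "end") DO NOTHING
        """,
        slots,
    )
    conn.commit()


def count_slots(conn):
    """Return the row count, mirroring SELECT COUNT(1) FROM study_room_slot."""
    return conn.execute("SELECT COUNT(1) FROM study_room_slot").fetchone()[0]


slots = [
    ("room-1", "2024-01-01T10:00", "2024-01-01T10:30"),
    ("room-1", "2024-01-01T10:30", "2024-01-01T11:00"),
]
insert_slots(conn, slots)
insert_slots(conn, slots)  # a second "scrape" of the same data adds no rows
```

Without the unique constraint, the second call would double the row count, which is the growth pattern this change is meant to prevent.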
Changes to the WebSoc scraper:
Remove the DEFAULT NOW directive from updated_at columns so that the scraper itself updates them on every upsert.
Add a column to the websoc_meta table that tracks the last department that was scraped successfully. If this column is not null for any term eligible to be scraped, the scraper will prioritize that term and resume from where it left off.
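The resume behavior can be sketched roughly as follows. This is not the scraper's actual code: the function names, the in-memory mapping standing in for the websoc_meta column, and the fallback to a full scrape when the checkpoint is unknown are all assumptions.

```python
from typing import Dict, List, Optional, Sequence


def pick_term(
    eligible_terms: Sequence[str],
    last_scraped_by_term: Dict[str, Optional[str]],
) -> Optional[str]:
    """Prefer a term that has a recorded checkpoint; otherwise take the first eligible term."""
    for term in eligible_terms:
        if last_scraped_by_term.get(term) is not None:
            return term
    return eligible_terms[0] if eligible_terms else None


def departments_to_scrape(
    all_departments: Sequence[str],
    last_scraped: Optional[str],
) -> List[str]:
    """Return the departments that still need scraping for a term.

    Assumption: departments are scraped in a stable order, so everything
    after the checkpoint is still outstanding.
    """
    if last_scraped is None or last_scraped not in all_departments:
        # No usable checkpoint: scrape the whole term.
        return list(all_departments)
    idx = list(all_departments).index(last_scraped)
    return list(all_departments[idx + 1:])
```

Once a term finishes, the checkpoint would presumably be reset to null so the next run does not treat the term as interrupted.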
Misc fixes:
Explicitly close the database connection when the scraper or a migration terminates, to avoid possible deadlocks.
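One common way to guarantee the close is a try/finally wrapper, sketched below with sqlite3 as a stand-in. The `run_with_connection` helper and its parameters are hypothetical and do not reflect the project's actual entry points.

```python
import sqlite3


def run_with_connection(connect, work):
    """Open a connection, run `work(conn)`, and always close the connection.

    The finally clause runs on normal return, on ordinary exceptions, and on
    KeyboardInterrupt (Ctrl-C), so the connection is not left dangling.
    """
    conn = connect()
    try:
        return work(conn)
    finally:
        conn.close()


def example_work(conn):
    """Placeholder for the scrape or migration body."""
    return conn.execute("SELECT 1").fetchone()[0]


result = run_with_connection(lambda: sqlite3.connect(":memory:"), example_work)
```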
Related Issue
Closes #4.
How Has This Been Tested?
For the study location scraper:
Run scraper once locally.
Run SELECT COUNT(1) FROM study_room_slot on local dev db.
Run scraper again.
Run the above query again and verify that the count is the same, or nearly so. It may differ if the two runs fall on either side of a half-hour mark, since that is usually when additional availability is revealed.
For the WebSoc scraper:
Run scraper once locally.
Interrupt it with Ctrl-C before it finishes a full scrape.
Run scraper again and verify that it picks up where you stopped it last.
Note the value of the updated_at column for an arbitrary row belonging to the term.
Run scraper again.
Verify that the updated_at column was updated properly.
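The updated_at check relies on the fact that a column default applies only when a row is first inserted, so the upsert's update branch has to set the timestamp itself. Below is a minimal sketch of that pattern using sqlite3 as a stand-in; the `section` table, its columns other than updated_at, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE section (
        section_code TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        updated_at TEXT NOT NULL  -- no default: the scraper supplies the value
    )
    """
)


def upsert_section(conn, section_code, status, now):
    """Insert or update a row, writing updated_at explicitly in both branches."""
    conn.execute(
        """
        INSERT INTO section (section_code, status, updated_at)
        VALUES (?, ?, ?)
        ON CONFLICT (section_code) DO UPDATE
        SET status = excluded.status,
            updated_at = excluded.updated_at
        """,
        (section_code, status, now),
    )
    conn.commit()


def get_updated_at(conn, section_code):
    """Return the stored updated_at for a row, or None if the row is missing."""
    row = conn.execute(
        "SELECT updated_at FROM section WHERE section_code = ?",
        (section_code,),
    ).fetchone()
    return row[0] if row else None


upsert_section(conn, "12345", "OPEN", "2024-01-01T10:00:00Z")
first = get_updated_at(conn, "12345")
upsert_section(conn, "12345", "OPEN", "2024-01-01T11:00:00Z")
second = get_updated_at(conn, "12345")
```

Comparing `first` and `second` mirrors the manual steps above: the timestamp should advance on the second run even when no other column changes.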
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code involves a change to the database schema.
[ ] My code requires a change to the documentation.