ecxyzzy commented 1 year ago

Summary

Switch to using our Amazon RDS (managed relational database) instance for the WebSoc cache. This will allow us to cache all of WebSoc and serve arbitrary requests with it, rather than the limited set of queries we currently support.

Special thanks to @MinhxNguyen7 for helping with brainstorming and testing for this feature.

TODO:

[x] Add full courses and cancelled courses filtering
[x] Incorporate changes from #20
[x] Add logic for scraper
[x] ~~Determine optimal solution for hosting scraper (EC2)~~
[x] ~~Rewrite scraper~~
[x] ~~Determine code deployment process for that solution~~
[x] ~~Implement code deployment process~~
[x] #26
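
For reference, the full/cancelled filtering item could look roughly like the sketch below once sections are read back from the cache. This is only an illustration: the `CachedSection` shape, the option names and values, and `filterSections` itself are hypothetical and not taken from this PR's code.

```typescript
// Hypothetical shape of one cached section row; the real schema is not shown in this PR.
interface CachedSection {
  sectionCode: string;
  status: "OPEN" | "FULL" | "WAITLIST";
  cancelled: boolean;
}

// Hypothetical filter options, loosely modelled on the two filters named in the TODO list.
type FullCourses = "Any" | "SkipFull" | "FullOnly";
type CancelledCourses = "Exclude" | "Include" | "Only";

interface FilterOptions {
  fullCourses: FullCourses;
  cancelledCourses: CancelledCourses;
}

// Keep only the sections that satisfy both filters.
function filterSections(
  sections: CachedSection[],
  opts: FilterOptions
): CachedSection[] {
  return sections.filter((s) => {
    // Cancelled-course filter: drop cancelled rows, or keep nothing but them.
    if (opts.cancelledCourses === "Exclude" && s.cancelled) return false;
    if (opts.cancelledCourses === "Only" && !s.cancelled) return false;
    // Full-course filter: drop full rows, or keep nothing but them.
    if (opts.fullCourses === "SkipFull" && s.status === "FULL") return false;
    if (opts.fullCourses === "FullOnly" && s.status !== "FULL") return false;
    return true;
  });
}
```

In practice the same conditions would more likely be pushed down into the SQL query against RDS; filtering in application code is used here only to keep the sketch self-contained.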

Issues

Closes #11.

Future Followup

The scraper can serve as a basis for getting enrollment data for that endpoint.

ecxyzzy commented 1 year ago

@bevm0 Blazingly fast :rocket:

Filtering still seems to be mildly broken. I also still need to combine the scraping and processing logic into a single coherent script, and find some way to deploy this...

MinhxNguyen7 commented 1 year ago

How much faster is that?

ecxyzzy commented 1 year ago

@MinhxNguyen7 this is approximately on par with the DynamoDB-backed partial cache (and anywhere from 2-10× faster than querying WebSoc directly), with and without considering Lambda cold start times. However, we can now cache everything, which should lower the expected response time overall. IMO this is strictly an improvement over the old solution.

ecxyzzy commented 1 year ago

Scraper is up and running, and the cache should be fully repopulated in around 30 minutes.

icssc / peterportal-api-next

feat: implement WebSoc scraping and RDS-backed endpoint #21
