hackgvl / hackgreenville-com

HackGreenville's Website
https://hackgreenville.com
MIT License

Prune Missing Events #251

Closed: bogdankharchenko closed this 3 weeks ago

bogdankharchenko commented 1 month ago

This sets up a daily command that checks each event's page to ensure it returns a successful response; if it does not, the event is deleted from our database.

Resolves: https://github.com/hackgvl/hackgreenville-com/issues/238

allella commented 1 month ago

@bogdankharchenko

With the added requests, the main concern is whether we are likely to trigger Meetup's REST API throttle unless we play by their rules.

Meetup's Throttling

The archived docs from the Wayback Machine show the throttling, which appears to be about 30 requests with a 10-second reset window. I assume that means we'd want to average fewer than 3 requests per second, if I'm understanding it correctly.

[Screenshot: archived Meetup API throttling documentation]

We triggered Meetup's IP throttle recently with stage and live both running hourly, so I switched stage to a lower frequency to avoid tripping the throttle all the time.
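If the limit really is 30 requests per 10-second window, the arithmetic gives 30 / 10 = 3 requests per second as the ceiling. A hedged sketch of pacing requests to stay under that, with some headroom (the limit and window values are assumptions taken from the archived docs):

```python
import time

MAX_REQUESTS = 30    # assumed limit from the archived Meetup docs
WINDOW_SECONDS = 10  # assumed reset window
# Ceiling is 3 req/s; pad the minimum spacing by 50% for headroom.
SAFE_DELAY = (WINDOW_SECONDS / MAX_REQUESTS) * 1.5  # ~0.5s between requests


def paced(requests, delay=SAFE_DELAY, sleep=time.sleep):
    """Yield each request, sleeping between them to stay under the throttle."""
    for i, req in enumerate(requests):
        if i > 0:
            sleep(delay)
        yield req
```

The `sleep` parameter is injected so the pacing can be tested without real waits; in production it defaults to `time.sleep`.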

Eventbrite

They show their limit in the HTTP response headers, and it's 2,000 requests per hour, so that's less of a concern.
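Since Eventbrite reports the quota in response headers, the importer could watch them and back off when the remaining allowance runs low. A hypothetical sketch; the header name used here is an assumption, so check the actual headers Eventbrite returns before relying on it:

```python
def should_back_off(headers: dict, threshold: int = 100) -> bool:
    """Return True when the reported remaining quota drops below threshold.

    The header name below is hypothetical; inspect a real Eventbrite
    response to confirm what they actually send.
    """
    remaining = headers.get("X-Rate-Limit-Remaining")
    if remaining is None:
        return False  # header absent: assume we're within limits
    return int(remaining) < threshold
```

With 2,000 requests per hour available, a threshold of 100 leaves plenty of margin before the importer needs to pause.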

Avoiding Running Cron Jobs at the Same Time

Also, I think we talked about configuring the importer through .env, but the idea was dismissed. It may be worth allowing some of these tasks to be configured through .env so live and stage aren't running at the same time, and so we can run things less frequently on stage. I'm currently doing this by tweaking stage's crontab, but that's less granular than using the task timing configuration dynamically.
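For reference, the crontab-level staggering could look something like the fragment below. This is purely illustrative: the install paths and minute offsets are made up, and `artisan schedule:run` is the standard Laravel scheduler entry point:

```
# live: run the scheduler at the top of every hour
0 * * * * php /var/www/live/artisan schedule:run
# stage: offset by 30 minutes, and only every 6 hours
30 */6 * * * php /var/www/stage/artisan schedule:run
```

Moving this into .env-driven task timing would let us change the offsets without touching each server's crontab.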

bogdankharchenko commented 1 month ago

@allella we should use separate API keys for staging or turn off the scheduler.

allella commented 1 month ago

@bogdankharchenko the Meetup REST API does not use API keys. It previously did, but once they sunset that API we moved to hitting the REST API without a key. Their throttling is IP-address based, so it's just something we'll have to keep in mind.

Given that we've already hit the limit, it may be wise to add some throttling of our own, such as a one-second delay between requests, to see if that helps.

If we moved to the new GraphQL API, it has limits as well, and it's unclear whether we could get more than one API key, since it requires a Pro account and only that account's admin appears to be able to apply for an OAuth key for GraphQL.

Adding a one-second delay to the importer and purge tasks may be the simplest way to avoid throttles.

allella commented 1 month ago

@bogdankharchenko or, better yet, perhaps make the throttle delay (in seconds) configurable through .env so we can easily experiment with it and set different throttle times on prod vs. stage.

This could be in addition to making the cron jobs more configurable, but I think that idea is less necessary if we are able to configure throttling.
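A sketch of what an .env-driven throttle could look like, in Python for illustration (in the Laravel app this would be a config value backed by .env; the variable name `EVENT_IMPORTER_THROTTLE_SECONDS` is hypothetical):

```python
import os
import time


def get_throttle_seconds(default: float = 1.0) -> float:
    """Read the inter-request delay from the environment, e.g.
    EVENT_IMPORTER_THROTTLE_SECONDS=0.5 on prod and 2 on stage.
    Falls back to the default if the variable is missing or malformed."""
    raw = os.environ.get("EVENT_IMPORTER_THROTTLE_SECONDS")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


def throttled_fetch(urls, fetch, sleep=time.sleep):
    """Fetch each URL, sleeping the configured delay between requests."""
    delay = get_throttle_seconds()
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)
        results.append(fetch(url))
    return results
```

Because the delay is read from the environment at run time, prod and stage can tune their pacing independently without code changes.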