samyok / gophergrades

Used by more than 35k students to view all the past grades for classes taken at the University of Minnesota, Twin Cities.
https://umn.lol

Speed up scraping with concurrent requests #61

Closed superstealthysheep closed 1 year ago

superstealthysheep commented 1 year ago

Currently, when we fetch the calendar for every week in the semester, every single one of those requests begins only after the previous one has completed. Similarly, when we grab general course data for each course we're enrolled in, those also happen one after the other.

There should be a (not too hard) way to send all of these requests concurrently. To generate the calendar for one semester, we generally need 1 sample week + 7 course-info requests (may vary depending on schedule) + 15 individual weeks = roughly 23 fetch requests. If we can turn these ~23 sequential requests into 3 batches of concurrent requests, scraping time should drop to roughly one eighth (~3/23) of the current time, assuming each request takes about the same time and a batch finishes in about the time of a single request.

I think there's something we can do with Promise.all(). I'll do some reading.
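For reference, here is a minimal sketch of the difference between the sequential approach and a `Promise.all()` batch. It is not the extension's actual code: `fakeFetch`, `fetchSequentially`, `fetchConcurrently`, and `timeIt` are hypothetical names, and `fakeFetch` uses a timer as a stand-in for a real `fetch()` to MyU.

```javascript
// Hypothetical stand-in for a network request: resolves after `ms` milliseconds.
// In the extension this would be a real fetch() call instead of a timer.
function fakeFetch(label, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// Sequential: each request starts only after the previous one has finished.
async function fetchSequentially(labels, ms) {
  const results = [];
  for (const label of labels) {
    results.push(await fakeFetch(label, ms));
  }
  return results;
}

// Concurrent: every request is started up front, then awaited together.
// Promise.all resolves to an array in the same order as its input.
function fetchConcurrently(labels, ms) {
  return Promise.all(labels.map((label) => fakeFetch(label, ms)));
}

// Small helper that reports how long an async function took to settle.
async function timeIt(fn) {
  const start = Date.now();
  const value = await fn();
  return { value, elapsedMs: Date.now() - start };
}
```

With simulated requests, the concurrent version takes about as long as one request while the sequential version takes about as long as all of them combined. Note that `Promise.all()` rejects as soon as any one request rejects, so a real version would need to decide how to handle a single failed week.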

superstealthysheep commented 1 year ago

Okay, so I implemented this (commit fe065cd), but for some reason it gives only a trivial speedup, if any.

Before concurrency: [screenshot: before-concurrency]

After concurrency: [screenshot: after-concurrency]

Now the question is why these new, concurrent requests take so long to complete. @doggu and I have a hypothesis: it's MyU's fault, and their server seems able to handle only one request at a time. Is this so? How would we know? I'm still holding out hope that there's a way to speed up these fetch requests nontrivially.
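One way to probe the "one request at a time" hypothesis is to fire all requests at once and look at when each one completes. The sketch below only illustrates the idea with simulated servers; `makeSerialServer`, `makeParallelServer`, `measureCompletionTimes`, and `looksSerialized` are hypothetical names, the 2x threshold is an arbitrary assumption, and nothing here reflects how MyU actually behaves.

```javascript
// Hypothetical server model: handles one request at a time, each taking `ms`.
function makeSerialServer(ms) {
  let queue = Promise.resolve();
  return function request(label) {
    // Each request waits for the previous one before its own timer starts.
    const result = queue.then(
      () => new Promise((resolve) => setTimeout(() => resolve(label), ms))
    );
    queue = result;
    return result;
  };
}

// Hypothetical server model: handles every request independently.
function makeParallelServer(ms) {
  return (label) =>
    new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// Start all requests at once and record when each one completes,
// in milliseconds relative to the moment the batch was started.
async function measureCompletionTimes(request, labels) {
  const start = Date.now();
  const finishedAt = await Promise.all(
    labels.map(async (label) => {
      await request(label);
      return Date.now() - start;
    })
  );
  return { finishedAt, totalMs: Date.now() - start };
}

// Heuristic (assumed threshold): if the slowest completion is more than twice
// the fastest, completions were staggered, which is consistent with
// serialization somewhere along the path. It is not proof of it.
function looksSerialized({ finishedAt }) {
  const fastest = Math.min(...finishedAt);
  const slowest = Math.max(...finishedAt);
  return slowest > 2 * fastest;
}
```

If completion times come back staggered at roughly equal intervals, the requests are probably being queued somewhere. That queue is not necessarily on the server: browsers also cap the number of simultaneous connections per host, so the network tab's timing breakdown would be worth checking against real requests before blaming MyU.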

Kanishk-K commented 1 year ago

Closing this, since this feature appears to have been included in the recent Spring 2023 feature additions.