ryanhugh / searchneu

Search over Classes, Professors and Employees at NEU!
https://searchneu.com
GNU Affero General Public License v3.0

Show diff(/graph) of open seats #50

Open CodeLenny opened 6 years ago

CodeLenny commented 6 years ago

Feature suggestion! It would be nice to see how seats are changing over time for each course. A graph would be perfect, but given your current data storage techniques (don't get me started, Hughes) I doubt you would keep a database up to date.

Instead, how about showing a few days of history, and/or how many days the availability has remained stable?

E.g. 4/30 seats (-2 yesterday), or 10/50 seats (no changes over the last 8 days).

edward-shen commented 6 years ago

given your current data storage techniques

hahahaha


In all honesty though, we could likely build a short-term naive version by storing the previously fetched data, iterating over all the classes to find any changes, and then discarding the old data.
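
A minimal sketch of that naive diff, assuming each scrape produces a mapping from a section ID to its seat counts. The `Seats` type, `diff_seats`, `format_seats`, and the "since last scrape" wording are all made up for illustration, and it's written in Python rather than taken from the project's code:

```python
from typing import Dict, NamedTuple, Optional


class Seats(NamedTuple):
    """Hypothetical per-section seat counts from one scrape."""
    remaining: int
    capacity: int


def diff_seats(previous: Dict[str, Seats], current: Dict[str, Seats]) -> Dict[str, int]:
    """Return {section_id: change in remaining seats} for sections that changed.

    Sections that only appear in the newer scrape are skipped, since there
    is nothing to compare them against.
    """
    changes = {}
    for section_id, now in current.items():
        before = previous.get(section_id)
        if before is not None and before.remaining != now.remaining:
            changes[section_id] = now.remaining - before.remaining
    return changes


def format_seats(seats: Seats, change: Optional[int] = None) -> str:
    """Render seat counts, appending the signed change when there is one."""
    label = f"{seats.remaining}/{seats.capacity} seats"
    if change:
        label += f" ({change:+d} since last scrape)"
    return label
```

Once the diff is computed, the older snapshot can be thrown away, so only one extra copy of the data is ever held.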

That said, I'd rather have something like what courseoff has, with the live fetch, instead.

CodeLenny commented 6 years ago

It's actually the courseoff live fetch that made me want to see the history - I'd still like to know if I have to rush to do final course rearranging for Spring or if I've got a day or two to wait :P

edward-shen commented 6 years ago

Doesn't look like they're open source, unfortunately. I would love to find out how they grab it, and whether they do it client or server side.

edward-shen commented 6 years ago

If we were to implement live updating, we'd have a few options:

  1. Have the user's browser keep track (cookies/localStorage, perhaps) of which ones were recently polled, so they don't spam requests, and then attach a (refresh) button that effectively runs a scrape client side.
  2. Have the client send a section request to our servers upon loading the desired search. We'd likely need a timeout (e.g. 5 seconds or something) to make sure that we find the right page (e.g. cs28 vs cs2800). The server would then cache results and update them if they're not recent enough.
  3. Be gross and just send a request to courseoff's servers, and get that data instead.
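
Option 2 could look roughly like the sketch below: a server-side cache that only re-scrapes a section when its cached copy is too old. Everything here is an assumption for illustration (the `SectionCache` name, the `fetch` callback standing in for the real scraper, and the 5-minute default), and it's in Python rather than the project's own code:

```python
import time
from typing import Any, Callable, Dict, Tuple


class SectionCache:
    """Cache scraped section data and refresh entries older than max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        max_age_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch            # hypothetical scraper: section_id -> data
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so staleness can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, section_id: str) -> Any:
        now = self._clock()
        entry = self._entries.get(section_id)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # still fresh: no request to the upstream site
        data = self._fetch(section_id)  # missing or stale: scrape again
        self._entries[section_id] = (now, data)
        return data
```

The nice part of doing it this way is that many users loading the same class within the freshness window only cost one upstream request.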

ryanhugh commented 6 years ago

Definitely like both of these ideas! Currently all the data that we have can also be found on NEU's servers. In other words, if we ever lose the data, or for whatever reason want to delete all of our data, we can just delete the stuff we currently have and scrape everything again and we're good to go. If we wanted to keep track of old data we would have to keep track of data that can't be found anywhere else.

The courseoff-like live fetch is totally something that we could do and it's actually going to be really easy to add once #47 gets merged! I'm making some good progress in that PR too. We could also make it so you can mark a class as interesting (or something) and that one class would be updated every 5 minutes, along with all the classes that people are watching for notifications when seats open up.

ryanhugh commented 6 years ago

Yeah.... Courseoff isn't open source unfortunately. :/ It sounds like the main guy behind it (Roman Shtylman) has just been doing maintenance updates over the last couple years and that's pretty much it. I emailed him a couple years ago asking if he wanted to go open source and it didn't really sound like he was interested. That was a while ago though.

ryanhugh commented 6 years ago

Roman is also scraping the data from Northeastern's site (http://shtylman.com/post/scraping-broken-ssl-pages-with-node-js/) and I am pretty certain he is scraping all the data from his backend for the updating feature too. If you tried to scrape from the frontend/client you would run into cross-site ajax issues because Northeastern's site doesn't respond with an Access-Control-Allow-Origin header.

ryanhugh commented 6 years ago

For the updating when seats open up in the facebook branch, I am planning on having an interval run that will do a couple things once every 5 minutes:

  1. Pull a list of all the classes that people are watching
  2. Scrape those classes.
    • Here, we are only going to scrape the classes that people are watching, so we are only going to hit about 10-100 (max) classes instead of all 7,000. This should be pretty fast.
  3. Send notifications if seats opened up in classes that anyone is watching.
  4. Update the data stored in RAM in the backend!

After this runs, if you search for one of the classes that have been updated, it will return the data that was fetched < 5 min ago instead of the last time all the scrapers ran!
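
The four steps above can be sketched as a single function that the interval would call. This is only an illustration in Python, not the code in the PR: `run_update_cycle`, the `scrape` and `notify` callbacks, and the plain dict standing in for the data held in RAM are all hypothetical, and it assumes "seats opened up" means going from zero remaining seats to at least one:

```python
from typing import Callable, Dict, List


def run_update_cycle(
    watched: Dict[str, List[str]],
    scrape: Callable[[str], int],
    notify: Callable[[str, str, int], None],
    store: Dict[str, int],
) -> List[str]:
    """Run one update pass and return the sections where seats opened up.

    watched: section_id -> users watching it (step 1, already pulled)
    scrape:  hypothetical scraper returning remaining seats for a section
    notify:  hypothetical sender taking (user, section_id, remaining)
    store:   in-memory data, section_id -> remaining seats
    """
    opened = []
    for section_id, watchers in watched.items():
        remaining = scrape(section_id)          # step 2: only watched classes
        previous = store.get(section_id)
        if previous == 0 and remaining > 0:     # step 3: seats just opened up
            for user in watchers:
                notify(user, section_id, remaining)
            opened.append(section_id)
        store[section_id] = remaining           # step 4: refresh in-memory data
    return opened
```

Whatever scheduler ends up being used would just call this once every 5 minutes; because only the watched classes are scraped, each pass should stay small.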

ryanhugh commented 6 years ago

I'll add some stuff to that PR on how I'm thinking everything is going to work haha