sanger96 / Happenings_Team-5_UTD_Senior_Design_Project

UTD Senior Design Project; Group Members: Gaurav Sanger, Jonathan Lam, Robert Dohm, Landin Kasti, Charles Eaton

Implement PageScraperService.java didPageUpdate() method #52

Closed LKASTI closed 5 months ago

LKASTI commented 5 months ago

didPageUpdate() returns a boolean indicating whether the "This Week" page has updated its event list.

IMPORTANT: Originally, we thought we could use data from the HTML to check whether the webpage was updated. I couldn't find any metadata in the HTML that indicates a last-updated date. I also found that the JavaScript property document.lastModified was inaccurate for this page: it just returned the time at which it was read, not the time the page was last updated. The site also does not provide a sitemap.xml file that we could check.

Unless another member has found a reliable way to check when the page was last updated, or has gotten any of the methods above to work, I think caching the event list is our best option:

  1. Retrieve the event list from the page using Jsoup.
  2. Compare the event list to the cached event list by iterating through all items in the current list. Note: the size of the current list and the cached list may be different.
     2a. If they are different, set the cached list to the new event list.
     2b. Return True.
  3. After iterating through all items, return False
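
The steps above could be sketched roughly as follows. This is only a minimal sketch, not the project's actual code: the class name EventListCache is hypothetical, and it assumes the events have already been pulled out of the page with Jsoup into a list of strings (for example, one string per event title). It relies on List.equals, which is true only when both lists have the same size and equal elements in the same order, so the case where the two lists differ in length is covered without a separate check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the real PageScraperService may look different.
class EventListCache {
    // Last event list we saw. Empty until the first scrape.
    private List<String> cachedEvents = new ArrayList<>();

    /**
     * Compares the freshly scraped events against the cached ones.
     * Returns true (and replaces the cache) if they differ, false otherwise.
     *
     * Assumption: currentEvents was already extracted from the page elsewhere.
     */
    boolean didPageUpdate(List<String> currentEvents) {
        // O(n) comparison: same size and same elements in the same order.
        if (cachedEvents.equals(currentEvents)) {
            return false;
        }
        // Copy the list so later changes by the caller cannot alter the cache.
        cachedEvents = new ArrayList<>(currentEvents);
        return true;
    }
}
```

With this sketch, the first call with a non-empty list reports an update because the cache starts out empty. The controller would need to decide whether that first call should count as an update.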

This may be inefficient, since every call to the method does an O(n) scan over all event items on the page, but it's the only approach I could think of. Feel free to discuss changes or thoughts on this in the Discord.

Links: "This Week" Example "Calendar" Page

Location in "This Week" HTML of Events: <u class="lw lw_event_list">

Jsoup: https://jsoup.org/apidocs/

LKASTI commented 5 months ago

After doing more research into caching, I've decided it's best to leave this part of the implementation for later. It is an optimization that would most likely be significant, but it requires a lot of planning among all back-end members. Therefore, the controller will not call this method for the time being.