icssc / peterportal-api-next

API that provides easy access to public data from UC Irvine. Developed for Anteaters, by Anteaters.
https://docs.icssc.club/anteaterapi
MIT License

Implement fuzzy search-as-a-service for courses and instructors #131

Open ecxyzzy opened 7 months ago

ecxyzzy commented 7 months ago

Current State of Fuzzy Search

It works and somehow seems reasonably fast, but it is a bit of a cobbled-together mess. This is not entirely surprising considering it was hacked together within the span of a week. Some weird edge cases have come up because of how it works under the hood, and at this point I'm not sure how to address them since the code is so poorly documented (thanks to me from two years ago).

Question

Fuzzy search will definitely be reworked; that is not really up for debate. The main decision to be made is whether to continue offering it as an npm package or to create an API endpoint that encapsulates the functionality.

I'm also open to any additional solutions that are none of the above or incorporate elements of both.

Analysis of SaaS

Pros

Cons

Stakeholders

@EricPedley @ap0nia @js0mmer

js0mmer commented 7 months ago

After merging https://github.com/icssc/peterportal-client/pull/297 (search pagination), I think PeterPortal specifically could benefit from an API route tailored to our use case.

Our current search implementation runs a fuzzy search limited to the first 5 pages of results (50 total, 10 per page). It then makes a GraphQL request to the API for the data of the 10 courses on the current page. Each time the page changes, another API request is sent for the courses on that page.

If the user moves on to page 4 or beyond, an additional fuzzy search is made without a limit to get all results (for an empty query this is about 6000 courses). The full result set is needed at that point because we have to know how many pages remain in order to decide which page number buttons to render below the results. This unbounded search is pretty slow, which is why we limit to 50 initially (the site would freeze when loading otherwise). A button to the last page is also omitted in our current implementation because, with the initial limit of 50 results, we don't know what the last page number is.


I think we could benefit from an API route with parameters similar to the current fuzzy search (query, filters, number of results, etc.) plus a new offset parameter, so we can get the results in the range [offset, offset + number of results). The response would contain two things: the full course/instructor data for that range (rather than just metadata), and the total number of results that match the query, so we can determine how many pages of results there are.
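To make the proposal concrete, here is a rough TypeScript sketch of what such a route's contract could look like. Every name in it (`SearchParams`, `SearchResponse`, `search`, `pageCount`) is hypothetical and not part of the existing API or npm package, filters are left out for brevity, and the matcher is a plain substring stand-in rather than the real search logic.

```typescript
// Hypothetical request/response shapes; none of these names exist in the current API.
interface SearchParams {
  query: string;
  limit: number; // number of results per page
  offset: number; // index of the first result to return
}

interface SearchResponse<T> {
  total: number; // total matches for the query, ignoring limit and offset
  results: T[]; // full records for the range [offset, offset + limit)
}

// Stand-in for the server-side search: a case-insensitive substring match.
function search<T extends { name: string }>(
  items: T[],
  { query, limit, offset }: SearchParams,
): SearchResponse<T> {
  const q = query.toLowerCase();
  const matches = items.filter((item) => item.name.toLowerCase().includes(q));
  return {
    total: matches.length,
    results: matches.slice(offset, offset + limit),
  };
}

// Number of pages the client needs to render buttons for.
function pageCount(total: number, limit: number): number {
  return Math.ceil(total / limit);
}
```

With `total` returned alongside each page, the client could render the last-page button from the very first request and would not need a separate unbounded search.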

Lastly, I think calling the current search "fuzzy" is a bit of a stretch lol. It seems to only match substrings and won't find results if there's a typo or a single letter missing.
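For comparison, here is a minimal sketch of one way to tolerate typos: accept a word if it contains the query as a substring or is within a small Levenshtein edit distance of it. This is not how the current package works; `editDistance`, `fuzzyMatches`, and the `maxEdits` threshold are illustrative names and choices only.

```typescript
// Classic Levenshtein edit distance, computed with two rolling rows.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical matcher: true if any word of the target contains the query
// or is within maxEdits single-character edits of it.
function fuzzyMatches(query: string, target: string, maxEdits = 1): boolean {
  const q = query.toLowerCase();
  return target
    .toLowerCase()
    .split(/\s+/)
    .some((word) => word.includes(q) || editDistance(q, word) <= maxEdits);
}
```

A substring-only search would miss a query like "algoritms" against a title containing "Algorithms", while this check accepts it because the two words are one edit apart. A real implementation would likely also need ranking and an index to stay fast over thousands of courses.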

EricPedley commented 6 months ago

For AA I kinda am fine leaving it janky and never touching it, but if PeterPortal switches solutions then I think it'd be pretty easy for us to just copy paste real quick