ryanhugh / searchneu

Search over Classes, Professors and Employees at NEU!
https://searchneu.com
GNU Affero General Public License v3.0

Elasticsearch #87

Closed: dajinchu closed this 4 years ago

dajinchu commented 4 years ago

Took forever, but I finally got Elasticsearch working. The updater.js code in particular is really nasty. It could be optimized significantly with just a little bit of work, but I wanted to leave most of how it worked intact for now.

This addresses issue #58 and enables a bunch of new features like filters, better fuzzy matching, and possibly autocomplete and did-you-mean suggestions.

I've deployed a working copy here: http://159.89.51.184:5000

TODO:

ryanhugh commented 4 years ago

WOW!! GREAT WORK!! Just looked at the site and it looks great. I'll take a look at the code.

ryanhugh commented 4 years ago

Also, take a look at the Travis output and fix up the issues it found:

https://travis-ci.org/ryanhugh/searchneu/builds/564749654?utm_source=github_status&utm_medium=notification

You can also run yarn fix locally to find and fix a bunch of these issues.

And merge in the latest master commit too, when you have a chance.

ryanhugh commented 4 years ago

Here's some more stuff too!

Some other stuff:

Can you update this doc too? A different PR would probably be a good idea :) https://github.com/ryanhugh/searchneu/blob/9eb106938f90a4e8ea401343607610e8a6c2bb8d/docs/Search.md

I know I just wrote a lot - but don't let my comments take away from your great work! I've just been working with this codebase a while and have a lot to talk about :)

dajinchu commented 4 years ago
* [ ] What is esMapping.json?

It's the Elasticsearch index mapping for the classes index. It tells Elasticsearch how each field should be stored and analyzed at index time. Basically, it's a database schema.
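For anyone unfamiliar with mappings, here is a minimal sketch of what a classes mapping might look like, written as a JavaScript object. The field names (name, desc, subject, classId) are assumptions for illustration, not the actual contents of esMapping.json, and the typeless mappings.properties layout assumes Elasticsearch 7.

```javascript
// Hypothetical mapping for a "classes" index (Elasticsearch 7 typeless style).
// Field names are illustrative assumptions, not the real esMapping.json.
const classesMapping = {
  mappings: {
    properties: {
      // Full-text fields: run through the built-in english analyzer
      // (lowercasing, stop words, stemming) at index time.
      name: { type: 'text', analyzer: 'english' },
      desc: { type: 'text', analyzer: 'english' },
      // Exact-value fields: indexed as a single term, used for filters and sorting.
      subject: { type: 'keyword' },
      classId: { type: 'keyword' },
    },
  },
};

// Lists the fields of a mapping that have the given type.
function fieldsByType(mapping, type) {
  return Object.entries(mapping.mappings.properties)
    .filter(([, def]) => def.type === type)
    .map(([field]) => field);
}

console.log(fieldsByType(classesMapping, 'keyword').join(', ')); // subject, classId
```

An object like this would be sent as the request body when creating the index (PUT /classes). The text/keyword split is the main "schema" decision: text fields are searched by relevance, keyword fields are matched and sorted exactly.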

* [ ] If the user types in a subject like "cs" or "computer science", we want to show all the classes in that subject, in order. This is treated as a "special" query: it isn't run through elasticlunr at all. Perhaps with Elasticsearch we can run a query for all the matching classes in that subject. Can Elasticsearch sort them too?
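A sketch of how the "every class in a subject, in order" case could be a single Elasticsearch query: a non-scoring term filter on the subject plus a sort on course number. The subjectQuery helper and the subject/classId field names are hypothetical, and it assumes "computer science" has already been resolved to the code "CS" and that subject codes are indexed in uppercase.

```javascript
// Hypothetical helper: builds a _search request body that lists every class
// in one subject, ordered by course number. Assumes "subject" and "classId"
// are keyword fields and that subject codes are stored uppercase.
function subjectQuery(subjectCode, size = 500) {
  return {
    size, // Elasticsearch returns only 10 hits unless size is set
    query: {
      bool: {
        // Filter context: documents either match or not; no relevance score.
        filter: [{ term: { subject: subjectCode.toUpperCase() } }],
      },
    },
    // Keyword sort is lexicographic, which matches numeric order only if
    // course numbers are fixed-width (e.g. always four digits).
    sort: [{ classId: { order: 'asc' } }],
  };
}

console.log(JSON.stringify(subjectQuery('cs')));
```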

I think this special query opens up a lot of edge cases that may confuse users. For example, searching "game" doesn't find "COMM 2555: Games for Change" or "CS 3540: Game Programming" because they aren't in the game subject. I just pushed out a change that sorts equally relevant results by ascending course number, so without any special query, searching "game" first shows all courses in the game subject, sorted, and then all other courses with "game" in the title. http://159.89.51.184:5000/202010/game
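The ordering described above (relevance first, then ascending course number among equally relevant hits) can be sketched client-side as a comparator. This is an illustration, not the pushed code: the result shape { subject, classId, score } is an assumption. Inside Elasticsearch, the same ordering could be requested with sort: ['_score', { classId: 'asc' }], since _score sorts descending by default.

```javascript
// Hypothetical result shape: { subject: string, classId: string, score: number }.
// Higher score first; exact score ties broken by ascending numeric course number.
function compareResults(a, b) {
  if (b.score !== a.score) return b.score - a.score;
  return Number(a.classId) - Number(b.classId);
}

// Returns a new sorted array; the input array is left untouched.
function rankResults(results) {
  return [...results].sort(compareResults);
}

const ranked = rankResults([
  { subject: 'CS', classId: '3540', score: 1.2 },
  { subject: 'GAME', classId: '3700', score: 2.5 },
  { subject: 'COMM', classId: '2555', score: 1.2 },
  { subject: 'GAME', classId: '2500', score: 2.5 },
]);
console.log(ranked.map((r) => `${r.subject} ${r.classId}`).join(', '));
// GAME 2500, GAME 3700, COMM 2555, CS 3540
```

Only exactly equal scores count as ties here; two hits whose scores differ by a tiny floating-point amount keep their relevance order.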

The only problem is that searching "cs" shows all the labs first, because they have "cs" in the title, which makes them more relevant.

I think once we add filters, this will all be a non-issue. If a user wants to look at all classes in a subject, they can just pick that filter. If they want to exclude labs, they can do that too!
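A rough sketch of how user-selected filters could combine with the relevance query: the typed text is scored under must, a subject filter goes under filter (no effect on scoring), and excluding labs goes under must_not. The buildSearch helper and the name/desc/subject/classType fields are hypothetical; fuzziness: 'AUTO' is the standard Elasticsearch option for typo-tolerant matching.

```javascript
// Hypothetical helper: combines free-text search with optional filters.
// Field names (name, desc, subject, classType) are assumptions.
function buildSearch(text, { subject = null, excludeLabs = false } = {}) {
  const bool = {
    // Scored clause: relevance comes from matching the typed text.
    // "name^2" boosts title matches over description matches.
    must: [{ multi_match: { query: text, fields: ['name^2', 'desc'], fuzziness: 'AUTO' } }],
  };
  // Narrows the result set without changing scores.
  if (subject) bool.filter = [{ term: { subject } }];
  // Removes lab sections entirely.
  if (excludeLabs) bool.must_not = [{ term: { classType: 'Lab' } }];
  return { query: { bool } };
}

console.log(JSON.stringify(buildSearch('cs', { excludeLabs: true }), null, 2));
```

With a filter like this, searching "cs" with labs excluded would no longer be dominated by lab sections, which is the case raised above.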

* [ ] If we are going to be storing the subjects in RAM anyway, would it be better to do this at query time instead of adding another field to the index (right now it's done at query time)? Haven't thought too much about the technical details.

The less we do at query time the better, generally. I think it's preferable to let Elasticsearch handle tokenizing words instead of rolling our own solution.
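To make the index-time option concrete: the subject code to full-name lookup that would otherwise happen in RAM at query time can be applied once while building documents, so Elasticsearch's analyzer tokenizes "Computer Science" like any other text. The SUBJECT_NAMES table, the subjectName field, and the addSubjectName helper are hypothetical, and subjectName would need a text entry in the mapping.

```javascript
// Hypothetical code-to-name table (illustrative entries only); in practice
// this would come from the scraped subject list.
const SUBJECT_NAMES = {
  CS: 'Computer Science',
  MATH: 'Mathematics',
};

// Returns a copy of a class document with a searchable subjectName field.
// Unknown subject codes get no extra field rather than a guessed one.
function addSubjectName(classDoc) {
  const subjectName = SUBJECT_NAMES[classDoc.subject];
  return subjectName ? { ...classDoc, subjectName } : { ...classDoc };
}

const doc = addSubjectName({ subject: 'CS', classId: '2500' });
console.log(doc.subjectName); // Computer Science
```

Each document is enriched once per index rebuild, and a query for "computer science" then matches subjectName through the normal analyzed search, with no special-case code at query time.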