Closed by dajinchu 4 years ago
WOW!! GREAT WORK!! Just looked at the site and it looks great, I'll take a look at the code
Also - take a look at the Travis output and fix up the stuff it found.
You can also run yarn fix to find and fix a bunch of stuff locally.
And merge in the latest master commit too, when you have a chance.
Here's some more stuff too!
[ ] I think it might be a good idea to store each subject for each host+termId in RAM in the backend - so we can quickly compare for the following two features:
[ ] If we are going to be storing the subjects in RAM anyway, would it be better to do this at query-time, instead of adding another field to the index (right now it's done at query time)? Haven't thought too much about the technical dets https://github.com/sandboxneu/searchneu/blob/71b153a89df87c15ba8e4f0da2893eac6c630171/backend/scrapers/classes/searchIndex.js#L21-L25
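To make the "subjects in RAM" idea concrete, here is a rough sketch of a per-host+termId lookup. Everything in it is an assumption for illustration: the SubjectCache class, its method names, and the subject shape ({ subject, text }) are hypothetical and are not taken from the codebase.

```javascript
// Hypothetical sketch: SubjectCache and the { subject, text } record shape
// are illustrative assumptions, not existing backend code.
class SubjectCache {
  constructor() {
    // Maps "host/termId" to a Map of lowercased code or name -> subject record.
    this.byTerm = new Map();
  }

  static keyFor(host, termId) {
    return `${host}/${termId}`;
  }

  // Load the scraped subjects for one host+termId into memory.
  load(host, termId, subjects) {
    const lookup = new Map();
    for (const subject of subjects) {
      lookup.set(subject.subject.toLowerCase(), subject);
      lookup.set(subject.text.toLowerCase(), subject);
    }
    this.byTerm.set(SubjectCache.keyFor(host, termId), lookup);
  }

  // Return the subject if the whole query is a subject code or name, else null.
  match(host, termId, query) {
    const lookup = this.byTerm.get(SubjectCache.keyFor(host, termId));
    if (!lookup) return null;
    return lookup.get(query.trim().toLowerCase()) || null;
  }
}
```

With something like this, checking whether a query such as "cs" or "computer science" names a subject is a single map lookup, no index round trip needed.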
Some other stuff:
Can you update this doc too? Different PR would probably be a good idea :) https://github.com/ryanhugh/searchneu/blob/9eb106938f90a4e8ea401343607610e8a6c2bb8d/docs/Search.md
I know I just wrote a lot - but don't let my comments take away from your great work! I've just been working with this codebase a while and have a lot to talk about :)
* [ ] What is esMapping.json?
It's the elasticsearch index mapping for the classes index. It tells elasticsearch what to do at index time. Basically it's a database schema.
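For a feel of what a mapping looks like, here is a minimal sketch written as a JS object. The field names (name, subject, classId, termId) are made up for illustration and are not the actual contents of esMapping.json; it also assumes the typeless mapping layout used by Elasticsearch 7 and later.

```javascript
// Illustrative only: these fields are assumptions, not the real esMapping.json.
const exampleMapping = {
  mappings: {
    properties: {
      // "text" fields are analyzed (tokenized) at index time for full-text search.
      name: { type: 'text' },
      // "keyword" fields are stored as exact values, good for filtering and sorting.
      subject: { type: 'keyword' },
      classId: { type: 'keyword' },
      termId: { type: 'keyword' },
    },
  },
};
```

The text vs. keyword split is the "schema" part: it decides up front which fields get tokenized for matching and which are kept as exact values.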
* [ ] If the user types in a subject like "cs" or "computer science", we want to show all those classes, in order. This query is treated as a "special" query - it isn't run through elasticlunr at all. Perhaps with elasticsearch, we can run a query for all the matching classes in that subject, and can elasticsearch sort them too?
I think this special query opens a lot of edge cases that may confuse users. For example searching "game" and not finding "COMM 2555: Games for Change" nor "CS 3540: Game Programming" because they aren't in the game subject. I just pushed out something that will take all results that are equally relevant and sort by ascending course number, so without any special query, searching "game" will first show all courses in the game subject, sorted, then show all other courses with "game" in the title. http://159.89.51.184:5000/202010/game
Only problem would be that searching "cs" shows all the labs first, because they have "cs" in the title, making them more relevant.
I think once we add filters, this will all be a non-issue. If a user wants to look at all classes in a subject, they can just pick that filter. If they want to exclude labs, they can do that too!
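As a rough sketch of how the tie-break sort and a future subject filter could be expressed in one Elasticsearch request body (this is one possible shape, not necessarily what this PR does; buildSearchBody and the field names name, subject, classId, termId are assumptions):

```javascript
// Sketch only: the function name and field names are hypothetical.
function buildSearchBody(query, termId, subject) {
  // Filters narrow the result set without affecting the relevance score.
  const filter = [{ term: { termId } }];
  if (subject) {
    filter.push({ term: { subject } });
  }

  return {
    query: {
      bool: {
        must: { multi_match: { query, fields: ['name', 'subject', 'classId'] } },
        filter,
      },
    },
    // Highest score first; hits with the same score fall back to ascending
    // course number. Assumes classId is indexed as a sortable keyword field.
    sort: ['_score', { classId: 'asc' }],
  };
}
```

Calling buildSearchBody('game', '202010') gives the unfiltered search with the tie-break, while buildSearchBody('game', '202010', 'CS') adds the subject filter a user would pick in the UI.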
* [ ] If we are going to be storing the subjects in RAM anyway, would it be better to do this at query-time, instead of adding another field to the index (right now it's done at query time)? Haven't thought too much about the technical dets
The less we do at query time the better, generally. I think it's preferable we let elasticsearch manage tokenizing words instead of rolling our own solution.
Took forever, but finally got elasticsearch working. The updater.js code in particular is really nasty. It can be optimized significantly with just a little bit of work, but I wanted to leave most of how it worked intact for now.
This is issue #58 and enables a bunch of new features like filters, better fuzzy matching, and perhaps autocomplete + did-you-mean.
I've deployed a working copy here: http://159.89.51.184:5000
TODO:
[ ] Install elasticsearch on production. Ideally we would set it up on a second server, have a load balancer, and switch when ready to achieve zero downtime.
[ ] Modify the scrape cron job on Travis to push scraped data into the elasticsearch index properly
[x] Optimize some of the calls to elasticsearch
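For the scrape-to-index TODO, one option is to send documents through the Elasticsearch bulk API rather than one request per class. Below is a minimal sketch of building the newline-delimited bulk body; buildBulkBody and the { id, doc } input shape are assumptions for illustration, not existing code in updater.js.

```javascript
// Hypothetical helper: the name and the { id, doc } input shape are assumptions.
function buildBulkBody(indexName, classes) {
  if (classes.length === 0) return '';

  const lines = [];
  for (const aClass of classes) {
    // Each document is preceded by an action line naming the index and id.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: aClass.id } }));
    lines.push(JSON.stringify(aClass.doc));
  }
  // The bulk API expects newline-delimited JSON that ends with a newline.
  return `${lines.join('\n')}\n`;
}
```

The resulting string would be sent as the body of a POST to the _bulk endpoint, so a full scrape becomes a handful of requests instead of thousands.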