weixsong / elasticlunr.js

Based on lunr.js, but more flexible and customized.
http://elasticlunr.com
MIT License
2.02k stars, 147 forks

Modernizing code & build #102

Open srenauld opened 5 years ago

srenauld commented 5 years ago

As per one of the to-dos, I've started working on modernizing both the code and the build pipeline.

Right now, one of the main drawbacks for contributors (such as myself - one of the issues and PRs I put forward required a partial rewrite due to this) is that the build pipeline relies extensively on make, and is effectively a concatenation of all javascript files present in lib/. This does not provide opportunities to use a modern toolkit (such as webpack) to properly "build", nor a testing library like mocha to operate on a non-built version.

The plan of battle, with current status, is as follows:

My working branch is over at code-improvements on my fork. Once we're back in a stable state, I'll put a PR forward.

weixsong commented 5 years ago

@srenauld , thanks very much for the contribution.

srenauld commented 5 years ago

Status update time!

Turns out there was a lot more that could be done: low-hanging fruit and other improvements. In particular, I've also put the elastic and lunr back into elasticlunr.

The branch I've been working on, which is much closer to being a PR candidate, now has the following changes made to it:

Indexing:

Search

Storage

Tests

Basically everything is reworked. A ton of test cases were testing the internal state of objects more than whether a given feature worked. This is probably the biggest change.

A couple of tests were added to cover new components (the new index and DSL) or third-party library support (lunr-languages)
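The difference between testing internal state and testing a feature can be sketched as follows. This is only an illustration: `TinyIndex` and `assertDeepEqual` are hypothetical names invented here, not part of elasticlunr or its test suite.

```javascript
// Hypothetical toy index, used only to contrast two styles of test.
class TinyIndex {
  constructor() {
    this._tokens = new Map(); // internal detail: token -> Set of refs
  }
  add(ref, text) {
    for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!this._tokens.has(token)) this._tokens.set(token, new Set());
      this._tokens.get(token).add(ref);
    }
  }
  search(term) {
    return [...(this._tokens.get(term.toLowerCase()) || [])];
  }
}

// Minimal assertion helper so the sketch needs no test framework.
function assertDeepEqual(actual, expected, label) {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`${label}: expected ${e}, got ${a}`);
}

const index = new TinyIndex();
index.add("doc1", "Hello world");

// State-based test: passes today, but breaks as soon as the internal
// Map is replaced by another structure, even if search still works.
assertDeepEqual([...index._tokens.keys()], ["hello", "world"], "internal state");

// Behavior-based test: only checks what a caller can observe.
assertDeepEqual(index.search("WORLD"), ["doc1"], "search by term");
```

The second assertion survives a rewrite of the internals, which is why a refactor of this size tends to favour that style.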

My next step is documentation and benchmarks.

smurrayatwork commented 4 years ago

@srenauld How's this effort going?

srenauld commented 4 years ago

@smurrayatwork on this, I've done very little due to a PR being frozen for months. On other fronts, I almost have a plug-in for this library for nextjs, which may or may not be useful to some.

raghur commented 4 years ago

@srenauld, @weixsong - given that this PR is merged, is there anywhere I can look up usage? The current docs are out of date. With default options, an index is now more than 2x the size of the one produced by lunr.js. ElasticLunr seemed very appealing due to the ability to store fields and to do an 'AND' search on all terms, but without usage info, it's hard to use.
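For readers unfamiliar with the 'AND' behaviour mentioned above, here is a minimal sketch of an all-terms search over a toy inverted index. It is not elasticlunr's or lunr.js's actual implementation; `buildIndex` and `searchAll` are hypothetical names made up for illustration, and there is no stemming, scoring, or stop-word handling.

```javascript
// Toy inverted index: token -> Set of document refs (illustrative only).
function buildIndex(docs, field) {
  const postings = new Map();
  for (const doc of docs) {
    for (const token of doc[field].toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!postings.has(token)) postings.set(token, new Set());
      postings.get(token).add(doc.id);
    }
  }
  return postings;
}

// "AND" search: keep only refs that appear in the posting list of every term.
function searchAll(postings, query) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const refs = postings.get(term) || new Set();
    result = result === null
      ? new Set(refs)
      : new Set([...result].filter((ref) => refs.has(ref)));
  }
  return [...result].sort();
}

const docs = [
  { id: "a", body: "Modernizing the build pipeline" },
  { id: "b", body: "The build relies on make" },
  { id: "c", body: "Pipeline notes" },
];
const postings = buildIndex(docs, "body");

// Only document "a" contains both "build" and "pipeline".
console.log(searchAll(postings, "build pipeline"));
```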

srenauld commented 4 years ago

@raghur Hello, and sorry about the delay!

I'm in the process of working on a side branch for this purpose specifically. The version currently available on npm is still 0.9.5 (i.e. before the changes). As such, all the old documentation is up to date, and so are all the specifics.

The branch I am working on (intermittently, admittedly; I've had a ton of things coming my way recently) is over here. I wired together a small example leveraging the gatsby plugin, both to enforce backward-compatibility (wouldn't want to wreck a plugin by accident) and to showcase the use.

Regarding the index size, if you are using the new version, I'd be curious to see what you are indexing. This is the kind of feedback that is extremely valuable, as I can change the index format in the new version while still remaining backward-compatible (the old was a straight JSON.stringify() of the inner state - the new is a bit smarter). As nobody is currently using the new format (AFAIK), I'd be super interested to know what you are indexing and what your results are.
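As a rough illustration of why a straight `JSON.stringify()` of the inner state can get large, the sketch below serializes a toy index with and without a copy of the original documents. The state layout and the `makeIndexState` helper are assumptions made up for this example; they do not reflect the old or the new elasticlunr index format.

```javascript
// Hypothetical inner state: postings plus an optional document store.
function makeIndexState(docs, options) {
  const state = {
    postings: Object.create(null), // token -> array of refs
    documentStore: Object.create(null), // ref -> original document
  };
  for (const doc of docs) {
    const tokens = new Set(doc.body.toLowerCase().split(/\W+/).filter(Boolean));
    for (const token of tokens) {
      if (!state.postings[token]) state.postings[token] = [];
      state.postings[token].push(doc.id);
    }
    // Storing the document lets results return its fields, but duplicates
    // the full text inside the serialized index.
    if (options.storeDocs) state.documentStore[doc.id] = doc;
  }
  return state;
}

const docs = [
  { id: "a", body: "Modernizing the build pipeline" },
  { id: "b", body: "The build relies on make" },
];

// "Straight JSON.stringify() of the inner state", with and without stored docs.
const withDocs = JSON.stringify(makeIndexState(docs, { storeDocs: true }));
const withoutDocs = JSON.stringify(makeIndexState(docs, { storeDocs: false }));
console.log(withDocs.length, withoutDocs.length);

// The serialized form round-trips through JSON.parse.
const restored = JSON.parse(withoutDocs);
console.log(restored.postings.build);
```

For full-text content such as blog posts, the stored copy of each document can easily dominate the serialized size, which is one plausible (but unconfirmed) contributor to the difference reported in this thread.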

raghur commented 4 years ago

@srenauld - thanks for responding. No problem at all.

TBH, I went through some of your changes, and while I absolutely LIKE the features you list, I ran into a few issues (which is why I picked master instead of a released version on npm). Having a working example is great for starters and exactly the thing I was missing when picking up master here!

The content I'm indexing is from my blog; the JSON is here: https://blog.rraghur.in/index.json. It isn't a large blog by any means, and the index was coming to about 4.8MB (Lunr produced 2.3 - 3MB), I think. The code to build the index is here: https://gitlab.com/rraghur/rraghur.gitlab.io/blob/elasticlunr/build-index.js

Also, options like not to store documents (I didn't want that for the full-text blog content), etc. weren't working, and I was just running into too many problems. So I finally wrote the comment here and switched back to lunr.

I'll try your branch and gatsby example early next week...

edsu commented 4 years ago

An option like storeDocuments to control whether fields are persisted to the index would be VERY useful for me. But it doesn't seem to work in v0.9.5. Does anything like it exist?