olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License
8.87k stars 547 forks source link

Feature Request: object pipelines #319

Open hoelzro opened 6 years ago

hoelzro commented 6 years ago

This isn't so much a feature request as it is a question about whether this is a good idea, and what the design should look like to fit into lunr.js in general. If it doesn't belong in lunr.js itself, I think I could make a separate plugin like I did for mutable indexes.

Basically, I was wondering if it would be possible to have a pipeline function be an object instead of just a function. My use case is that I want to perform query expansion (so that searching for "FTS" matches "full text search" and vice versa) - my current implementation relies on some state outside of the function, which makes persisting the pipeline tricky. It would be great if I could have an object that holds on to the needed state with a run method or something to perform the expansion, and then the state could be serialized along with other parts of the index. I'm not sure how reloading that object would work on the other end, but I think we could figure that out.

This would also solve another thing that's been bugging me - lunr.tokenizer has a separator field, but it's not persisted as part of the index, AFAICT. This feature would also the tokenizer to persist its state without needing to synchronize at either end of serialization.

Either way, let me know what you think!

olivernn commented 6 years ago

So, currently lunr.Pipeline#run calls pipeline functions with parens. As far as I'm aware this is difficult (impossible?) to handle with an object other than a function. However, we could change to using Function#call which any other object can also implement. There would be the question of what to bind this to and how that would impact any scope inside that function.

Alternatively, couldn't an objects method be used as a pipeline function, especially if the context was fixed by calling Function#bind? Another option would be to capture the required state within a closure, this is the approach taken by lunr.stopWordFilter.

This would also solve another thing that's been bugging me - lunr.tokenizer has a separator field, but it's not persisted as part of the index, AFAICT.

That is an interesting observation that I hadn't noticed or thought about before and I think it raises an important question about what should and shouldn't be in a serialised index. Ideally, the entire state of an index would be serialised, however, that isn't really possible with pipeline functions, hence the current approach of 'registering' them etc. Configuration for those pipeline functions is sort of a grey area though, it is currently assumed that any configuration such as lunr.tokenizer.separator is made before loading/using a serialised index. I guess it depends on whether the state/configuration is specific to this instance of the index or not...

hoelzro commented 6 years ago

So, currently lunr.Pipeline#run calls pipeline functions with parens. As far as I'm aware this is difficult (impossible?) to handle with an object other than a function. However, we could change to using Function#call which any other object can also implement. There would be the question of what to bind this to and how that would impact any scope inside that function.

I was thinking something along the lines of if(isFunction(fn)) { return fn(token) } else { return fn.run(token) }, but that's not very elegant!

Alternatively, couldn't an objects method be used as a pipeline function, especially if the context was fixed by calling Function#bind? Another option would be to capture the required state within a closure, this is the approach taken by lunr.stopWordFilter.

Using a closure is what I currently do, but then if the closed-over data were to change (as can be the case with my query expansion), I would need to re-register the function, and that state is independent of the index.

If you'd like, I could write a lunr.ObjectPipeline class to prototype this idea!