yeraydiazdiaz / lunr.py

A Python implementation of Lunr.js 🌖
http://lunr.readthedocs.io
MIT License
188 stars 16 forks

How to disable stop_word_filter trimmer for specific fields #108

Closed tristanlatr closed 2 years ago

tristanlatr commented 2 years ago

Hello,

I'm building a search system and generating an index with lunr.py (the actual searching is done with lunr.js), and I would like to disable the stop_word_filter for fields that are not actually text, such as object names, because objects can be named If or For (in the AST library, for instance).

Do you know if I can achieve this behaviour?

Thanks

tristanlatr commented 2 years ago

Looks like currently, all pipeline functions are applied to all fields no matter what: https://github.com/yeraydiazdiaz/lunr.py/blob/b9533caab2fd68513098cd7eaa9e011d7b15fedc/lunr/pipeline.py#L107

We could introduce a new Pipeline method that would allow such configuration.

For instance, Pipeline.skip(fn: Callable, fields: List[str]), which would prevent the pipeline function fn from being executed on the given fields.

Tell me what you think, I can send a PR.

Edit: We'll probably need to change Pipeline.run so that it accepts the field name, and change the call here: https://github.com/yeraydiazdiaz/lunr.py/blob/b9533caab2fd68513098cd7eaa9e011d7b15fedc/lunr/builder.py#L147
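To make the proposal concrete, here is a minimal sketch of how a skip method and a field-aware run could fit together. It uses a stand-in class, not lunr.py's real Pipeline: SketchPipeline, its internals, and the tiny STOP_WORDS set are all assumptions made for illustration, and the only thing taken from the thread is the fn(token, i, tokens) call shape.

```python
# Hypothetical stand-in for lunr.py's Pipeline; names and internals are assumptions.
STOP_WORDS = {"if", "for", "the"}  # tiny illustrative set, not lunr's real list


def stop_word_filter(token, i=None, tokens=None):
    """Drop the token (return None) when it is a stop word."""
    return None if token.lower() in STOP_WORDS else token


class SketchPipeline:
    def __init__(self):
        self._stack = []  # pipeline functions, in execution order
        self._skip = {}   # function -> set of field names to skip it on

    def add(self, *fns):
        self._stack.extend(fns)

    def skip(self, fn, fields):
        """Register fields on which fn must not be executed."""
        self._skip.setdefault(fn, set()).update(fields)

    def run(self, tokens, field_name=None):
        """Apply each function to every token, honoring per-field skips."""
        for fn in self._stack:
            if field_name is not None and field_name in self._skip.get(fn, ()):
                continue  # only this function is skipped, not the whole pipeline
            results = []
            for i, token in enumerate(tokens):
                result = fn(token, i, tokens)
                if result is not None:
                    results.append(result)
            tokens = results
        return tokens


pipeline = SketchPipeline()
pipeline.add(stop_word_filter)
pipeline.skip(stop_word_filter, ["name"])

print(pipeline.run(["If", "node"], field_name="name"))  # ['If', 'node']
print(pipeline.run(["If", "node"], field_name="body"))  # ['node']
```

In this sketch the caller that processes each field would pass the field name into run, which is the change to the builder call mentioned in the edit above.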

yeraydiazdiaz commented 2 years ago

Hi @tristanlatr, if I understood correctly Pipeline.skip would not run any of the functions in the pipeline on specific fields, which might be too coarse of a tool.

I wonder if it might be more flexible to extend the pipeline function signature to include the field being processed so you can have full control over what happens with each field? That does mean more work for you as the caller, since you'd have to instantiate the Builder yourself, but I think it might be worth it.
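As a rough illustration of this alternative, the sketch below shows a pipeline function that receives the field name and decides for itself whether to filter. Everything here is hypothetical: the field_name parameter, the TEXT_FIELDS set, the small STOP_WORDS set, and the run_field helper are assumptions for the example and are not part of lunr.py.

```python
# Illustrative only: a field-aware pipeline function under an assumed
# extended signature fn(token, i, tokens, field_name).
STOP_WORDS = {"if", "for", "the"}   # tiny illustrative set, not lunr's real list
TEXT_FIELDS = {"body", "docstring"}  # hypothetical fields that hold prose


def field_aware_stop_word_filter(token, i, tokens, field_name):
    """Filter stop words only on prose fields; pass other fields through."""
    if field_name not in TEXT_FIELDS:
        return token
    return None if token.lower() in STOP_WORDS else token


def run_field(fns, tokens, field_name):
    """Hypothetical runner that forwards the field name to every function."""
    for fn in fns:
        results = []
        for i, token in enumerate(tokens):
            result = fn(token, i, tokens, field_name)
            if result is not None:
                results.append(result)
        tokens = results
    return tokens


print(run_field([field_aware_stop_word_filter], ["For", "loop"], "name"))  # ['For', 'loop']
print(run_field([field_aware_stop_word_filter], ["For", "loop"], "body"))  # ['loop']
```

The trade-off is that each function carries its own per-field logic, which is more flexible than a skip list but asks more of whoever writes the pipeline functions.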

tristanlatr commented 2 years ago

Hi @yeraydiazdiaz,

> if I understood correctly Pipeline.skip would not run any of the functions in the pipeline on specific fields

It would skip only the specified functions, not all of them.

> I wonder if it might be more flexible to extend the pipeline function signature to include the field being processed so you can have full control over what happens with each field?

Can you elaborate? I don't see how we could achieve such behavior by other means.

tristanlatr commented 2 years ago

> I wonder if it might be more flexible to extend the pipeline function signature to include the field being processed so you can have full control over what happens with each field?

Do you mean changing the signature from fn(token, i, tokens) to fn(token, i, tokens, field_name)? To do it in a backward-compatible way, we would need to call inspect.signature to check whether the pipeline function accepts the extra argument. Yes, this could allow more control, but the feature I'm looking for is really specific: it is just about skipping a specific function for a specific field. I could look into implementing what you're asking for, but I must say that it would make my life easier if you just accepted #109 ^^'
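For reference, the backward-compatible dispatch described here could look roughly like the sketch below. The helper name call_pipeline_fn and the convention of detecting a parameter literally named field_name are assumptions for illustration; inspect.signature is the standard-library call mentioned above.

```python
import inspect


def call_pipeline_fn(fn, token, i, tokens, field_name):
    """Call fn with field_name only if its signature declares that parameter.

    Hypothetical helper: existing three-argument pipeline functions keep
    working, while new ones can opt in by adding a `field_name` parameter.
    """
    params = inspect.signature(fn).parameters
    if "field_name" in params:
        return fn(token, i, tokens, field_name=field_name)
    return fn(token, i, tokens)


def legacy_fn(token, i, tokens):
    return token.upper()


def field_aware_fn(token, i, tokens, field_name):
    return f"{field_name}:{token}"


print(call_pipeline_fn(legacy_fn, "if", 0, ["if"], "name"))       # IF
print(call_pipeline_fn(field_aware_fn, "if", 0, ["if"], "name"))  # name:if
```

One caveat: inspect.signature raises ValueError for some callables that expose no signature, so a real implementation would likely need a fallback, and inspecting on every call has a cost that could be avoided by checking once when the function is added.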

Talk to you later.

yeraydiazdiaz commented 2 years ago

Hey @tristanlatr, yep, you're right. I'm happy to accept #109. It's just failing linting checks at the moment; if you'd like to take care of those, I'll merge it.