olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Suggestion: Migrate to TypeScript #388

Open TenkenNoSoujiro opened 5 years ago

TenkenNoSoujiro commented 5 years ago

I recently forked lunr.js to add a feature and decided to investigate migrating the sources to TypeScript. If you are interested, I can create a pull request for my changes. My fork contains the following changes:

I'm also investigating adding support for "date" type fields similar to the "number" support above, as well as quoted terms. For quoted terms, I am thinking that "Scarlett helps Professor" should only match documents with all three terms (possibly with a higher rank the closer they appear together in the correct order), while -"Scarlett helps Professor" should only reject documents with all three terms.
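
A minimal TypeScript sketch of the quoted-term semantics described above, not the fork's actual implementation. Every name here (`Doc`, `tokenize`, `containsAllTerms`, `orderedSpan`, `searchQuoted`) is hypothetical and none of it is lunr's API. It assumes a naive lowercase tokenizer, and it ranks positive matches by the smallest in-order window that holds all of the terms.

```typescript
// Hypothetical types and helpers for illustration only; not part of lunr.
type Doc = { id: string; body: string };

// Naive tokenizer: lowercase and split on non-word characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// True when the document contains every term of the quoted phrase.
function containsAllTerms(doc: Doc, phrase: string): boolean {
  const present = new Set(tokenize(doc.body));
  return tokenize(phrase).every((t) => present.has(t));
}

// Size (in tokens) of the smallest window that holds the phrase terms
// in their original order, or Infinity when no in-order match exists.
function orderedSpan(doc: Doc, phrase: string): number {
  const tokens = tokenize(doc.body);
  const terms = tokenize(phrase);
  if (terms.length === 0) return Infinity;
  let best = Infinity;
  for (let start = 0; start < tokens.length; start++) {
    if (tokens[start] !== terms[0]) continue;
    let pos = start;
    let matched = 1;
    // Greedily take the earliest occurrence of each following term.
    while (matched < terms.length && ++pos < tokens.length) {
      if (tokens[pos] === terms[matched]) matched++;
    }
    if (matched === terms.length) best = Math.min(best, pos - start + 1);
  }
  return best;
}

// "a b c"  keeps documents with all terms, tightest in-order match first.
// -"a b c" keeps only documents that do NOT contain all terms.
function searchQuoted(docs: Doc[], phrase: string, negate = false): Doc[] {
  if (negate) return docs.filter((d) => !containsAllTerms(d, phrase));
  return docs
    .filter((d) => containsAllTerms(d, phrase))
    .map((d) => ({ doc: d, span: orderedSpan(d, phrase) }))
    .sort((x, y) => (x.span < y.span ? -1 : x.span > y.span ? 1 : 0))
    .map((r) => r.doc);
}
```

In this sketch a document that contains all three terms in the wrong order still matches, but sorts after every in-order match.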

olivernn commented 5 years ago

I’m a fan of TypeScript, and if I were starting lunr now I would definitely write it in TypeScript. That said, I’m wary of converting the current code base over to TypeScript as is; I’d rather do it in conjunction with a larger rewrite or similar.

I will have a look, though, at how you implemented the different field types, since that is an often-requested feature.

> Modifies lunr.tokenizer to "pass-through" lunr.Token instances (with any additional metadata), rather than calling .toString() on them. This allows custom extractors to provide their own position metadata for hit-highlighting.

This is an interesting approach. Can you explain what an “extractor” is in this case? What can extractors do that wouldn’t be possible in an existing pipeline function?
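
The pass-through idea can be sketched as below. This is a self-contained illustration under stated assumptions, not lunr's tokenizer and not the fork's code: the `Token` class is a hypothetical stand-in for `lunr.Token`, the `[start, length]` position format is an assumption of the sketch, and `htmlExtractor` is one guess at what an "extractor" might be (a function that produces tokens from a raw field value while recording positions in the original source), since the thread does not define the term.

```typescript
// Hypothetical stand-in for lunr.Token: a string plus arbitrary metadata.
class Token {
  constructor(
    public str: string,
    public metadata: Record<string, unknown> = {}
  ) {}
  toString(): string {
    return this.str;
  }
}

// Sketch of a tokenizer that passes existing Token instances through
// untouched instead of stringifying and re-splitting them.
function tokenize(input: string | Token | Array<string | Token>): Token[] {
  const items = Array.isArray(input) ? input : [input];
  const out: Token[] = [];
  for (const item of items) {
    if (item instanceof Token) {
      out.push(item); // keep the caller-supplied metadata as is
      continue;
    }
    const re = /\S+/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(item)) !== null) {
      out.push(new Token(m[0].toLowerCase(), { position: [m.index, m[0].length] }));
    }
  }
  return out;
}

// Hypothetical extractor: skips HTML tags but records each word's offset
// in the ORIGINAL markup, so highlights can be applied to the source HTML.
function htmlExtractor(html: string): Token[] {
  const tokens: Token[] = [];
  const re = /<[^>]*>|[^\s<]+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (m[0].charAt(0) === "<") continue; // a tag, not a word
    tokens.push(new Token(m[0].toLowerCase(), { position: [m.index, m[0].length] }));
  }
  return tokens;
}

// Usage: offsets refer to the markup, not to the tag-stripped text.
const tokens = tokenize(htmlExtractor("<b>Scarlett</b> helps"));
```

The design point is that a tokenizer which stringified its input first would lose the extractor's offsets and recompute positions against different text.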

Searching by phrase, e.g. “Scarlett helps Professor”, is something that I think would be really cool. You could additionally be ‘fuzzy’ about how exact the phrase is, e.g. “Scarlett helps Professor”~2 could allow some additional words between those in the phrase. I can’t think of how to do this kind of searching without substantially increasing the size of the index though, either by storing n-grams or by storing the term index. How were you thinking about implementing this?
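
One way to picture the trade-off raised above is a positional index, sketched here in TypeScript. This is an assumption-laden illustration, not how lunr or the fork works: `PositionalIndex`, `buildIndex`, `matchPhrase`, and `storedPositions` are hypothetical names, the tokenizer is a naive lowercase split, and "slop" is simplified to mean the number of extra tokens allowed inside an in-order match.

```typescript
// Hypothetical structure: term -> document id -> ascending token positions.
type PositionalIndex = Map<string, Map<string, number[]>>;

function buildIndex(docs: Record<string, string>): PositionalIndex {
  const index: PositionalIndex = new Map();
  for (const id of Object.keys(docs)) {
    const tokens = docs[id].toLowerCase().split(/\W+/).filter((t) => t.length > 0);
    tokens.forEach((term, pos) => {
      let postings = index.get(term);
      if (!postings) {
        postings = new Map();
        index.set(term, postings);
      }
      let positions = postings.get(id);
      if (!positions) {
        positions = [];
        postings.set(id, positions);
      }
      positions.push(pos);
    });
  }
  return index;
}

// Ids of documents where the phrase terms occur in order with at most
// `slop` extra tokens between the first and last term of the match.
function matchPhrase(index: PositionalIndex, phrase: string[], slop = 0): string[] {
  if (phrase.length === 0) return [];
  const first = index.get(phrase[0]);
  if (!first) return [];
  const hits: string[] = [];
  first.forEach((starts, id) => {
    for (const start of starts) {
      let prev = start;
      let ok = true;
      for (let i = 1; i < phrase.length; i++) {
        const postings = index.get(phrase[i]);
        const positions = (postings && postings.get(id)) || [];
        const next = positions.find((p) => p > prev); // earliest later occurrence
        if (next === undefined) {
          ok = false;
          break;
        }
        prev = next;
      }
      if (ok && prev - start + 1 - phrase.length <= slop) {
        hits.push(id);
        break;
      }
    }
  });
  return hits;
}

// The index-size cost: one stored number per token occurrence.
function storedPositions(index: PositionalIndex): number {
  let total = 0;
  index.forEach((postings) => postings.forEach((p) => (total += p.length)));
  return total;
}
```

Here `matchPhrase(index, ["scarlett", "helps", "professor"], 2)` plays the role of “Scarlett helps Professor”~2, and `storedPositions` makes the size concern concrete: the index holds one position per token occurrence rather than one entry per distinct term per document.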

indolering commented 5 years ago

Just something to consider: remaining a superset of JS forces TS to adopt breaking changes whenever the underlying standard changes. This happened with ES6 modules and is happening again with private members. Much more complex rewrites will be required if JS implements a native type system.

endymion1818 commented 5 years ago

There are already types for Lunr: https://www.npmjs.com/package/@types/lunr. I'm not sure whether they are up to date, though.