Suggestion: Migrate to TypeScript

TenkenNoSoujiro commented 5 years ago

I recently forked lunr.js to add a feature and decided to investigate migrating the sources to TypeScript. If you are interested, I can create a pull request for my changes. My fork contains the following changes:

Migration-related changes:
- Migrates lib/ to TypeScript
- Generates a lunr.d.ts TypeScript Declaration file that can supplant the definitions on DefinitelyTyped (and would stay up-to-date with future releases as it is built from source)
- Generates a lunr.js.map source map for debugging when running tests.
- Uses eslint-plugin-typescript and typescript-eslint-parser to continue using ESLint against the TypeScript sources
- Generates docs/ from TypeScript build outputs (though it might be worth the time to migrate from JSDoc to TypeDoc for better documentation against TypeScript sources).

Other changes:
- Adds support for defining "number" type fields (via builder.field("wordCount", { type: "number" }))
- Adds support for searching using relational comparisons against "number" type fields (e.g. wordCount:>=10), similar to GitHub Advanced Search.
- Adds support for searching using numeric ranges (e.g. wordCount:10..20, wordCount:*..50, etc.), similar to GitHub Advanced Search.
- Modifies lunr.tokenizer to "pass-through" lunr.Token instances (with any additional metadata), rather than calling .toString() on them. This allows custom extractors to provide their own position metadata for hit-highlighting.
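
To make the numeric query syntax above concrete, here is a minimal sketch of how clauses such as wordCount:>=10 or wordCount:*..50 could be turned into a predicate. It is not the code from the fork: the names NumericClause and parseNumericClause are hypothetical, and treating both ends of a range as inclusive is an assumption.

```typescript
// Hypothetical sketch: none of these names come from lunr.js or the fork.
type NumericPredicate = (value: number) => boolean;

interface NumericClause {
  field: string;
  test: NumericPredicate;
}

const NUM = String.raw`-?\d+(?:\.\d+)?`;
const RANGE = new RegExp(`^(\\*|${NUM})\\.\\.(\\*|${NUM})$`);
const RELATIONAL = new RegExp(`^(>=|<=|>|<)(${NUM})$`);

// Parses "field:>=10", "field:<5", "field:10..20", "field:*..50", "field:10..*".
// Returns null when the clause is not a numeric comparison or range.
function parseNumericClause(clause: string): NumericClause | null {
  const sep = clause.indexOf(":");
  if (sep <= 0) return null;
  const field = clause.slice(0, sep);
  const expr = clause.slice(sep + 1);

  // Range form: lower..upper, where "*" leaves that side unbounded.
  // Assumption: both bounds are inclusive.
  const range = RANGE.exec(expr);
  if (range) {
    const lo = range[1] === "*" ? -Infinity : Number(range[1]);
    const hi = range[2] === "*" ? Infinity : Number(range[2]);
    return { field, test: (v) => v >= lo && v <= hi };
  }

  // Relational form: one of >=, <=, >, < followed by a number.
  const rel = RELATIONAL.exec(expr);
  if (rel) {
    const n = Number(rel[2]);
    switch (rel[1]) {
      case ">=": return { field, test: (v) => v >= n };
      case "<=": return { field, test: (v) => v <= n };
      case ">": return { field, test: (v) => v > n };
      default: return { field, test: (v) => v < n };
    }
  }
  return null;
}
```

A clause parsed this way could then be applied to the stored value of a "number" field, for example parseNumericClause("wordCount:10..20")?.test(15). How the fork actually stores and compares numeric values is not shown here.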

I'm also investigating adding support for "date" type fields similar to the "number" support above, as well as quoted terms. For quoted terms, I am thinking that "Scarlett helps Professor" should only match documents with all three terms (possibly with a higher rank the closer they appear together in the correct order), while -"Scarlett helps Professor" should only reject documents with all three terms.

olivernn commented 5 years ago

I’m a fan of TypeScript, and if I were starting lunr now I would definitely write it in TypeScript. That said, I’m wary of converting the current code base over to TypeScript as is; I’d rather do it in conjunction with some larger rewrite or similar.

I will have a look, though, at how you implemented the different field types; that is an often-requested feature.

> Modifies lunr.tokenizer to "pass-through" lunr.Token instances (with any additional metadata), rather than calling .toString() on them. This allows custom extractors to provide their own position metadata for hit-highlighting.

This is an interesting approach. Can you explain what an “extractor” is in this case? What can they do that wouldn’t be possible in an existing pipeline function?

Searching by phrase, e.g. “Scarlett helps Professor”, is something that I think would be really cool. You could additionally be ‘fuzzy’ about how exact the phrase is, e.g. “Scarlett helps Professor”~2 could allow some additional words between those in the phrase. I can’t think of how to do this kind of searching without substantially increasing the size of the index though, either by storing n-grams or by storing the term index. How were you thinking about implementing this?
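
For illustration only, here is a minimal sketch of the second option, reading "storing the term index" as storing each term's token positions per document (that reading is an assumption). The names indexPositions and matchesPhrase are hypothetical and are not part of lunr. The slop argument plays the role of the ~2 suffix: the maximum number of extra tokens allowed, in total, between the phrase terms. Keeping a position list per term per document is exactly the index growth mentioned above.

```typescript
// Hypothetical sketch: a positional index for a single document.
// Maps each term to the ascending list of token offsets where it occurs.
type Positions = Map<string, number[]>;

function indexPositions(tokens: string[]): Positions {
  const positions: Positions = new Map();
  tokens.forEach((token, i) => {
    const list = positions.get(token);
    if (list) {
      list.push(i);
    } else {
      positions.set(token, [i]);
    }
  });
  return positions;
}

// True if the phrase terms occur in order with at most `slop` extra tokens,
// in total, between them. slop = 0 requires the terms to be adjacent.
function matchesPhrase(positions: Positions, phrase: string[], slop = 0): boolean {
  const [first, ...rest] = phrase;
  if (first === undefined) return false;
  const starts = positions.get(first) ?? [];
  return starts.some((start) => {
    let prev = start;
    let gaps = 0;
    for (const term of rest) {
      // Taking the nearest later occurrence keeps the total gap minimal
      // for this starting position.
      const next = (positions.get(term) ?? []).find((p) => p > prev);
      if (next === undefined) return false;
      gaps += next - prev - 1;
      prev = next;
    }
    return gaps <= slop;
  });
}
```

With a document tokenized as ["scarlett", "helps", "the", "professor"], the phrase ["scarlett", "helps", "professor"] does not match at slop 0 but does match at slop 1. Ranking closer matches higher, as suggested earlier in the thread, could reuse the computed gap count, though that is not shown here.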

indolering commented 5 years ago

Just something to consider: remaining a superset of JS forces TS to adopt breaking changes whenever the underlying standard changes. This happened with ES6 modules and is happening again with private members. Much more complex rewrites will be required if JS implements a native type system.

endymion1818 commented 5 years ago

There are types for Lunr already: https://www.npmjs.com/package/@types/lunr. I'm not sure whether they are up to date, though.

olivernn / lunr.js

Suggestion: Migrate to TypeScript #388