TenkenNoSoujiro opened this issue 5 years ago
I’m a fan of TypeScript, and if I were starting lunr now I would definitely write it in TypeScript. That said, I’m wary of converting the current code base to TypeScript as is; I’d rather do it in conjunction with a larger rewrite or similar.
I will have a look, though, at how you implemented the different field types; that is an often-requested feature.
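For reference, the "number" field queries described in the fork's change list (wordCount:>=10, wordCount:10..20, wordCount:*..50) could be parsed into predicates roughly as below. This is a minimal sketch, not the fork's implementation: parseNumberClause, NumberClause and the exact grammar accepted are hypothetical, and wiring the result into lunr's query parser is left out.

```typescript
// Hypothetical predicate over the numeric value stored for a field.
type NumberPredicate = (value: number) => boolean;

interface NumberClause {
  field: string;
  test: NumberPredicate;
}

// Parses "field:>=10", "field:<5", "field:10..20", "field:*..50", "field:10..*".
// Returns null when the clause is not a numeric comparison or range.
function parseNumberClause(clause: string): NumberClause | null {
  const m = /^(\w+):(.+)$/.exec(clause);
  if (!m) return null;
  const [, field, expr] = m;

  // Comparison form: an operator followed by a number.
  const cmp = /^(>=|<=|>|<)(-?\d+(?:\.\d+)?)$/.exec(expr);
  if (cmp) {
    const n = Number(cmp[2]);
    const ops: Record<string, NumberPredicate> = {
      ">=": (v) => v >= n,
      "<=": (v) => v <= n,
      ">": (v) => v > n,
      "<": (v) => v < n,
    };
    return { field, test: ops[cmp[1]] };
  }

  // Range form: "lo..hi", where "*" leaves that end open.
  const range = /^(\*|-?\d+(?:\.\d+)?)\.\.(\*|-?\d+(?:\.\d+)?)$/.exec(expr);
  if (range) {
    const lo = range[1] === "*" ? -Infinity : Number(range[1]);
    const hi = range[2] === "*" ? Infinity : Number(range[2]);
    // Both ends are inclusive in this sketch.
    return { field, test: (v) => v >= lo && v <= hi };
  }
  return null;
}

console.log(parseNumberClause("wordCount:10..20")?.test(15)); // true
```

Treating both ends of a range as inclusive and "*" as an open end are design choices of this sketch. A real implementation would likely also need to keep the raw numeric value per document, since comparisons like >= can't be answered by exact term lookup in an inverted index.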
Modifies lunr.tokenizer to "pass-through" lunr.Token instances (with any additional metadata), rather than calling .toString() on them. This allows custom extractors to provide their own position metadata for hit-highlighting.
This is an interesting approach. Can you explain what an “extractor” is in this case? What can it do that wouldn’t be possible in an existing pipeline function?
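A minimal sketch of the quoted tokenizer change, assuming a stand-in Token class shaped like lunr.Token (a str plus a metadata object). tokenizeArray is hypothetical and only illustrates the pass-through idea for array input; it is not lunr's actual tokenizer.

```typescript
// Stand-in for lunr.Token (illustration only): a string plus a metadata object.
type Metadata = Record<string, unknown>;

class Token {
  str: string;
  metadata: Metadata;

  constructor(str: string, metadata: Metadata = {}) {
    this.str = str;
    this.metadata = metadata;
  }

  toString(): string {
    return this.str;
  }
}

// Hypothetical array tokenizer with the proposed pass-through behaviour.
// Token instances keep their own string, and their metadata is merged over the
// shared field-level metadata; other values are stringified and lower-cased.
function tokenizeArray(items: unknown[], metadata: Metadata = {}): Token[] {
  return items.map((item) =>
    item instanceof Token
      ? new Token(item.str, { ...metadata, ...item.metadata })
      : new Token(String(item).toLowerCase(), { ...metadata }),
  );
}

// An extractor for "Scarlett helps Professor" could emit Tokens carrying
// [start, length] character offsets; the plain string gets no position.
const tokens = tokenizeArray(
  [
    new Token("scarlett", { position: [0, 8] }),
    new Token("helps", { position: [9, 5] }),
    "Professor",
  ],
  { fields: "body" },
);

for (const t of tokens) console.log(t.toString(), t.metadata);
```

The point of the pass-through is visible in the last token: once a value is reduced to a plain string, any offsets the extractor knew about are gone before later pipeline functions see it.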
Searching by phrase, e.g. “Scarlet helps Professor”, is something I think would be really cool. You could additionally be ‘fuzzy’ about how exact the phrase is; e.g. “Scarlet helps Professor”~2 could allow some additional words between those in the phrase. I can’t think of how to do this kind of searching without substantially increasing the size of the index, though, either by storing n-grams or by storing each term’s index (its position within the document). How were you thinking about implementing this?
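To illustrate why phrase proximity needs more data in the index: given per-document term positions (the extra storage mentioned above), a query like “Scarlet helps Professor”~2 can be checked as below. This sketch assumes ~N means at most N extra words in total between the phrase terms, which must appear in order; Lucene's slop is defined differently (as an edit distance that also tolerates reordering). positionsOf and matchesPhrase are hypothetical helpers.

```typescript
// Hypothetical positional index for one document: term -> ascending word positions.
type Positions = Map<string, number[]>;

// Records the word position of every (lower-cased) term in the text.
function positionsOf(text: string): Positions {
  const index: Positions = new Map();
  text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .forEach((word, i) => {
      const list = index.get(word) ?? [];
      list.push(i);
      index.set(word, list);
    });
  return index;
}

// True if the phrase terms occur in order with at most `slop` extra words
// in total between them (slop 0 means an exact, adjacent phrase).
function matchesPhrase(doc: Positions, phrase: string[], slop = 0): boolean {
  if (phrase.length === 0) return false;
  const lists = phrase.map((term) => doc.get(term) ?? []);
  if (lists.some((list) => list.length === 0)) return false;

  for (const start of lists[0]) {
    // Greedily take the earliest occurrence of each next term; this gives the
    // shortest in-order span that begins at `start`.
    let prev = start;
    let complete = true;
    for (let i = 1; i < lists.length; i++) {
      const next = lists[i].find((p) => p > prev);
      if (next === undefined) {
        complete = false;
        break;
      }
      prev = next;
    }
    // Span length minus the phrase's own words = extra words in between.
    if (complete && prev - start - (phrase.length - 1) <= slop) return true;
  }
  return false;
}

const doc = positionsOf("Scarlet quickly helps the Professor");
console.log(matchesPhrase(doc, ["scarlet", "helps", "professor"], 0)); // false
console.log(matchesPhrase(doc, ["scarlet", "helps", "professor"], 2)); // true
```

Here the terms sit at positions 0, 2 and 4, so two extra words separate them: the query matches with a slop of 2 but not with 0 or 1. Every indexed occurrence needs its position stored for this to work, which is exactly the index growth the comment worries about.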
Just something to consider: remaining a superset of JS forces TS to adopt breaking changes whenever the underlying standard changes. This happened with ES6 modules and is happening again with private members. Much more complex rewrites will be required if JS implements a native type system.
There are types for Lunr already: https://www.npmjs.com/package/@types/lunr, though I'm not sure whether they are up to date.
I recently forked lunr.js to add a feature and decided to investigate migrating the sources to TypeScript. If you are interested, I can create a pull request for my changes. My fork contains the following changes:
- Sources in lib/ migrated to TypeScript.
- lunr.d.ts: a TypeScript declaration file that can supplant the definitions on DefinitelyTyped (and would stay up to date with future releases, as it is built from source).
- lunr.js.map: a source map for debugging when running tests.
- eslint-plugin-typescript and typescript-eslint-parser, to continue using ESLint against the TypeScript sources.
- docs/ generated from the TypeScript build outputs (though it might be worth migrating from JSDoc to TypeDoc for better documentation of the TypeScript sources).
- Support for "number" type fields (via builder.field("wordCount", { type: "number" })).
- Comparison queries against "number" type fields (e.g. wordCount:>=10), similar to GitHub Advanced Search.
- Range queries against "number" type fields (e.g. wordCount:10..20 or wordCount:*..50), similar to GitHub Advanced Search.
- lunr.tokenizer modified to "pass through" lunr.Token instances (with any additional metadata), rather than calling .toString() on them. This allows custom extractors to provide their own position metadata for hit-highlighting.
I'm also investigating adding support for "date" type fields, similar to the "number" support above, as well as quoted terms. For quoted terms, I am thinking that "Scarlett helps Professor" should only match documents containing all three terms (possibly ranking them higher the closer the terms appear together, in the correct order), while -"Scarlett helps Professor" should only reject documents containing all three terms.
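The proposed semantics for quoted terms can be sketched as a filter over tokenized documents. Assumptions: Doc, QuotedClause and applyQuoted are hypothetical, and ranking by how many consecutive phrase pairs appear adjacent and in order is just one possible reading of "a higher rank the closer they appear together in the correct order".

```typescript
// Hypothetical document shape: already tokenized, lower-cased words.
interface Doc {
  id: string;
  words: string[];
}

// A quoted clause: "scarlett helps professor" or, negated, -"scarlett helps professor".
interface QuotedClause {
  terms: string[];
  negated: boolean;
}

function containsAll(doc: Doc, terms: string[]): boolean {
  const present = new Set(doc.words);
  return terms.every((t) => present.has(t));
}

// Counts phrase pairs (terms[i], terms[i + 1]) that also appear side by side,
// in that order, somewhere in the document.
function adjacencyBonus(doc: Doc, terms: string[]): number {
  let bonus = 0;
  for (let i = 0; i + 1 < terms.length; i++) {
    for (let j = 0; j + 1 < doc.words.length; j++) {
      if (doc.words[j] === terms[i] && doc.words[j + 1] === terms[i + 1]) {
        bonus++;
        break;
      }
    }
  }
  return bonus;
}

// Positive clause: keep only documents containing every term, best-ordered first.
// Negated clause: drop only documents containing every term.
function applyQuoted(docs: Doc[], clause: QuotedClause): string[] {
  if (clause.negated) {
    return docs.filter((d) => !containsAll(d, clause.terms)).map((d) => d.id);
  }
  return docs
    .filter((d) => containsAll(d, clause.terms))
    .map((d) => ({ id: d.id, bonus: adjacencyBonus(d, clause.terms) }))
    .sort((a, b) => b.bonus - a.bonus)
    .map((r) => r.id);
}

const docs: Doc[] = [
  { id: "a", words: ["professor", "scarlett", "helps"] },
  { id: "b", words: ["scarlett", "helps", "professor", "plum"] },
  { id: "c", words: ["scarlett", "helps"] },
];
const terms = ["scarlett", "helps", "professor"];
console.log(applyQuoted(docs, { terms, negated: false }).join(",")); // b,a
console.log(applyQuoted(docs, { terms, negated: true }).join(",")); // c
```

Document c contains only two of the three terms, so the positive clause excludes it while the negated clause keeps it; that asymmetry is what distinguishes -"..." from excluding each term individually.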