lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License
4.67k stars 133 forks source link

Stricter typing needed #208

Closed juiceo closed 1 year ago

juiceo commented 1 year ago

Hi! I like the library, but would need it to have some stricter typing to be usable in production applications.

Would it be possible to modify the SearchOptions and Options types so that we pass in a generic type argument for the shape of the rows, and restrict the fields and other similar properties to only keys of the row type?

A simple implementation would look like this:

// Before: 

export type Options<T = any> = {
   /**
    * Names of the document fields to be indexed.
    */
  fields: string[],
  ...
}

// After

export type Options<T = any> = {
   /**
    * Names of the document fields to be indexed.
    */
  fields: (keyof T)[],
  ...
}
lucaong commented 1 year ago

Hi @juiceo , this would be nice, to enforce that the values passed to the fields option are indeed valid document fields. One problem with that, though, is that while most commonly documents are plain key/value objects, MiniSearch does not restrict the document type, so in principle they could be anything.

One common case is to have nested fields, and use a custom extractField to handle them, like in this example. In that example, the fields "author.name" and "pubYear" are not keys of the document type. This is why MiniSearch chooses to be a bit more loose with typing, to allow more flexibility.

juiceo commented 1 year ago

Thanks for the context @lucaong! I can totally understand the need for this sort of flexibility ๐Ÿ‘ I think there is a way to solve this without needing to compromise on that, so we could get the best of both worlds ๐Ÿค”

I have a suggestion that should work and would support both of these cases - but it would require a breaking change to how the fields are defined. Essentially instead of separate fields, storeFields, idField, extractFields we could rather define the fields array as an array of objects which would define all of the above.

Consider this an RFC:

export type BaseFieldDef<T> = {
    store?: boolean; // Replaces `storeFields`
    id?: boolean; // Replaces `idField`
};

export type PathFieldDef<T> = {
    path: Path<T>;
};

export type ExtractFieldDef<T> = {
    name: string;
    extract: (document: T) => string;
};

export type FieldDef<T> = (PathFieldDef<T> | ExtractFieldDef<T>) & BaseFieldDef<T>;

// Examples:

type Car = {
    id: string;
    make: string;
    model: string;
    specifications: {
        engine: string;
        transmission: string;
    };
};

const fields: FieldDef<Car>[] = [
  {
    path: 'id',
    id: true,
  },
  {
    path: "specifications.engine",
    store: true,
  },
  {
    name: 'make_and_model',
    extract: (car) => `${car.make} ${car.model}`,
    store: false,
  }
];

const miniSearch = new MiniSearch<Car>({
    fields,
})

The Path<T> type allows any string that is a valid dot-notation object access to T, so we could easily support nested properties without the need for a custom extractor function.

Thoughts on the approach? Happy to refine it further if you think it could be worthwhile to explore. I think given a bit more thought, there might also be a way to support this sort of thing without making any breaking changes to the API.

lucaong commented 1 year ago

Hi @juiceo , this is a very ingenious solution, impressive ๐Ÿ‘๐Ÿผ๐Ÿ™‚

That said, I think the benefits do not outweigh the cost of introducing such a breaking change. MiniSearch primary goal is to support JavaScript and TypeScript users with a simple and flexible API, that scales with user needs. Strict type safety in TS is a desirable feature, but subordinate to the goal above.

A large percentage of users come from plain JS, and only need the basic fields option: for those, the path of least friction is to simply require an array of strings. The current API is โ€œadditiveโ€ for more complex needs: more advanced options are layered on top, without making the basic API more complex.

I understand the appeal of a strict type design, but that tends to impose a level of complexity (even in the type definitions) that most users are not comfortable with.

Again, I do see the appeal of a solid type definition that works well with generics. But also, other users that are into functional programming might wish to have an API made with composable pure functions. Or OOP-inclined developers might want to specify the options as a dedicated class with a fluent API. While I do like some of these ideas myself, I think that a foundational building block library like MiniSearch benefits more from being as simple as possible, and rather unopinionated about the programming style (beyond being idiomatic JavaScript).

That said, Iโ€™d be super happy to see a stricter adapter designed on the basis of your idea, wrapping the more โ€œtype lenientโ€ MiniSearch core ๐Ÿ™‚

juiceo commented 1 year ago

Makes perfect sense ๐Ÿ‘Œ As it happens I already implemented this sort of adapter for minisearch for our use case, so I could definitely take a look at putting out a PR for it.

What do you think would be the best way to introduce the possibility to opt in to stricter typing? Is there a way it could be included in the core package or do you see something like this as belonging more to a 3rd party package?

(TBH not really excited about the prospect of becoming an npm package maintainer ๐Ÿ˜„ ...but if you feel there's a way we can make it part of the core package would be happy to take a stab!)

lucaong commented 1 year ago

In the short term I feel more like it should belong to a 3rd party package. One alternative could be to introduce a (much needed) "how to" section in the docs, explaining how to do various things, including "how to make the options more type safe with TypeScript". In this case there would be no additional package, but rather some instructions. If you want, you can provide some example code in this thread, and I will copy it to the "How To" section when I create one.

I will evaluate what would it mean to support the type safe but more verbose way of specifying options on top of the current way. If it does not involve too much additional code to maintain and test, nor too much complexity, it might be worth adding this to the core.

juiceo commented 1 year ago

Alright, I'll see if I can make something happen with that ๐Ÿ‘