lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License
4.9k stars, 137 forks

Suggestion: pass in document as 3rd param to tokenize to access nested objects #8

Closed peonmodel closed 5 years ago

peonmodel commented 5 years ago

I have documents with nested fields, e.g.:

[{ id: '', name: '', location: { postal: '11111', placename: '', unit: '07-03' } }]

I would like to access a nested field within them, e.g. 'location.postal'.

Currently I can use tokenize to return [postal, unit], e.g.:

tokenize(value, key) {
  if (key === 'location') return [value.postal, value.unit]
}

But both results end up under the same key, "location":

minisearch.search('query', { fields: ['location'] });  // this searches both "postal" and "unit", even if I only need the "postal" field

If tokenize simply took a 3rd parameter, the entire document, I could parse the field key myself to access the nested value, e.g.:

tokenize(value, key, document) {
  if (key === 'location.postal') return document.location.postal;
  if (key === 'location.unit') return document.location.unit;
}

lucaong commented 5 years ago

Hi @peonmodel, you are right, there should be an easy way to handle nested fields. Interestingly, I was already thinking about this issue, and I have an alternative idea that I would like your opinion on.

I would like to have a simple way to handle any case where the field is not just a flat key in the document object. One case is nested keys, like yours. Another is when the document object is an instance of some class, and a field is obtained by calling a method (e.g. user.getPostalCode()). One way to handle both (and more) would be for the fields array to optionally accept more than just strings:

let miniSearch = new MiniSearch({
  fields: [
    'firstName', // 'firstName' is just a key, all works in the standard way
    'lastName', // 'lastName' is also just a key
    // 'postal' is nested, so we specify how to extract it
    {
      name: 'postal',
      get: (document) => document.location.postal
    },
    // 'unit' is also nested
    {
      name: 'unit',
      get: (document) => document.location.unit
    }
  ]
})

What do you think about it? Would this cover your use case?
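
To make the proposal above concrete, here is a minimal sketch of how a mixed fields array (plain strings plus { name, get } objects) could be resolved into uniform getters. The helper name normalizeFields and the sample document are hypothetical, made up for illustration. This is not MiniSearch code, and the option shape shown is only the one proposed in this comment.

```javascript
// Hypothetical helper, NOT part of MiniSearch: turn a mixed `fields` array
// (strings or { name, get } objects) into uniform { name, get } entries.
function normalizeFields(fields) {
  return fields.map((field) =>
    typeof field === 'string'
      ? { name: field, get: (document) => document[field] } // flat key
      : field // already carries its own getter
  )
}

const fieldSpecs = normalizeFields([
  'firstName',
  { name: 'postal', get: (document) => document.location.postal }
])

// Made-up sample document for illustration
const doc = { firstName: 'Ann', location: { postal: '11111', unit: '07-03' } }

// Collect the value of every declared field, keyed by field name
const extracted = Object.fromEntries(
  fieldSpecs.map(({ name, get }) => [name, get(doc)])
)
```

With this shape, a flat key and a nested key are treated the same way once normalized, which is the point of the proposal: the indexing code only ever calls get(document).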

peonmodel commented 5 years ago

Well, yes, that will cover my use case.

I only suggested the tokenize(value, key, object) syntax because it involves minimal changes to the existing code. Either will work for me.

lucaong commented 5 years ago

I ended up implementing this in yet another way, more consistent with the existing tokenize and processTerm options. Basically, you can now pass an extractField option to customize the field extraction and processing logic:

// Assuming that our documents look like:
const documents = [{
  id: 1,
  name: 'Some name',
  location: {
    postal: '11111',
    placename: 'Some city',
    unit: '07-03'
  }
}]

// You can support nested fields with a custom `extractField` function like:
let miniSearch = new MiniSearch({
  fields: ['name', 'postal', 'placename', 'unit'],
  extractField: (document, fieldName) => {
    if (fieldName === 'postal' || fieldName === 'placename' || fieldName === 'unit') {
      return document.location[fieldName]
    } else {
      return document[fieldName]
    }
  }
})

miniSearch.addAll(documents)

Also check out the example in the README for a more generic approach: https://github.com/lucaong/minisearch#field-extraction

This new feature is already available in v1.2.0. I hope this fulfills your use case. Thanks a lot for reporting this!
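
As a rough illustration of the more generic approach mentioned above, a field extractor can treat field names as dot-separated paths and walk the document one key at a time. The function name extractNestedField and the sample document are illustrative assumptions, not MiniSearch exports. The sketch runs without the library so the path logic can be checked on its own.

```javascript
// Sketch: resolve dot-separated field names such as 'location.postal'.
// `extractNestedField` is an illustrative name, not a MiniSearch export.
function extractNestedField(document, fieldName) {
  return fieldName
    .split('.')
    // Stop descending (yield undefined) as soon as a level is missing
    .reduce((value, key) => (value == null ? undefined : value[key]), document)
}

// Same document shape as in the thread
const sampleDocument = {
  id: 1,
  name: 'Some name',
  location: { postal: '11111', placename: 'Some city', unit: '07-03' }
}

const postal = extractNestedField(sampleDocument, 'location.postal')
const flatName = extractNestedField(sampleDocument, 'name')
const missing = extractNestedField(sampleDocument, 'address.street')
```

A function like this could then be passed as the extractField option, with fields declared as ['name', 'location.postal', 'location.unit'], so that each nested value is indexed under its own field name and a search can be restricted to 'location.postal' alone.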