Closed kevinastone closed 9 years ago
Thanks for taking a look at this, but I don't think it is something that should be in lunr.
This request has come up a few times in the past, and my stance is that storing documents like this in lunr doesn't provide much benefit, while bloating the index and adding extra complexity to lunr. The results returned by a search always include a ref for the document, so it is easy for the application using lunr to retrieve the full document from wherever that document is being stored.
This way is much more flexible, since there are a multitude of different ways an application can store its data, whether that be locally in memory, in some local database, or accessing some remote API.
I think keeping lunr focused on being a datastore that can service search requests, rather than one for storing arbitrary JSON documents, is the right choice. That said, if you have a use case that makes this choice difficult, do let me know; perhaps there is something I can suggest.
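The ref-based lookup described above can be sketched as follows (this is a minimal illustration, not lunr API: `documents` and `resolveResults` are hypothetical names, and the in-memory object could just as well be a database or remote API call):

```javascript
// Hypothetical in-memory document store, keyed by the same ref used in the index.
var documents = {
  'a': { id: 'a', title: 'Getting Started' },
  'b': { id: 'b', title: 'Advanced Usage' }
}

// lunr search results carry only a ref (and score); map them back to documents.
function resolveResults (results) {
  return results.map(function (result) {
    return documents[result.ref]
  })
}

// Results shaped as lunr's search would return them.
var hits = resolveResults([{ ref: 'b', score: 0.42 }])
```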
That's understandable, but I wrote this because this was my second time using lunr.js, and for the second time I had to write a shim to store fields on the index so I could retrieve values like the title that I want to display with the search results.
Storing some summary fields that can be returned with the results seems like a common enough pattern. There might be ways to abstract it to support more diverse cases, but this seemed like a worthwhile enhancement.
There may be a way to provide this as a plugin for lunr...
lunr exposes some hooks on documents being added, updated and removed from the index, you could make use of these to keep track of your documents in memory, then also wrap the current search function to return the full documents when a search is performed, off the top of my head that might look like this:
var storedFields = function (idx) {
  var store = {},
      refName = idx._ref

  idx.on('add', 'update', function (document) {
    store[document[refName]] = document
  })

  idx.on('remove', function (document) {
    store[document[refName]] = null
  })

  var originalSearch = idx.search.bind(idx)

  idx.search = function (query) {
    return originalSearch(query).map(function (result) {
      return store[result.ref]
    })
  }
}
idx.use(storedFields)
Note, I haven't tested this at all! The idea is that, whenever a document is added, updated, or removed in the index you store that document somewhere (in this example just a local object), then the wrapped search function uses the ref
property of results to lookup the full document in this object before returning results.
I'm fairly sure that the document as a whole will be passed to the callbacks, so you don't have to worry about whether a field is indexed by lunr or not. Is this something that you can adapt to your use case at all?
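To sanity-check the shape of that plugin without pulling in lunr itself, here is the same storedFields function exercised against a minimal stand-in index. The stub's `on`, `add`, `search`, and `_ref` are simplified stand-ins for lunr's real API, and the remove handler here uses `delete` instead of assigning null; treat this as a sketch of the mechanics, not a test of lunr:

```javascript
// Same plugin shape as above, restated so this snippet is self-contained.
var storedFields = function (idx) {
  var store = {},
      refName = idx._ref

  idx.on('add', 'update', function (document) {
    store[document[refName]] = document
  })

  idx.on('remove', function (document) {
    delete store[document[refName]]
  })

  var originalSearch = idx.search.bind(idx)

  idx.search = function (query) {
    return originalSearch(query).map(function (result) {
      return store[result.ref]
    })
  }
}

// Minimal stand-in for a lunr index: just enough surface for the plugin to attach.
var handlers = {}
var idx = {
  _ref: 'id',
  on: function () {
    var fn = arguments[arguments.length - 1]
    for (var i = 0; i < arguments.length - 1; i++) handlers[arguments[i]] = fn
  },
  add: function (doc) { handlers['add'](doc) },
  search: function (query) {
    // Pretend every query matches document '1'.
    return [{ ref: '1', score: 1 }]
  }
}

storedFields(idx)
idx.add({ id: '1', title: 'Search Basics' })
// idx.search now yields the stored document rather than just { ref, score }.
```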
Let me investigate further. A couple of points:
Here's how I did it previously: https://github.com/eventbrite/kss-search/blob/master/src/index.coffee#L27-L28 https://github.com/eventbrite/kss-search/blob/master/src/index.coffee#L52-L53
I find it considerably cleaner to define these store and index properties on the field definitions themselves, so they get persisted with the index.
By comparison, here's using this patch:
class Indexer
  constructor: ->
    @index = lunr ->
      @ref 'id'
      @field 'url', store: true, index: false
      @field 'title', boost: 10, store: true
      @field 'content'

  indexItem: (item) ->
    document =
      id: item.link
      url: item.link
      title: item.title
      content: striptags item.description
    @index.add document

  save: ->
    output = @index.toJSON()
    return output
This allows storing fields in the index with or without indexing them in the token store, and lets you add attributes that you want returned with the search results.
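The intended split between stored and indexed fields can be illustrated with a small sketch (the `fields` options mirror the patch's flags, but `partition` is a hypothetical helper, not code from the patch):

```javascript
// Hypothetical field options mirroring the patch's store/index flags.
var fields = {
  url:     { store: true,  index: false },
  title:   { store: true,  index: true  },
  content: { store: false, index: true  }
}

// Partition a document into what is kept for display vs tokenized for search.
function partition (doc) {
  var stored = {}, indexed = {}
  Object.keys(fields).forEach(function (name) {
    if (fields[name].store) stored[name] = doc[name]
    if (fields[name].index) indexed[name] = doc[name]
  })
  return { stored: stored, indexed: indexed }
}

var parts = partition({ url: '/a', title: 'A', content: 'body text' })
// parts.stored  -> { url: '/a', title: 'A' }
// parts.indexed -> { title: 'A', content: 'body text' }
```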
Example: