olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

indexing dates? #362

Closed fauno closed 6 years ago

fauno commented 6 years ago

Hi, I have a set of documents with a date field on them, with strings like "2014-01-01 00:00:00 -0300" (I never remember the names for the formats, sorry).

But when I try to search by year (date:2017* for instance), it doesn't return any results, even though I did idx.field('date'). I'm inspecting the index and I don't see any mention of the date field being indexed. Should I assume lunr doesn't index dates/numbers?

I just tried the search with date:2017-*. It takes noticeably longer than other searches (2s!) and returns every document in the set, even though there are only two documents that should match that year and none of the others contain that string in other fields.

I'm using the Spanish stemmer, if that's any help :)

olivernn commented 6 years ago

Lunr doesn't really care what object the field contains, everything gets converted to a string via Object#toString. Since your date is being passed as a string Lunr doesn't treat it any differently from any other field.
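To make the string conversion concrete, here is a minimal sketch of how a date string like the one above might be split into tokens. It assumes Lunr's default tokenizer separator is the regular expression /[\s\-]+/ (whitespace and hyphens); check lunr.tokenizer.separator in your version. The name splitLikeTokenizer is hypothetical, and the function only approximates the tokenizer: it ignores the rest of the pipeline (trimmer, stop words, stemmer).

```javascript
// Assumed default of lunr.tokenizer.separator; verify against your Lunr version.
const ASSUMED_SEPARATOR = /[\s\-]+/;

// Hypothetical helper: approximates how a field value is split into tokens.
function splitLikeTokenizer(value) {
  return String(value)              // mirrors the toString conversion described above
    .toLowerCase()
    .split(ASSUMED_SEPARATOR)
    .filter((token) => token.length > 0);
}

console.log(splitLikeTokenizer("2014-01-01 00:00:00 -0300"));
// [ '2014', '01', '01', '00:00:00', '0300' ]
console.log(splitLikeTokenizer("2017-*"));
// [ '2017', '*' ]
```

Under that assumption the year would be stored as its own token rather than as the prefix of a longer one. If the query parser splits on the same separator, date:2017-* might be read as 2017 plus a bare wildcard, which could explain a search that matches every document. This is a guess, not something confirmed in this thread.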

A prefix search will be slower than a non-prefix search, though 2s seems quite long. Is this any longer than any other search with a trailing wildcard?

What is your use case, what kind of date searches are you trying to do? There might be a way to index your dates in a way that works for what you want.

fauno commented 6 years ago

I was trying to do a year range search, for instance retrieving all documents since 2016, or between 2014 and 2017, etc.

Since #237 isn't implemented, I was going for a search per year and then merging the results, but I found the issue I described. Maybe if I index the year in its own field, so I don't have to do a wildcard search? I'll try later, but it didn't seem to index the full date anyway, I couldn't find any of the dates indexed by inspecting the index.
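The "search per year, then merge" idea can be sketched as follows. This is only an illustration: mergeResults is a hypothetical helper, and it assumes each search returns an array of objects with ref and score properties. The sample arrays are stand-in data, not real search output.

```javascript
// Hypothetical helper: merge several result lists, keeping one entry per ref
// (the highest-scoring one), sorted by descending score.
function mergeResults(resultLists) {
  const bestByRef = new Map();
  for (const results of resultLists) {
    for (const result of results) {
      const seen = bestByRef.get(result.ref);
      if (seen === undefined || result.score > seen.score) {
        bestByRef.set(result.ref, result);
      }
    }
  }
  return [...bestByRef.values()].sort((a, b) => b.score - a.score);
}

// Stand-in data shaped like search results ({ ref, score }); not real output.
const results2016 = [{ ref: "a", score: 0.4 }, { ref: "b", score: 0.9 }];
const results2017 = [{ ref: "b", score: 0.5 }, { ref: "c", score: 0.7 }];

const merged = mergeResults([results2016, results2017]);
console.log(merged.map((r) => r.ref)); // [ 'b', 'c', 'a' ]
```

Scores from separate searches are not necessarily comparable, so the ordering of the merged list should be treated as approximate.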

olivernn commented 6 years ago

Yeah, I think your best bet is to index just the year. Then you can either use the wildcard search or just list every year you are interested in in the query string, e.g. year:2018 year:2017 year:2016 some other words.
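A minimal sketch of that suggestion, assuming dates always start with a four-digit year as in "2014-01-01 00:00:00 -0300". The helpers extractYear and yearRangeQuery, the year field name, and the sample document are all hypothetical.

```javascript
// Hypothetical helper: pull the leading four-digit year out of a date string.
// Returns null when the string does not start with "YYYY-".
function extractYear(dateString) {
  const match = /^(\d{4})-/.exec(String(dateString).trim());
  return match ? match[1] : null;
}

// Hypothetical helper: build "year:2016 year:2017 ..." for an inclusive range.
function yearRangeQuery(fromYear, toYear, field = "year") {
  const clauses = [];
  for (let year = fromYear; year <= toYear; year += 1) {
    clauses.push(`${field}:${year}`);
  }
  return clauses.join(" ");
}

// Add a year field to each document before handing it to the indexer.
const docs = [{ id: "1", date: "2017-05-02 10:00:00 -0300" }];
const prepared = docs.map((doc) => ({ ...doc, year: extractYear(doc.date) }));

console.log(prepared[0].year);           // 2017
console.log(yearRangeQuery(2016, 2018)); // year:2016 year:2017 year:2018
```

The index would also need the year field declared alongside the others, and the generated string would be passed to the search call, optionally followed by the other search words.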

fauno commented 6 years ago

I'm not getting any results either... I'm using a variant of the indexer from the index pre-building article, to which I'm piping data.json (which later fails to parse :/)

I'm attaching my test data in case I'm omitting something.

In test.js you can see a simple search by date, which should show 3 results for 2017 and 1 for 2018, but returns none. I'm inspecting idx.json with jq and I can't find anything about years.

index_by_year.zip

fauno commented 6 years ago

Never mind my comment about data.json not parsing, I found the mistake :P

olivernn commented 6 years ago

Got it working?

If so, I think we should close this issue; better indexing/searching of dates falls under #237. Let me know if you still have questions on this specific issue though.

fauno commented 6 years ago

No, I couldn't. The "never mind" message was about an unrelated line of code. I'm still getting no results for year:2017 or 2017.

olivernn commented 6 years ago

Oops, sorry, I was too eager to close this!

Would you mind putting a simple reproduction of the bug on https://jsfiddle.net (or similar)?

fauno commented 6 years ago

Thanks! I've uploaded the idx.json and test script in the previous zip, please tell me if you prefer jsfiddle :)

fauno commented 6 years ago

It works on jsfiddle, but I haven't figured out a way to load lunr-languages there yet.

https://jsfiddle.net/9vak3qcn/7/

fauno commented 6 years ago

Ok, confirmed locally the issue is with the Spanish stemmer! I thought I tried this before. Thanks for your help :)