olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License
8.96k stars 548 forks source link

Fixed a problem where a document with lots of tokens causes a RangeError #202

Closed dangrie158 closed 8 years ago

dangrie158 commented 8 years ago

Before this fix, the index passed all tokens via an apply call, pushing all parameters on the stack. This behaviour causes a RangeError when more tokens are added than the JS implementation can handle.

This fix changes the code so that now an explicit Array is passed instead of pushing the values of the array on the stack.

Some Tests on the maximum Stack size using the suggested method from http://stackoverflow.com/questions/22747068/is-there-a-max-number-of-arguments-javascript-functions-can-accept:

olivernn commented 8 years ago

Well I learnt something today! I guess it makes sense that there is an upper limit on the number of parameters, I doubt many people ever run into it though.

It seems strange to me that lunr.SortedSet.prototype.add iterates through a list of input elements at all. I'm trying to remember why it was originally like this but can't think of anything.

Ideally I'd like to change the signature of lunr.SortedSet.prototype.add to only accept one item at a time. This would also remove the use of arguments in the implementation which is likely going to give us a slight performance boost also. It would be backwards incompatible though and so we should probably hold off on that change for now.

Instead when we call it internally we can treat it like it only accepts one element and do all the iteration in the calling scope. Does that make sense? For example, instead of:

lunr.SortedSet.prototype.add.apply(this.corpusTokens, allDocumentTokens.toArray())

we would have something like this:

for (var i = 0, tokens = allDocumentTokens.toArray(), len = tokens.length; i < len; i++) {
  this.corpusTokens.add(tokens[i])
}

In fact, it looks like we are already iterating over the allDocumentTokens in the very next line.

Perhaps also fix this case of using apply also.

Also, please take a look at the contributing guide. Specifically changes should only be made in the files in lib, in this case it looks like the index file is what you want.

dangrie158 commented 8 years ago

Thanks for your feedback. What you said makes sense of course, changing the API should not be neccecary. I will make another commit on thursday that will respect all of your suggestions.

I'm sorry I was in a hurry and completly missed the contributing guide.

dangrie158 commented 8 years ago

I just corrected the code fixes as suggested. I also searched for all usages of 'apply' in the code. The only ones left are in the EventEmitter and the Plugin Subsystems. I don't think this should be problematic.

I'm not too familiar with GitHub. Is it possible for you to merge only one commit or can you only merge the whole branch? In that case I would open another pull request without all my screw ups :)

olivernn commented 8 years ago

Change looks good to me, I think you're right that the other instances of apply are as they should be.

As far as I'm aware I can only merge the whole branch, this leaves you two options:

  1. Sqaush all these commits into one commit
  2. Create a new branch/pr and cherry pick your last commit

I think this change should all be in just one commit, whatever you do, try and preserve the detail you currently have in the commit message, I'm a big fan of detailed commit messages!

d0ugal commented 8 years ago

I think this can be closed in favour of #203

dangrie158 commented 8 years ago

Yes,you're absolutely right.