Open d4l3k opened 6 years ago
This is on v2.1.4 btw.
I'm guessing this is because an object is created for metadata, but that's not needed if builder.metadataWhitelist
is empty: https://github.com/olivernn/lunr.js/blob/2dc9c6c6c41b1f5850f2bed0a82d8cd45835d166/lib/builder.js#L161-L176
Yep, those objects are where metadata is stored for each term.
There is certainly scope for some substantial memory savings for in memory indexes here. The Proxy object solution is cool, but I'd like to keep ES5 compatibility for the time being.
There are a couple of approaches that might work here. The simplest to implement (I think) would be to re-use the same empty object if there is no metadata for an index. This would only require changes to the builder. Alternatively, instead of keeping an empty object, it could be replaced with null
or similar, this would also require changes to the index.
I'm not sure if the memory usage is caused by the empty object, but more the number of duplicate key strings. I also experimented with just removing the empty keys and in the lunr.js code adding || {}
anywhere it was used. Seemed to work fine, but I don't know the code base well enough to avoid any edge cases
@d4l3k Would you mind posting the script and data you're using to get these numbers? I'm curious to check it out!
I linked the script above https://github.com/nwhacks/nwhacks2018_static/blob/master/components/select-hackers/select-hackers.js
And I'm afraid I can't provide the data. It's the registration data for https://www.nwhacks.io/
Same issue here. It would be great to fix this. I decreased my container memory with pruneIndex function a lot.
I was trying to debug some large memory usage issues with lunr.js. I found that the lunr invertedIndex was roughly 138MB. By removing keys with empty objects I was able to cut 138MB down to 82MB which is a 40% savings!
Code to prune the index without breaking it using Proxy objects:
https://github.com/nwhacks/nwhacks2018_static/blob/master/components/select-hackers/select-hackers.js#L272
Presumably this would be a good idea to add into lunr.js since there's significant savings.