morungos / wordnet

A Wordnet API in pure JavaScript
MIT License

Slow queries #29

Open sr258 opened 4 years ago

sr258 commented 4 years ago

I want to use the library to get the base forms of tokens in long texts. I call validFormsAsync and the results seem OK. The library works in general, but access is relatively slow: I get a throughput of about 40 lookups per second. Is this normal, or am I doing something wrong?
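For anyone wanting to reproduce a figure like this, here is a minimal timing sketch. It is not part of the library: `measureThroughput` and `fakeLookup` are hypothetical names, and the stand-in lookup would need to be replaced by a function that wraps the real call (for example validFormsAsync, whose exact signature is not shown in this thread).

```javascript
// Hypothetical helper (not part of the wordnet package): times sequential
// async lookups and reports how many complete per second.
async function measureThroughput(lookup, tokens) {
  const start = process.hrtime.bigint();
  for (const token of tokens) {
    await lookup(token); // one lookup at a time, as in a simple token loop
  }
  const elapsedSeconds = Number(process.hrtime.bigint() - start) / 1e9;
  return {
    count: tokens.length,
    elapsedSeconds,
    perSecond: tokens.length / elapsedSeconds,
  };
}

// Stand-in lookup so the sketch runs on its own; this is an assumption,
// not the library's behaviour.
const fakeLookup = async (token) => [token.toLowerCase()];

measureThroughput(fakeLookup, ['Dogs', 'ran', 'quickly']).then((stats) => {
  console.log(`${stats.count} lookups, ${stats.perSecond.toFixed(0)} per second`);
});
```

Measuring on the actual text matters here, since repeated words and any caching in front of the lookup will change the result considerably.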

morungos commented 3 years ago

Now that we are close to the forthcoming 1.0 release, I've done some work on this. validForms is very likely the worst case of what we are going to handle, since it involves a lot of database I/O, and this is not a database system. It's probably a good idea to add an additional cache to the index system at the very least -- this could make I/O significantly faster. After all, on a modern system we could put virtually the whole of WordNet in RAM anyway. But it'd be nice to get the worst case for each word lookup down to 2 or so disk requests, rather than the 10-15 we probably have right now.
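To illustrate the kind of index cache described above, here is a rough sketch of a bounded in-memory cache wrapped around an async lookup. It is an assumption-laden illustration rather than the library's actual index code: `withCache`, `slowIndexLookup`, and the `maxEntries` limit are all hypothetical names and values.

```javascript
// Hypothetical wrapper (not the library's index code): keeps recent results
// of an async, key-based lookup in memory so repeated keys skip the disk.
function withCache(lookup, maxEntries = 10000) {
  const cache = new Map(); // Map keeps insertion order, used here for LRU eviction

  return async function cachedLookup(key) {
    if (cache.has(key)) {
      const hit = cache.get(key);
      cache.delete(key);
      cache.set(key, hit); // re-insert to mark as most recently used
      return hit;
    }

    // Store the promise itself so concurrent requests for one key share a read.
    const pending = lookup(key);
    cache.set(key, pending);
    if (cache.size > maxEntries) {
      cache.delete(cache.keys().next().value); // evict least recently used
    }

    try {
      return await pending;
    } catch (err) {
      // Do not keep failed lookups around.
      if (cache.get(key) === pending) cache.delete(key);
      throw err;
    }
  };
}

// Stand-in for a disk-backed index read; counts how often it is really called.
let diskReads = 0;
const slowIndexLookup = async (word) => {
  diskReads += 1;
  return { word, offsets: [] };
};
const cachedIndexLookup = withCache(slowIndexLookup);
```

With this in place, calling `cachedIndexLookup('dog')` twice triggers only one call to the underlying lookup. The size bound is arbitrary; if the whole index fits in RAM, the eviction step could simply be dropped.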

A simple lookup on each word isn't too bad, especially with the query cache system in place. We're under a millisecond per word when reading a fairly large block of text with our current naive cache. But I accept that we can do much better with some modest improvements.

So I'm going to break this into a few separate issues for future tracking.