Closed DomHudson closed 6 years ago
Hey Dom,
The answer is complicated, but I can give you a bit more information on how Magnitude works and on the caching and loading settings available, which may help you tune the speed for your use case. Out of the box, Magnitude is configured to make local development and iteration on models with word vectors reasonably fast, while also approaching in-memory speed for production server deployments. The trade-off is that we eliminate initial load times to make iterating much faster, at the cost of slightly slower initial queries; over many repeated queries, we want Magnitude to approach Gensim's speed so it can still be used in production.
This means that on benchmarks that don't simulate repeated lookups, Magnitude may appear slow, because those benchmarks don't fully utilize the cache that would be available in an average production scenario. Moreover, background threads run when Magnitude boots up that begin eagerly pre-fetching data into caches. This is useful in production environments, but it will also negatively impact a benchmark score.
Magnitude ultimately works by using a SQLite index to look up a token and get its vector. It lets SQLite and the OS manage caching data into memory, which should be quite optimized.
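As a toy illustration of that idea (this is a sketch, not Magnitude's actual schema or code), a keyed SQLite lookup looks roughly like this:

import sqlite3
import numpy as np

# Toy example: a table keyed by token, with the vector components stored alongside it.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE vectors (key TEXT PRIMARY KEY, dim0 REAL, dim1 REAL)')
conn.execute('INSERT INTO vectors VALUES (?, ?, ?)', ('the', 0.12, -0.34))

# Looking up a token is a single indexed SELECT; SQLite and the OS page cache
# decide what stays resident in memory.
row = conn.execute('SELECT * FROM vectors WHERE key = ?', ('the',)).fetchone()
vector = np.array(row[1:])  # everything after the key is the vector
print(vector)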
However, there is one additional layer of caching introduced by Magnitude: query calls are LRU-cached (the size of the LRU can be configured with the lazy_loading constructor argument and is unbounded by default). What this means is that, over time, Magnitude gets faster as the same words are looked up over and over again. We found this to work really well in practice due to Zipf's Law. Even though the first lookup of the word "the" might be a little slow compared to Gensim's mmap method, that cost is negligible, since every subsequent lookup of "the" will be fast because it hits the in-memory LRU cache.
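To illustrate the idea (this is only a sketch, not Magnitude's implementation, which uses a bundled repoze.lru cache as the profile further down shows), the effect is roughly the same as wrapping the lookup in an LRU decorator:

import time
from functools import lru_cache

def slow_lookup(word):
    """Stand-in for the SQLite query; simulates the cost of a cold lookup."""
    time.sleep(0.01)
    return tuple([0.0] * 300)  # placeholder 300-dimensional vector

@lru_cache(maxsize=None)  # unbounded cache, mirroring the default behavior described above
def query(word):
    return slow_lookup(word)

start = time.time()
query("the")  # cold: pays the full lookup cost
cold = time.time() - start

start = time.time()
query("the")  # warm: served from the in-memory LRU cache
warm = time.time() - start

print("cold: %.4fs, warm: %.6fs" % (cold, warm))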
Here are a few things you can try:
Turn off the background threads that eagerly load the LRU caches:
vectors = Magnitude('/path/to/w2v.magnitude', eager=False)
Turn on blocking (this will turn off lazy-loading and require you to wait a little bit before you can perform queries, but it makes the queries faster):
vectors = Magnitude('/path/to/w2v.magnitude', blocking=True)
Use the raw vectors NumPy mmap:
vectors.get_vectors_mmap()
This requires knowing the index of the word you want to look up (you may need a separate data structure for this; see the sketch below). It also takes some time to build this mmap, so you will have to wait, but it is cached between different runs of Python in your computer/server's /tmp/ directory, so you only have to wait once.
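A rough sketch of what that separate data structure could look like, assuming the Magnitude object can be iterated to enumerate its keys in the same order as the mmap's rows (worth verifying in your version; the sanity check below does that for one word):

import numpy as np
from pymagnitude import Magnitude

vectors = Magnitude('/path/to/w2v.magnitude')

# Separate structure mapping each word to its row index (assumes iteration order
# matches the mmap's row order; adjust if your version exposes indices differently).
word_to_index = {key: i for i, (key, _) in enumerate(vectors)}

# Built (and cached under /tmp) the first time it is requested.
mmap = vectors.get_vectors_mmap()

# Sanity-check the assumed ordering against a normal query for one word.
assert np.allclose(mmap[word_to_index['the']], vectors.query('the'))

# Subsequent lookups are plain NumPy indexing into the memory-mapped array.
the_vector = mmap[word_to_index['the']]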
Overall, I suspect that if you need to eke out every bit of performance in your application, there may well be faster approaches than what we use in Magnitude, but we try to make Magnitude the simplest way to make both development and production use of word vectors reasonably fast, without having to use one configuration/library for development and another for production. We also add a ton of nice-to-have features like misspelling lookups, OOV lookups, approximate lookups, multi-threading support, etc. Whether it makes sense to trade these off really depends on your application.
If, from your own usage, you find ways of making Magnitude faster without affecting its current functionality, feel free to submit a PR!
Thank you very much for the highly informative response - it is massively appreciated! I'm including the profile data at the bottom of this comment for posterity; it absolutely agrees with your points.
One thing I did spot is the use of fetchall in _vector_for_key. Altering this method to the following (using fetchone) resulted in a minor performance boost despite the LIMIT statement.
def _vector_for_key(self, key):
    """Queries the database for a single key."""
    result = self._db().execute(
        """
        SELECT *
        FROM `magnitude`
        WHERE key = ?
        ORDER BY key = ? COLLATE BINARY DESC
        LIMIT 1;""",
        (key, key)).fetchone()
    if result is None or self._key_t(result[0]) != self._key_t(key):
        return None
    else:
        return self._db_result_to_vec(result[1:])
I will also do some further research, and see if I can spot anything else - many thanks.
8947786 function calls (8945672 primitive calls) in 94.256 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
793 88.668 0.112 88.668 0.112 {method 'execute' of 'sqlite3.Cursor' objects}
980896/980000 2.783 0.000 94.256 0.000 /home/dom/Code/magnitude/pymagnitude/third_party/repoze/lru/__init__.py:337(cached_wrapper)
980137 0.855 0.000 0.855 0.000 /home/dom/Code/magnitude/pymagnitude/third_party/repoze/lru/__init__.py:102(get)
2941929 0.724 0.000 1.062 0.000 /home/dom/Code/magnitude/pymagnitude/third_party/repoze/lru/__init__.py:343(<genexpr>)
1962414 0.339 0.000 0.339 0.000 {built-in method builtins.isinstance}
793 0.229 0.000 0.229 0.000 {method 'fetchall' of 'sqlite3.Cursor' objects}
980896 0.149 0.000 0.149 0.000 /home/dom/Code/magnitude/pymagnitude/third_party/repoze/lru/__init__.py:344(<genexpr>)
7327 0.122 0.000 0.122 0.000 {method 'uniform' of 'mtrand.RandomState' objects}
980898 0.102 0.000 0.102 0.000 {method 'items' of 'dict' objects}
154 0.038 0.000 88.751 0.576 /home/dom/Code/magnitude/pymagnitude/__init__.py:515(_out_of_vocab_vector)
@DomHudson The CI had some trouble deploying to PyPI since a dependency broke, but it's all fixed now. The SQLite query optimization is now in v0.1.18. Run pip install pymagnitude -U or pip3 install pymagnitude -U to update.
Thanks for the tip! Feel free to open another issue or PR if there are any other changes you see fit.
Great, thanks!
Hi!
I really hope this question doesn't come across as critical - I think this project is a great idea and I'm really loving the speed at which it can lazy-load models.
I had one question - loading the Google News vectors is massively quicker in Magnitude than in gensim, however I'm finding that querying is significantly slower. Is this to be expected? It is quite possible that this is a trade-off against loading time, but I want to confirm that there's nothing weird going on in my environment.
Code I'm using for testing:
For the code above, I get gensim being approximately 5x faster if memory-mapped, and over 13x faster if not.
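The original test snippet isn't reproduced above; a minimal sketch of the kind of comparison being described (paths, word list, and repetition counts are placeholders, assuming the Google News vectors are available in both formats) might look like:

import time
from gensim.models import KeyedVectors
from pymagnitude import Magnitude

WORDS = ['the', 'king', 'queen', 'apple', 'computer'] * 200  # repeated words favour Magnitude's LRU cache

# gensim: slow to load, fast per-query once fully in memory
gensim_vectors = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)
start = time.time()
for word in WORDS:
    _ = gensim_vectors[word]
gensim_time = time.time() - start

# Magnitude: near-instant to load; cold lookups hit SQLite, repeats hit the LRU cache
magnitude_vectors = Magnitude('GoogleNews-vectors-negative300.magnitude')
start = time.time()
for word in WORDS:
    _ = magnitude_vectors.query(word)
magnitude_time = time.time() - start

print("gensim: %.3fs, magnitude: %.3fs" % (gensim_time, magnitude_time))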