The parallelism obtained through `n_jobs` is very powerful, but also very memory intensive. The 0.4 release added an alternative multi-threading approach using numba's parallelism. If you set `n_jobs=None` it should use this instead. This will use less memory (but still potentially quite a lot; I haven't really done much memory-use optimization yet), and the threads will be bounded by the value of the environment variable `NUMBA_NUM_THREADS`, much as with, say, OpenMP's `OMP_NUM_THREADS`.
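In other words, something like the following should pick up the numba threading (a minimal sketch; the random data, sizes, and thread count are just placeholders):

```python
import os

# NUMBA_NUM_THREADS is read when numba is first imported, so set it
# before importing anything that pulls numba in (pynndescent does).
os.environ["NUMBA_NUM_THREADS"] = "8"

import numpy as np
from pynndescent import NNDescent

X = np.random.rand(100_000, 50).astype(np.float32)  # placeholder data

# n_jobs=None takes the numba-parallel code path instead of the
# more memory-hungry joblib-based one.
index = NNDescent(X, n_neighbors=15, n_jobs=None)
```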
Let me see if I understand it correctly. I get the same performance and thread usage with `n_jobs=None` as with `n_jobs=1` (despite `nb.config.NUMBA_NUM_THREADS` being set to 8), and the docstring seems to confirm that `None` defaults to 1:
```
n_jobs: int or None, optional (default=None)
    The number of parallel jobs to run for neighbors index construction.
    ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
    ``-1`` means using all processors.
```
Is this correct?
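(For reference, here is roughly the comparison I ran; random data stands in for my actual dataset:)

```python
import time
import numpy as np
from pynndescent import NNDescent

X = np.random.rand(420_000, 50).astype(np.float32)  # stand-in data

# Time index construction under both settings.
for n_jobs in (None, 1):
    t0 = time.time()
    NNDescent(X, n_neighbors=100, n_jobs=n_jobs)
    print(f"n_jobs={n_jobs}: {time.time() - t0:.1f} s")
```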
Yes, as far as I know the numba threading currently kicks in (in principle) if you have `n_jobs=None` or `1`. This is perhaps less than ideal, but it is what works for now. The question is whether you actually see any multi-threading. Generally I see multiple cores operating, although not necessarily saturating, for `n_jobs=None`, and lower memory usage than with the equivalent `n_jobs` setting.
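One quick way to check how many threads numba itself will use (a sketch assuming a reasonably recent numba, since `get_num_threads` is a newer addition):

```python
import numba

# The configured cap (from NUMBA_NUM_THREADS, or the CPU count by default)
print(numba.config.NUMBA_NUM_THREADS)

# How many threads the parallel backend is actually set to use right now
print(numba.get_num_threads())
```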
I guess this can be closed for now. Thanks for the explanations, Leland!
I am trying to understand how much RAM is used by NNDescent and what parameters affect the RAM usage.

I've been playing around with a 420,000 × 50 dataset, trying to construct a kNN graph with k=100 on my laptop with 16 GB RAM. When I use Annoy (as part of FIt-SNE), it seems to use around ~1 GB of RAM for index construction. Pynndescent seems to use several times more than that, and with some settings crashes with a memory error.
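(For what it's worth, I was eyeballing RAM usage roughly like this; `psutil` is just my choice of monitor, and this reports the current resident set size rather than the true peak:)

```python
import os
import psutil

process = psutil.Process(os.getpid())

def rss_gib():
    # Resident set size of the current process, in GiB
    return process.memory_info().rss / 2**30

print(f"before: {rss_gib():.2f} GiB")
# ... build the index here ...
print(f"after:  {rss_gib():.2f} GiB")
```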
For example, `NNDescent(X, n_neighbors=100, n_jobs=-1)` crashes with a memory error: NumPy cannot allocate a large three-dimensional array. I think the 8 in that array's shape is the number of threads. With `n_jobs=1` it runs without an error but feels quite slow (8 minutes). The third dimension appears to be fixed at 4. Is there any way to decrease `max_heap_update_count` without compromising the performance too much?
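For scale, an array with that shape pattern gets big quickly; here is a back-of-the-envelope with made-up numbers (the dtype and the middle dimension are my assumptions, not values from the traceback):

```python
import numpy as np

# Hypothetical shape following the (n_threads, max_heap_update_count, 4)
# pattern of the failed allocation; the middle value is invented purely
# to illustrate the arithmetic.
n_threads = 8
max_heap_update_count = 50_000_000  # made up
itemsize = np.dtype(np.float64).itemsize  # 8 bytes, assuming float64

gib = n_threads * max_heap_update_count * 4 * itemsize / 2**30
print(f"{gib:.1f} GiB")  # ~11.9 GiB for these made-up numbers
```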
Further, `NNDescent(X, n_neighbors=100, n_jobs=-1, low_memory=True)` crashes with the same error and the same array shape that it cannot allocate. How exactly does `low_memory=True` affect the RAM usage?

Funny enough, I did manage to run `NNDescent()` with `n_neighbors=15` and `n_jobs=-1`, followed by `query()` with `k=100`. Together, this took ~2 minutes, so it was about 4× faster than `NNDescent(X, n_neighbors=100, n_jobs=1)`. Does it make sense to use this trick to save RAM?
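Concretely, the trick looks like this (a sketch; the random matrix stands in for my data, and querying the index with its own points is what produces the k=100 graph):

```python
import numpy as np
from pynndescent import NNDescent

X = np.random.rand(420_000, 50).astype(np.float32)  # stand-in for my data

# Build a cheap index with only 15 neighbors...
index = NNDescent(X, n_neighbors=15, n_jobs=-1)

# ...then query it with the data itself to get 100 neighbors per point.
indices, distances = index.query(X, k=100)
```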