mierzejk closed this pull request 1 year ago.
The pull request may, at least partially, resolve the following issues: #6 (estimate density ratio of large training set and test set) and #8 (density ratio estimation of high dimension data). According to my tests, both the `numpy` and `numba` targets can deal with `x_list` and `y_list` matrices that consume over 20 GB altogether, provided enough virtual memory is available.
The pull request also offers the prospect of even greater performance improvements for large data sets by taking advantage of the `numba` `cuda` target. Yet that would require some extra work, not fully aligned with the currently implemented `numba.guvectorize` approach.
A side note with respect to the performance results: just recently I ran the benchmark with the same `densratio_py` codebase I have submitted in the following two environments:
And to my surprise, despite the fact that all 32 cores were being utilized in the Windows environment, the process executed a few times faster on my reportedly less powerful laptop. I am not really sure what the real cause is. It might be the operating system itself. But perhaps it is because I have set up my laptop with `PyTorch` performance in mind, namely I built `numpy`, `numba`, `Cython` and `mkl` from sources myself. On Windows all packages were delivered pre-built, either by Anaconda or pip.
The original benchmark results I attached to the first pull request post were measured in the first environment, i.e. my Dell Precision M4800 running Ubuntu 18.04.4 LTS.
It is the greatest contribution!
`densratio.RuLSIF.compute_kernel_Gaussian` has been updated with a performance-improved implementation. A sheet comparing the baseline (original) and performance-improved implementations is also available at https://bit.ly/3X7asIm; I hope it is pretty self-explanatory.

`densratio.RuLSIF.set_compute_kernel_target` (also available to be imported directly from `densratio`) accepts one of the following string arguments and sets the underlying engine that carries out the calculations:

- `numpy`: numpy broadcasting optimized. It must be noted that the underlying BLAS library (e.g. Intel's MKL) can take advantage of a multi-threading model.
- `cpu`: numba generalized universal function, single-thread optimized.
- `parallel`: numba generalized universal function, multi-thread optimized. Please be advised that all threading layer specifics apply.

Because of the aforementioned multi-threading technicalities, the engine defaults to `cpu` when `numba` is available, and to `numpy` otherwise. I do not think adding `numba` as a requirement is the best idea, as it could break backward compatibility with existing projects that already depend on `densratio`.

The performance-improved `densratio.RuLSIF.compute_kernel_Gaussian` implementation returns a `numpy.matrix` if either of its first two arguments is of the `numpy.matrix` type. Otherwise it returns (and expects) a `numpy.ndarray`, in case future commits replace the deprecated `numpy.matrix` with plain `numpy.ndarray`.
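For context, that return-type convention can be illustrated with a small, self-contained numpy sketch. This is a hypothetical stand-in mirroring the behaviour described above, not the library's actual code:

```python
import numpy as np

def compute_kernel_gaussian(x, y, sigma):
    # Hypothetical sketch of the convention described above: a
    # broadcasting-based Gaussian kernel that returns numpy.matrix only
    # when one of its first two arguments is a numpy.matrix, and a
    # plain numpy.ndarray otherwise.
    xa, ya = np.asarray(x), np.asarray(y)
    sq_dist = ((xa[:, None, :] - ya[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
    if isinstance(x, np.matrix) or isinstance(y, np.matrix):
        return np.asmatrix(kernel)
    return kernel

x = np.random.default_rng(1).normal(size=(5, 2))
k_arr = compute_kernel_gaussian(x, x, 1.0)               # ndarray in, ndarray out
k_mat = compute_kernel_gaussian(np.asmatrix(x), x, 1.0)  # matrix in, matrix out
```

With this shape, existing callers can keep passing `numpy.matrix` for now, while the same code path continues to work once the deprecated `numpy.matrix` is eventually replaced by plain `numpy.ndarray`.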