Parallelization in query* calls and update to KDTree

Hello, thanks for creating this package.

Recent SciPy releases allow parallel processing in the various tree query calls used in this package, which I have found to be quite beneficial. Additionally, SciPy seems to prefer KDTree over cKDTree going forward.

I have switched to KDTree accordingly and added a `workers` parameter (defaulting to 1, as in SciPy) that can be used to speed up entropy estimation. The tests still pass, although they show no real speed-up: the timings are about the same, and the two smallest tests are slightly slower in parallel. However, calling get_h on my sample data is roughly 6x faster (about 115 s down to 17.9 s). The parallel runs used 18 workers on an i9-10900K.
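For reference, here is a minimal sketch of the kind of call this change affects. It is not the package's actual code: `knn_distances` is a hypothetical helper, and the max norm and the `k + 1` self-match handling are assumptions made for illustration. It only relies on `scipy.spatial.KDTree.query` accepting `workers` (SciPy 1.6 or later).

```python
import numpy as np
from scipy.spatial import KDTree


def knn_distances(x, k=1, workers=1):
    """Distance from each row of x to its k-th nearest neighbour (max norm).

    Hypothetical helper for illustration only, not part of the package.
    """
    tree = KDTree(x)
    # Query k + 1 neighbours because each point's nearest neighbour
    # in its own tree is the point itself (at distance 0).
    distances, _ = tree.query(x, k=k + 1, p=np.inf, workers=workers)
    return distances[:, -1]


rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 5))

serial = knn_distances(x, k=5, workers=1)
parallel = knn_distances(x, k=5, workers=-1)  # -1 uses all available CPU threads

# The number of workers should only change the run time, not the result.
assert np.allclose(serial, parallel)
```

Keeping the default at `workers=1` means existing callers see the same behaviour as before unless they opt in.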

| Test | Single | Parallel (18 workers) |
| --- | --- | --- |
| `test_get_h` | 1.44 ms ± 59.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) | 1.58 ms ± 8.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) |
| `test_get_h_1d` | 706 µs ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) | 932 µs ± 3.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) |
| `test_get_mi` | 358 ms ± 22.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) | 366 ms ± 23.7 ms per loop |
| `test_get_pmi` | 811 ms ± 27.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) | 810 ms ± 72.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
| `get_h` on my sample data (float64 `np.array`, shape (90000, 31)) | 1min 55s ± 4.6 s per loop (mean ± std. dev. of 7 runs, 1 loop each) | 17.9 s ± 157 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
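A comparison like this could be reproduced outside IPython along the following lines. This is a rough sketch under stated assumptions: the random array is a much smaller stand-in for my sample data, `time_query` is a hypothetical helper, and it times the raw `KDTree.query` call rather than `get_h` itself, so the absolute numbers will differ from those reported here.

```python
import timeit

import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
x = rng.normal(size=(3000, 8))  # small stand-in data, not the (90000, 31) array
tree = KDTree(x)


def time_query(workers, repeats=3):
    """Best-of-`repeats` wall time in seconds for one full k-NN query.

    Hypothetical helper for illustration only.
    """
    timings = timeit.repeat(
        lambda: tree.query(x, k=6, p=np.inf, workers=workers),
        number=1,
        repeat=repeats,
    )
    return min(timings)


t_serial = time_query(workers=1)
t_parallel = time_query(workers=-1)  # -1 uses all available CPU threads

print(f"workers=1:  {t_serial:.4f} s")
print(f"workers=-1: {t_parallel:.4f} s")
```

On small inputs such as the test cases, the thread overhead can outweigh the gain, which would be consistent with the flat or slightly slower test timings in the table.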

Hope this can be useful, and any feedback is welcome. Thanks!

paulbrodersen / entropy_estimators
