Closed Lee-Jun-Hyuk-37 closed 1 year ago
Thanks for your pull request. The main change seems to be the definition of a new Numba-accelerated function (fast_euclidean_dist
) for computing the Euclidean distance between two points, which you find to be faster than SciPy's spatial.distance.euclidean
. Could you give me a rough estimate of the speedups (i.e., by what factor) you're seeing with the new function? If the Numba-accelerated version is much faster than SciPy, then we could refactor the code to use the Numba version, and there would be no need to define new functions to use the Numba-accelerated distance function.
Thanks for comments! The requested Experiments result is as follows. It was done by data_length, maxt, and embedding dimension.
You can check the full experiment codes in here.
Depending on the local PC, the measured time value will vary, but I think the performance of the Numba-acceleraded function can be sufficiently confirmed with the shape of above graphs.
If you accept this result and want to refactor to the numba function instead of the scipy function, give me a comment, I'll work on it and update new commits.
Thanks for measuring things! Numba does speed things up considerably.
Yes, I'd be happy to accept the changes. However, as I mentioned previously, rather than add new functions with the same functionality, think it's better to alter existing code. In principle, I think you just need to rewrite utils.dist()
to use Numba.
If you want to work on the changes yourself, please make sure to add support for both cityblock and Chebyshev metrics as well.
Thank you very much!
I will reflect what you said and work on it as soon as possible!
At first, I wanted to put a Numba jit decorator right on the utils.dist()
function, but unfortunately string type parameter is not yet supported by Numba. So, I defined each dist function with Numba jit decorator in utils.py, and make existing utils.dist
function to use these functions. Therefore, to obtain the distances, utils.dist
function can be used in the same way as before.
As requested, I have implemented it to work for cityblock, euclidean, and chebyshev distances, and here I have verified that fixed function is much faster than the previous function with producing the same results. One unfortunate thing is that the existing scipy function supported more distances besides these 3 distances, but in the new function, only the above 3 distances can be used.
In addition I also added numba to requirement.txt.
Thanks, this looks good to me.
Dear maintainer,
Thank you so much for creating a very useful project.
While using it, I felt that calculating mle took a very long time due to the large amount of calculations. So I used the numba library to drastically reduce the time to calculate mle. This has a greater effect as the length of the data increases. You can verify it by experimenting with the code in the example. However, this is still limited to using euclidean distance as a metric.
I appreciate your work, and I hope my pull request helps for you.
Sincerely, Lee Jun Hyuk