manu-mannattil / nolitsa

A Python module implementing some standard algorithms used in nonlinear time series analysis
BSD 3-Clause "New" or "Revised" License
172 stars 46 forks source link

This incredibly speeds up calculating mle with euclidean distance #5

Closed Lee-Jun-Hyuk-37 closed 1 year ago

Lee-Jun-Hyuk-37 commented 1 year ago

Dear maintainer,

Thank you so much for creating a very useful project.

While using it, I felt that calculating mle took a very long time due to the large amount of calculations. So I used the numba library to drastically reduce the time to calculate mle. This has a greater effect as the length of the data increases. You can verify it by experimenting with the code in the example. However, this is still limited to using euclidean distance as a metric.

I appreciate your work, and I hope my pull request helps for you.

Sincerely, Lee Jun Hyuk

manu-mannattil commented 1 year ago

Thanks for your pull request. The main change seems to be the definition of a new Numba-accelerated function (fast_euclidean_dist) for computing the Euclidean distance between two points, which you find to be faster than SciPy's spatial.distance.euclidean. Could you give me a rough estimate of the speedups (i.e., by what factor) you're seeing with the new function? If the Numba-accelerated version is much faster than SciPy, then we could refactor the code to use the Numba version, and there would be no need to define new functions to use the Numba-accelerated distance function.

Lee-Jun-Hyuk-37 commented 1 year ago

Thanks for comments! The requested Experiments result is as follows. It was done by data_length, maxt, and embedding dimension.

Experiment by data length Experiment by maxt Experiment by embedding_dim

You can check the full experiment codes in here.

Depending on the local PC, the measured time value will vary, but I think the performance of the Numba-acceleraded function can be sufficiently confirmed with the shape of above graphs.

If you accept this result and want to refactor to the numba function instead of the scipy function, give me a comment, I'll work on it and update new commits.

manu-mannattil commented 1 year ago

Thanks for measuring things! Numba does speed things up considerably.

Yes, I'd be happy to accept the changes. However, as I mentioned previously, rather than add new functions with the same functionality, think it's better to alter existing code. In principle, I think you just need to rewrite utils.dist() to use Numba.

If you want to work on the changes yourself, please make sure to add support for both cityblock and Chebyshev metrics as well.

Lee-Jun-Hyuk-37 commented 1 year ago

Thank you very much!

I will reflect what you said and work on it as soon as possible!

Lee-Jun-Hyuk-37 commented 1 year ago

At first, I wanted to put a Numba jit decorator right on the utils.dist() function, but unfortunately string type parameter is not yet supported by Numba. So, I defined each dist function with Numba jit decorator in utils.py, and make existing utils.dist function to use these functions. Therefore, to obtain the distances, utils.dist function can be used in the same way as before.

As requested, I have implemented it to work for cityblock, euclidean, and chebyshev distances, and here I have verified that fixed function is much faster than the previous function with producing the same results. One unfortunate thing is that the existing scipy function supported more distances besides these 3 distances, but in the new function, only the above 3 distances can be used.

In addition I also added numba to requirement.txt.

manu-mannattil commented 1 year ago

Thanks, this looks good to me.