lmcinnes / pynndescent

A Python nearest neighbor descent for approximate nearest neighbors
BSD 2-Clause "Simplified" License
899 stars 105 forks

Utilizing NumPy APIs more #192

Closed maldil closed 2 years ago

maldil commented 2 years ago

Thank you very much for your excellent work in pynndescent.

I am a researcher studying best practices in evolving data science code, and I am new to this repository. According to our findings, migrating loop-based computations to vectorized NumPy APIs is a commonly recommended practice as code evolves, since it improves performance and code quality. I saw numerous places in this repository where the code could make better use of NumPy APIs and eliminate inefficient loops. I have prepared this PR so that the maintainers can review it and, if you agree, incorporate the changes.

Thanks again for the nice work.

lmcinnes commented 2 years ago

All of these cases are within Numba-jitted functions. Numba is a library that compiles Python code to machine code via LLVM, and it is especially good at handling numeric code, including operations on NumPy arrays. As it turns out, because this code is within the scope of Numba compilation, the loop constructs are (when compiled) faster than the NumPy vectorized versions. That means that in these cases the conversion to NumPy is actually going to make the code slower rather than faster. Thanks for the input, but unfortunately in this case the vectorization is not needed.
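
To make the comparison concrete, here is a minimal sketch of the two styles under discussion. The function names (`euclidean_loop`, `euclidean_vectorized`) are hypothetical and are not taken from the pynndescent source; the fallback `njit` shim is only an assumption added so the snippet still runs where Numba is not installed. Inside a jitted function, the vectorized form typically allocates temporary arrays for `x - y` and the squared result, while the explicit loop compiles to a single pass with no intermediate allocations. Actual timings depend on the machine, array sizes, and Numba version, so none are claimed here.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch runs without Numba.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def euclidean_loop(x, y):
    # Explicit loop: one pass over the data, no temporary arrays.
    result = 0.0
    for i in range(x.shape[0]):
        diff = x[i] - y[i]
        result += diff * diff
    return np.sqrt(result)


@njit
def euclidean_vectorized(x, y):
    # Vectorized form: (x - y) and its square are materialized as temporaries.
    return np.sqrt(np.sum((x - y) ** 2))


rng = np.random.default_rng(0)
x = rng.random(128)
y = rng.random(128)

# Both versions compute the same distance; only the execution strategy differs.
d_loop = euclidean_loop(x, y)
d_vec = euclidean_vectorized(x, y)
```

To compare the two on a given machine, one could time each function with `timeit` after a first warm-up call, so that compilation time is not included in the measurement.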

maldil commented 2 years ago

Thank you for the consideration.