storpipfugl / pykdtree

Fast kd-tree implementation in Python
GNU Lesser General Public License v3.0
206 stars 48 forks source link

Improve insert_point #90

Open akaszynski opened 1 year ago

akaszynski commented 1 year ago

The old code in insert_point_double function uses a for loop that iterates through the elements from k-1 to 1 and then compares the cur_dist with closest_dist[i - 1]. If the condition is true, it performs an assignment operation, which essentially shifts elements in the array. This means it will take a longer time when more elements satisfy this condition (i.e., when the list is less sorted). Moreover, the loop has multiple if-else branches, adding extra overhead.

In this PR insert_point_double function first assigns the cur_dist and pidx values to the last element in the closest_dist and closest_idx arrays. It then utilizes a single for loop that iterates from k-1 to 1, comparing closest_dist[j - 1] with closest_dist[j]. If the condition is met, it performs a "bubbling" operation: swapping the elements at the current position j and the position before it j-1, using temporary variables tmp_dist and tmp_idx. This means that it only modifies the elements in the list if necessary, reducing the number of assignment operations required.

In summary, the new code is more efficient because it reduces the number of assignment operations by using a "bubbling" approach instead of always shifting elements. Additionally, it simplifies the loop structure by avoiding multiple branches, resulting in improved performance. The time complexity is generally the same for both codes when considering the worst-case scenario (O(k)), but the new code performs better on average due to its more efficient implementation. The space complexity is constant for both implementations, but the new code utilizes fewer variables, making it slightly more efficient in terms of space usage.

Extras

This PR modifies several functions to become inline and makes several parameters const.

djhoese commented 1 year ago

Awesome job! Do you have any timing comparisons between the old version and this new one?

Otherwise, the biggest issue is that that .c file is actually generated from a .mako template. Would it be possible for you to move your changes from the .c file to the .c.mako file here:

https://github.com/storpipfugl/pykdtree/blob/master/pykdtree/_kdtree_core.c.mako

akaszynski commented 1 year ago

I'd recommend avoiding building using a mako file and instead relying on Cython doing the work. You can do this by removing pykdtree/render_template.py and modifying your setup.py to instead use 'pykdtree/kdtree.pyx'. Also, pyproject.toml is now the way to go for building wheels to ensure build requirements are installed. See https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html

djhoese commented 1 year ago

I have a lot of experience with cython and a decent amount with pyproject.toml. I think C++ templating is probably the way this project should go in the long run to keep the internal functions as small and low-level as possible. I've found that when putting nearly pure-C/C++ functions in Cython that they end up getting some "cruft" that you wouldn't otherwise get if you wrote it "by hand". The last time I played with this was probably 8 or so years ago though so I'm sure it is overall better now...but I also don't want to rewrite that entire C template file in Cython.

And while pyproject.toml would be a good idea for defining the build environment I think with the Cython extensions we would at least need some minimal version of a setup.py.

akaszynski commented 1 year ago

I have a lot of experience with cython and a decent amount with pyproject.toml. I think C++ templating is probably the way this project should go in the long run to keep the internal functions as small and low-level as possible. I've found that when putting nearly pure-C/C++ functions in Cython that they end up getting some "cruft" that you wouldn't otherwise get if you wrote it "by hand". The last time I played with this was probably 8 or so years ago though so I'm sure it is overall better now...but I also don't want to rewrite that entire C template file in Cython.

Good point regarding the "cruft." I've found it as well and I'll add it to the .mako template.

djhoese commented 1 year ago

Whoa whoa whoa. Sorry. I didn't realize when I wrote my last comment that you rewrote the entire build system already. We really need to keep these PRs separate. I'm still not sold on completely rewriting this entire package just because. I know the modern standards have changed, but besides auto-running the .mako template stuff in the setup.py I'm a little scared to completely rewrite this. I'd be more comfortable with the C++ template if anything, but I'm also just a maintainer of this package not the primary author. These types of changes should really be discussed by all maintainers (in their own well-described PR).

Could you please return this PR back to the changes you made before, but only in the mako file? Optionally you could run the render script to generate the .c file. Then, you could move all of your other build system changes to a new PR.

akaszynski commented 1 year ago

I'll totally revert. I see now how it's advantageous to use the mako file for templating.

I do feel that it's best to use pyproject and a setup.py with Cython rather than shipping the autogenerated source from Cython, but I fully agree that it should be in a separate PR.

djhoese commented 1 year ago

Yes the distributed Cython C code is a problem. It used to be a suggestion by some in the past (maybe even Cython docs themselves) to avoid including a dependency on Cython during installation. That's changed in the last 5 or so years (pyproject.toml sandbox build environments help) as it now hinders downstream build systems (ex. conda-forge) from automatically getting the newest/best version of the .c file that Cython can produce and that might include better compatibility with other CPython versions.

Thanks for understanding.

djhoese commented 10 months ago

What are the chances you could revert your changes so we can get this merged? I'm working on #98 which includes some of the stuff we talked about (removing the Cython generated C file) but keeps the mako template and generated .c file from that.

djhoese commented 6 months ago

Why the close?

akaszynski commented 6 months ago

Why the close?

Sorry, the PR was stale. If you're still interested I'll work on it this month if I can fit it into my schedule.

djhoese commented 6 months ago

Thanks. I think my main concern was that you just went a little too far on the refactor. If you revert a couple commits as discussed above (from what I remember) then it should be good to go.