storpipfugl / pykdtree

Fast kd-tree implementation in Python
GNU Lesser General Public License v3.0
206 stars 48 forks source link

Consider array for all Node structs and other possible optimizations #92

Open djhoese opened 1 year ago

djhoese commented 1 year ago

In #81, when talking about serializing the KDTree object I pointed out that the Tree/Node structure internal to pykdtree is structured as a series of struct objects along with an "ancillary" pidx arrray (size of points) and a bbox (size of dimensions * 2).

I've been contemplating whether it would be smarter to declare the tree nodes as a large array and fill it in as we build the tree. This has a couple advantages as well as makes it easier to do some other refactorings. The Tree/Node array could be structured in a depth first or breadth first manner depending on what is more performant. Given the way the algorithm works I think it would be smarter to structure this faster building rather than faster querying, but I'm not sure which is which.

If implemented, I see this changing a few things:

  1. Serialization should be much easier (see #81) since the tree/nodes are no longer custom C-level structs, but could just be a buffer of bytes. It would be large, but no larger than the existing total amount of memory.
  2. I think overall memory usage should be smaller since we don't need have pointers to left/right nodes. Leaf-ness (whether a node is a leaf or not) is just based on how many elements are left in the array.
  3. Tree building should be faster because we no longer have to do a malloc to create each Node. I would assume this should allow the compiler to optimize things even further and not wait for memory to be allocated.
  4. Querying might also get sped up, but since the point of the tree structure is to visit as few nodes as possible it wouldn't be the primary goal.
  5. This (I think) would make it easier to restructure the recursive functions into non-recursive/for loop set of functions. This should (again, I think) allow more optimizations from the compiler.

Thoughts? I don't have time to implement this, but I'd be curious what kind of difference it makes.