sdd / kiddo

Kiddo
Apache License 2.0
89 stars 14 forks source link

Why must Content be Zero and One? #33

Open wentasah opened 1 year ago

wentasah commented 1 year ago

From a quick look, it seems that this is because KdTree::size is T, which might not be necessary. I'd like to store a simple data structure as Content, and having this structure represent size makes no sense. Or is there anything else I don't see?

sdd commented 1 year ago

The intent was that this would nudge users towards storing their structs in a Vec rather than the tree, and storing indexes into that Vec in the KdTree itself. This keeps the size of the items down, helping to keep the leaf nodes smaller, which helps with fitting more into the CPU cache. Since the intent was to store index values, the max value returnable by size() would be the max value that this index could take, hence the return type being T.

On reflection, this can be a bit counter-intuitive. I'd consider changing the return type of size() to usize since this is really what you'd normally expect to get back from the size() method of containers in Rust generally.

wentasah commented 1 year ago

This makes sense.

Actually, what I really need is to get coordinates of the nearest point. It seems it should be possible to get them from the tree directly, but there is no API for that. Now, I need to store the points separately in a Vec and get the coordinates from there. Since the size of my point is 64 bits, storing the point in the tree would make no difference in size. If I can get coordinates from the tree, I could have content () and the tree would be even smaller.

sdd commented 1 year ago

It depends on how many points you have. Indexing into a Vec is a usize by default (sounds like you're on a 64 bit arch so 64 bits in this case) but i'm guessing that you're not storing > 2 billion points and so a u32 would do, which can then be cast back to a usize when accessing the Vec. This is the approach that I use.

Clearly though this is not as useful as simply having the ability to return the point when making the query. We could add a method called nearest_one_point or similar that returns the point rather than the content. That would be fairly straightforward. This could be more efficient and performant if T could also be (), which would be more work and potentially a breaking change.

I'm open to adding the nearest_one_point method for now, and then addressing the ability to have T as () in the upcoming major release I have planned. How does that sound for you?

wentasah commented 1 year ago

That sounds perfect. My current application is not performance critical so I don't need T to be () right now.

NeuralModder commented 3 months ago

I'm open to adding the nearest_one_point method for now, and then addressing the ability to have T as () in the upcoming major release I have planned. How does that sound for you?

Hello Scott.

Just to inform you of my situation: In a project I'm a part of, we are currently storing RGB color information in the Kdtree as Content, while also using that same color information as coordinates. This means that we're actually storing the color information twice. Letting T be () would therefor be useful to us. This is also not a performance critical application in any way, but it would be nice. I'm looking forward to the next major release.

Cheers, Youser