sdd / kiddo

Kiddo
Apache License 2.0
95 stars 15 forks source link

Why Fixed library? #26

Closed vchugreev closed 1 year ago

vchugreev commented 1 year ago

Hello, Scott.

You have a very useful and cool library for ML, and I have been using it for a long time. It obviously lacked support for fixed-point types. And in the 2nd version you added it, it's very cool, but why Fixed? Why not Decimal? What are the advantages of the Fixed?

P.S. Sorry for my bad english, it's not my native language.

Sincerely, Valery

sdd commented 1 year ago

Hi @vchugreev! Thanks for the kind words. I went for Fixed mostly to be able to use 16-bit values to specify co-ordinates. The float capability supports f32 and f64, but for many use cases the ~8 significant figures of precision of f32 (and especially the ~15 significant figures of precision of f64) are more than is required. Especially if you are prioritising query speed - using 16-bit types for your positions allows many more of the k-d tree's interior nodes to fit in the CPU cache, potentially giving a performance increase (depending on many factors such as CPU architecture, etc), and also permitting serialized k-d trees to take up significantly less storage space.

I'd not come across https://docs.rs/rust_decimal/latest/rust_decimal/ before but it seems to take up 128 bits per value. This would have the opposite effect of using Fixed with 16 bits - resulting in fewer entries fitting in the CPU cache.

If you wanted to work with Decimal, you could potentially store your entries as structs in a Vec<> with the Decimal position stored within the struct, but still use f32 / f64 / Fixed axes in the k-d tree? As long as the entries are not all clustered together on any particular axis within 0.000000000001 of each other or something like that, you should be able to use the tree to find the nearest neighbours to your query but then retrieve the full precision Decimal value by querying your Vec<> with the returned index from the tree.

vchugreev commented 1 year ago

@sdd, thanks for clarifying! Really, your solution is optimal.