khonsulabs / bonsaidb

A developer-friendly document database that grows with you, written in Rust
https://bonsaidb.io/
Apache License 2.0
1.01k stars 37 forks source link

Tuple Key implementation doesn't order properly #240

Closed ecton closed 2 years ago

ecton commented 2 years ago

While working on #238, I was doing some more intensive testing on tuple keys/composite keys. I tried to use encode_composite_key directly to manually create a prefix range for a composite key, and I didn't receive the results I was looking for.

In the end, I realized that my original implementation is flawed. It doesn't sort correctly. Take these cases:

Tuple Encoded
("a", 2_u8) 0197 02
("aa", 1_u8) 0297 9701
("b", 2_u8) 0198 02

The left column is ordered as they should be, but if you notice, the bytes aren't ordered the same way. The current implementation uses variable integer encoding to encode the length at the start of a variable field. For some reason, my brain back then thought that this worked correctly, and I even thought I unit tested this properly.

I discovered part of the problem of bad testing in 7f780a7de8e096b5973509c58990ecee41298a3a, but it wasn't until I tried to implement recursive file listing using a prefix key that I realized the issue here.

I've come up with a new encoding format already, but I wanted to document this more thoroughly in an issue rather than solely burying the details in a commit message. I will be providing a way to continue using the old encoding through a wrapper type, allowing users who have collections with primary keys that are tuples to maintain compatibility.