lanterndata / lantern

PostgreSQL vector database extension for building AI applications
https://lantern.dev
GNU Affero General Public License v3.0
790 stars 57 forks source link

Rewrote usearch's usearch_newnode_level function to use a session-glo… #172

Closed therealdarkknight closed 1 year ago

therealdarkknight commented 1 year ago

Fixes #112.

We rewrite the usearch_newnode_level function from usearch, which generates a random level for a vector that we wish to insert.

This not only fixes the seed resetting as described in the issue, but also is an optimization because we use a session-global rng that is seeded by postgres, instead of initializing and seeding our own on every insertion.

Notes:

Further todos/improvements:

ezra-varady commented 1 year ago

It looks like the prng header you're using was added to postgres relatively recently, so it breaks older builds. One potential option is to add preprocessor guards to the function to fall back to using C's builtin prng. However I think this solution is less than ideal.

I believe that the reason the level rng isn't behaving as expected is because the random number generator that is drawn from here is null initialized here. This is probably not an issue if you initialize the index once and make several calls, but because our calling pattern is 1 initialization and then a fixed number of calls to the prng, it's misbehaving. A solution to this bug might be altering the default initialization of this struct member to be seeded with the current time or something similar. This is probably also preferable because it keeps the functionality within the library and would correct the issue at internal call sites as well

Ngalstyan4 commented 1 year ago

@therealdarkknight, Good work! You can add a preprocessor guard, as @ezra-varady suggested (see example here) and call an appropriately scaled version of random() for older versions of postgres.

We actually want this behavior in lantern because we depend on the size of the node to be inserted and this way we can directly calculate the size of the new node fully in lantern