microsoft / ALEX

A library for building an in-memory, Adaptive Learned indEX
MIT License
656 stars 112 forks source link

Handle empty data nodes when finding last key #19

Closed geoffxy closed 2 years ago

geoffxy commented 2 years ago

This PR addresses an edge case in get_payload_last_no_greater_than() and find_last_no_greater_than() related to querying a key larger than the largest key in the index.

The get_payload_last_no_greater_than() method should return the record associated with the largest key in ALEX. But the method actually returns an invalid value. What appears to be happening is that the get_leaf() call returns an empty leaf. That leaf’s previous leaf is also empty. You actually have to traverse backwards through a few empty leaf nodes until you reach the last populated node, which stores the record with the largest key.

As discussed offline, an acceptable solution is to just traverse backwards to the first non-empty data node (if it exists).

I've also included a test case that reproduces the problem and currently fails on master.

ghost commented 2 years ago

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

:x: geoffxy sign now
You have signed the CLA already but the status is still pending? Let us recheck it.