Closed westonpace closed 6 days ago
We discussed some similar index changes earlier, proposed here:
https://github.com/lancedb/lancedb/issues/1666
It looks like this is a good step in that direction by adding the `index_config` / `index_details` field 👍
Attention: Patch coverage is 66.05505% with 37 lines in your changes missing coverage. Please review.
Project coverage is 77.90%. Comparing base (f257489) to head (e481fd4). Report is 1 commit behind head on main.
This addresses a specific problem. When a dataset had a scalar index on a string column, we performed I/O during the planning phase of every query that contained a filter. This added considerable latency to query times, especially against S3.
We now cache that lookup.
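To illustrate the idea only, here is a minimal sketch of memoizing a planning-time lookup. It is not the code in this PR: `IndexInfoCache`, `slow_lookup`, and the `"idx-1"` key are hypothetical names, and the loader simply stands in for whatever I/O the planner used to do on each filtered query.

```python
from typing import Callable, Dict


class IndexInfoCache:
    """Memoizes a per-index lookup so only the first query pays the I/O cost."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader               # hypothetical expensive lookup
        self._entries: Dict[str, str] = {}  # index UUID -> cached result
        self.loads = 0                      # number of times the loader ran

    def get(self, index_uuid: str) -> str:
        # Only call the loader on a cache miss.
        if index_uuid not in self._entries:
            self.loads += 1
            self._entries[index_uuid] = self._loader(index_uuid)
        return self._entries[index_uuid]


def slow_lookup(index_uuid: str) -> str:
    # Stand-in for reading index information from object storage (assumption).
    return "btree"


cache = IndexInfoCache(slow_lookup)
for _ in range(3):          # three "queries" that each need the same lookup
    cache.get("idx-1")
```

In this sketch the loader runs once, however many queries ask for the same index.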
It also starts to tackle a more central problem. Right now our manifest stores very little information about indices (pretty much just the UUID). Any further information must be obtained by loading the index. This PR introduces the concept of "index details": a place where an index can put index-specific information (e.g. specific to btree or specific to bitmap) that can be accessed during planning just by looking at the manifest. At the moment this concept is still fairly bare bones, but I think this information can be useful as scalar indices become more sophisticated.
If we decide we don't want it then I can pull it out as well and dial this PR back to just the caching component.
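As a rough sketch of what "index details" could enable, the following shows a planner deciding whether an index applies using only manifest-level metadata. Everything here is an assumption made for illustration (the `IndexMetadata` and `Manifest` classes, the `index_type` field, and the `SUPPORTED_OPS` table). None of it reflects the actual manifest format or the fields added in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IndexMetadata:
    uuid: str
    column: str
    index_type: str                                         # hypothetical detail, e.g. "btree"
    details: Dict[str, str] = field(default_factory=dict)   # index-specific extras


@dataclass
class Manifest:
    indices: List[IndexMetadata]


# Hypothetical capability table: which filter operations each index type serves.
SUPPORTED_OPS: Dict[str, Set[str]] = {
    "btree": {"eq", "range"},
    "bitmap": {"eq"},
}


def can_use_index(manifest: Manifest, column: str, op: str) -> bool:
    """Decide from the manifest alone, without loading any index."""
    return any(
        idx.column == column and op in SUPPORTED_OPS.get(idx.index_type, set())
        for idx in manifest.indices
    )


manifest = Manifest(
    indices=[
        IndexMetadata(uuid="idx-1", column="name", index_type="btree"),
        IndexMetadata(uuid="idx-2", column="category", index_type="bitmap"),
    ]
)
```

With details like these in the manifest, a check such as `can_use_index(manifest, "name", "range")` needs no extra I/O during planning.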