Storage should not read all TSM blocks for operations first and last

Storage should not have to read all TSM blocks when window |> first and window |> last are pushed down. For example, if a series spans 5 TSM blocks and the query is window |> first, only the first block should need to be read completely from disk. For the 4 remaining blocks, we should only be reading the index entries. Specifically, we should check the time range in each index entry to see whether that block starts a new window. If it does not, there's no reason to read the block and we can move on to the next index entry.

For series that span multiple blocks per window, this would be a big performance win, because most blocks would never be read from disk or decoded.

Some things to note:

This will require working at the TSM level. Because the on-disk layout of data is modeled after a log-structured merge tree, it is possible we'll be reading from several TSM files. It is also possible that there will be blocks with overlapping time ranges across these TSM files. Care needs to be taken that data is read in the right order and that blocks are merged if necessary. It's unclear how much of this we'll need to handle ourselves vs. how much has already been abstracted away.

influxdata/influxdb #18415