This is a bit light on test coverage, but I expect some big refactoring is coming to segment state and the other pieces that track Parquet files in the system. However, I wanted to get this in so we can keep things moving along. Big changes here:
- Create a persister module in the write buffer
- Check the size of the buffer (all open segments) every 10s and project its size 5 minutes out based on the growth rate
- If the projected size is over the configured limit, either close segments that haven't received writes in the last minute, or persist the largest tables (the oldest 90% of their data)
- Add functions to the table buffer to split a table into the older 90% and newer 10% of its data by timestamp, so the old portion can be persisted while the new portion stays in memory
- When persisting, record what was persisted in the WAL
- When replaying from the WAL, clear the persisted data out of the buffer
- Update the object store path for persisted Parquet files in a segment to include a file number, since a segment can now contain multiple Parquet files
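The size projection in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the write buffer's actual API: `SizeSample` and `projected_size` are hypothetical names, and the real check runs on a 10s interval over all open segments.

```rust
/// A buffer-size measurement taken at one poll of the open segments.
/// (Illustrative type, not the actual write_buffer structs.)
struct SizeSample {
    at_secs: u64,    // when the sample was taken, seconds since some epoch
    size_bytes: u64, // total size of all open segments at that time
}

/// Project the buffer size `horizon_secs` from now, assuming the growth
/// rate observed between the two most recent samples continues linearly.
fn projected_size(prev: &SizeSample, current: &SizeSample, horizon_secs: u64) -> u64 {
    let elapsed = current.at_secs.saturating_sub(prev.at_secs).max(1);
    let growth = current.size_bytes.saturating_sub(prev.size_bytes) as f64;
    let rate_per_sec = growth / elapsed as f64;
    current.size_bytes + (rate_per_sec * horizon_secs as f64) as u64
}

fn main() {
    // Two samples 10s apart: the buffer grew from 100 MB to 110 MB,
    // i.e. 1 MB/s. Projected 5 minutes (300s) out: 110 MB + 300 MB.
    let prev = SizeSample { at_secs: 0, size_bytes: 100_000_000 };
    let curr = SizeSample { at_secs: 10, size_bytes: 110_000_000 };
    println!("{}", projected_size(&prev, &curr, 300)); // prints 410000000
}
```

If the projected value exceeds the configured limit, the persister would then pick between closing idle segments and persisting the largest tables, as described above.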
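The 90/10 split on the table buffer could look something like this sketch. `split_for_persist` is a hypothetical name, and the real code splits full rows rather than bare timestamps; the point is just the cut at the 90th-percentile timestamp.

```rust
/// Split row timestamps (nanoseconds) into (older 90%, newer 10%):
/// the older portion is persisted to Parquet, the newer stays in memory.
/// (Illustrative function, not the actual table buffer API.)
fn split_for_persist(mut timestamps: Vec<i64>) -> (Vec<i64>, Vec<i64>) {
    timestamps.sort_unstable();
    // Cut index: the first 90% of rows (by timestamp order) get persisted.
    let cut = (timestamps.len() * 9) / 10;
    let newer = timestamps.split_off(cut);
    (timestamps, newer)
}

fn main() {
    // Ten rows: nine oldest go to persistence, the newest stays buffered.
    let (older, newer) = split_for_persist((1..=10).collect());
    println!("persist {} rows, keep {} in memory", older.len(), newer.len());
}
```

Recording the split in the WAL (fourth bullet) is what lets replay drop the already-persisted portion instead of rebuffering it.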