Is your feature request related to a problem? Please describe.
If we do a lot of updates on small date ranges we will eventually fragment the data heavily and reads will become very slow.
I think that currently, if we update a single row within a table data key, we end up writing 3 table data keys: the rows before the updated row, the updated row itself, and the rows after it.
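For illustration, a minimal sketch of the kind of workload that causes this, assuming the current public API (the library setup and symbol name are made up):

```python
import pandas as pd
from arcticdb import Arctic  # assuming the ArcticDB client

ac = Arctic("lmdb://./fragmentation_demo")  # made-up local library
lib = ac.get_library("demo", create_if_missing=True)

# One year of daily data, small enough to land in one table data key.
dates = pd.date_range("2023-01-01", periods=365, freq="D")
lib.write("sym", pd.DataFrame({"price": range(365)}, index=dates))

# Each single-row update splits the containing table data key into
# three (before / updated row / after), so the segment count grows
# with every call and reads have to fetch ever more keys.
for day in dates[::30]:
    lib.update("sym", pd.DataFrame({"price": [0.0]}, index=[day]))
```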
Describe the solution you'd like
In some cases fragmenting the data is unavoidable if we want reasonable performance on small updates, e.g. when we update with a range that lies after the existing one. For such cases we could have lib.update(defragment=True), which pays the extra price at update time but doesn't fragment the table data, preserving read performance.
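A rough sketch of the caller-side API (the defragment keyword is the proposal itself, not an existing parameter; the symbol and data are made up):

```python
import pandas as pd

# Hypothetical usage of the proposed flag; `lib` is an existing library
# handle and "sym" an existing symbol. `defragment` does not exist yet.
new_rows = pd.DataFrame(
    {"price": [101.5]},
    index=pd.DatetimeIndex(["2023-06-01"], name="date"),
)

# Proposed: merge the affected table data keys into contiguous segments
# at update time, trading update latency for read performance.
lib.update("sym", new_rows, defragment=True)
```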
Also I think we can decrease the fragmentation without any extra cost in cases where we split up existing table data keys.
What we do now is:
1. Read all table data keys which intersect the updated date range
2. Trim the first table data key so it contains only the index before the updated date range and write it back
3. Trim the last table data key so it contains only the index after the updated date range and write it back
4. Write a completely new segment containing the updated date range
When fewer than 3 table data keys intersect the updated range, we end up increasing the total number of segments. Instead, at no extra cost, we can write the combined output of steps 2, 3 and 4 as a single table data key (and maybe split it up if it exceeds 100k rows), as sketched below.
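A minimal sketch of the combined write, with the intersecting table data keys modelled as DataFrames (combined_update and the constant below are illustrative, not existing internals):

```python
import pandas as pd

MAX_ROWS_PER_SEGMENT = 100_000  # assumed segment row limit

def combined_update(segments, update_df, start, end):
    # `segments` models the table data keys from step 1 that intersect
    # the updated range [start, end]; each one is a DataFrame here.
    # Steps 2 and 3: keep only the rows outside the updated range.
    head = segments[0][segments[0].index < start]
    tail = segments[-1][segments[-1].index > end]

    # Step 4 folded into 2 and 3: one contiguous segment replaces what
    # would otherwise be three separate table data keys.
    combined = pd.concat([head, update_df, tail])

    # Re-split only if the merged segment exceeds the row limit.
    return [
        combined.iloc[i : i + MAX_ROWS_PER_SEGMENT]
        for i in range(0, len(combined), MAX_ROWS_PER_SEGMENT)
    ]
```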
Describe alternatives you've considered
Occasional defragmentation with lib.write
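For completeness, the alternative amounts to a periodic full rewrite, roughly as below (assuming read() returns an object whose .data attribute holds the whole frame):

```python
# Reading everything back and rewriting it collapses the fragmented
# table data keys into freshly laid-out contiguous segments, at the
# cost of rereading and rewriting the entire symbol.
df = lib.read("sym").data
lib.write("sym", df)
```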