Currently, compaction writes one large batch per compacted data file. This is sub-optimal: it reduces hints efficiency and can amplify the impact of corruption, because a single batch creates one long CRC chain.
Compaction should instead follow the batching scheme of the pre-compacted file: the locations of each original batch that survive compaction should be written together in their own batch, preserving the same "grouping" as before.
Example.
Given a file with the following Batches/Locations:
B1(L2,L3,L4) B5(L6,L7) B8(L9)
If L4, L6 and L9 are deleted and the file is compacted, the result should be:
B1(L2,L3) B4(L7)
The third batch disappears because its only location, L9, was deleted.
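
The grouping rule can be sketched as below. This is only an illustration of the intended behaviour, not the actual compaction code: the function name `regroup_batches` and the representation of batches as lists of location labels are assumptions made for the example, and the sketch ignores batch headers, positions and CRCs.

```python
from typing import List, Set


def regroup_batches(batches: List[List[str]], deleted: Set[str]) -> List[List[str]]:
    """Hypothetical helper: keep the original batch grouping during compaction.

    Each input batch is a list of location labels. Deleted locations are
    dropped, and a batch that ends up empty is not written at all.
    """
    compacted = []
    for batch in batches:
        # Keep only the locations of this batch that were not deleted.
        survivors = [loc for loc in batch if loc not in deleted]
        if survivors:
            # One output batch per input batch that still has live locations.
            compacted.append(survivors)
    return compacted


# The example from this ticket, using location labels only.
original = [["L2", "L3", "L4"], ["L6", "L7"], ["L9"]]
result = regroup_batches(original, deleted={"L4", "L6", "L9"})
print(result)
```

With the example above, the two surviving groups are (L2,L3) and (L7), instead of a single batch holding all three locations.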