If we have a failure while changing sstables from tmp to final during a compaction, this should normally be handled correctly by the Transactional interface.
However, as @inespot noticed, if closing the writer also fails, so that we do not fully abort all the new sstables, the in-progress compaction entry is still removed even though the compaction has not been fully rolled back.
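To make the failure mode concrete, here is a minimal sketch, not the actual code. `AbortSketch`, `Writer`, `rollBack`, and the `keepRecordOnFailedAbort` flag are all hypothetical names for illustration; the in-progress compaction entry is modelled as a plain set of compaction ids.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AbortSketch {
    /** Hypothetical stand-in for an sstable writer whose abort may fail. */
    interface Writer {
        void abort() throws Exception;
    }

    /** Aborts every writer, collecting failures instead of stopping at the first one. */
    static List<Exception> abortAll(List<Writer> writers) {
        List<Exception> failures = new ArrayList<>();
        for (Writer w : writers) {
            try {
                w.abort();
            } catch (Exception e) {
                failures.add(e);
            }
        }
        return failures;
    }

    /**
     * Rolls back a compaction. With keepRecordOnFailedAbort == false the
     * in-progress entry is removed unconditionally (the problem described
     * above); with true it is removed only if every writer aborted cleanly.
     * Returns true if the entry was removed.
     */
    static boolean rollBack(List<Writer> writers, Set<String> inProgress,
                            String compactionId, boolean keepRecordOnFailedAbort) {
        boolean cleanAbort = abortAll(writers).isEmpty();
        if (cleanAbort || !keepRecordOnFailedAbort) {
            return inProgress.remove(compactionId);
        }
        return false;
    }

    /** One writer that aborts cleanly and one whose abort throws. */
    static List<Writer> exampleWriters() {
        List<Writer> writers = new ArrayList<>();
        writers.add(() -> { });
        writers.add(() -> { throw new IOException("close failed"); });
        return writers;
    }

    public static void main(String[] args) {
        Set<String> inProgress = new HashSet<>();
        inProgress.add("c1");
        rollBack(exampleWriters(), inProgress, "c1", false);
        System.out.println(inProgress); // prints []

        inProgress.add("c1");
        rollBack(exampleWriters(), inProgress, "c1", true);
        System.out.println(inProgress); // prints [c1]
    }
}
```

In the first call the entry disappears even though one writer failed to abort; in the second it survives, so a later startup can still tell that the compaction did not finish.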
If you're doing a leveled compaction, where you may have multiple product sstables per ancestor, the missing in-progress compaction log means that on startup, when cleaning up unused sstables, we will delete the ancestors rather than the products. This, plus deleting all the tmp sstables, can mean data loss.
If instead the compaction log is retained when the abort doesn't succeed, then on startup the few non-tmp products will be deleted, all tmp files will be deleted, and the ancestor files will be retained.
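The two startup outcomes can be compared with a small, simplified model. This is only a sketch of the cleanup rule as described in this ticket, not the real startup code; `StartupCleanupSketch`, `SSTable`, `filesToDelete`, and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class StartupCleanupSketch {
    /** Hypothetical, simplified view of an sstable found on disk at startup. */
    static final class SSTable {
        final String name;
        final boolean tmp;
        final Set<String> ancestors;

        SSTable(String name, boolean tmp, String... ancestors) {
            this.name = name;
            this.tmp = tmp;
            this.ancestors = new HashSet<>(Arrays.asList(ancestors));
        }
    }

    /**
     * Returns the names of the files startup cleanup would delete.
     * inProgressInputs holds the input sstables of compactions that still
     * have an in-progress record; it is empty if the record was removed.
     */
    static Set<String> filesToDelete(List<SSTable> onDisk, Set<String> inProgressInputs) {
        Set<String> present = new HashSet<>();
        for (SSTable s : onDisk) {
            present.add(s.name);
        }
        Set<String> delete = new TreeSet<>();
        for (SSTable s : onDisk) {
            if (s.tmp) {
                // tmp files are always removed
                delete.add(s.name);
            } else if (!Collections.disjoint(s.ancestors, inProgressInputs)) {
                // record present: the compaction is unfinished, so drop its product
                delete.add(s.name);
            } else {
                // no record: the product is assumed complete, so drop its ancestors
                for (String a : s.ancestors) {
                    if (present.contains(a)) {
                        delete.add(a);
                    }
                }
            }
        }
        return delete;
    }

    /** Ancestors A and B; product P1 already renamed to final, P2 still tmp. */
    static List<SSTable> exampleDisk() {
        List<SSTable> disk = new ArrayList<>();
        disk.add(new SSTable("A", false));
        disk.add(new SSTable("B", false));
        disk.add(new SSTable("P1", false, "A", "B"));
        disk.add(new SSTable("P2", true, "A", "B"));
        return disk;
    }

    public static void main(String[] args) {
        // Record removed: ancestors and the tmp product go, only P1 survives.
        System.out.println(filesToDelete(exampleDisk(), Collections.emptySet()));
        // prints [A, B, P2]

        // Record retained: both products go, ancestors A and B survive.
        System.out.println(filesToDelete(exampleDisk(), new HashSet<>(Arrays.asList("A", "B"))));
        // prints [P1, P2]
    }
}
```

In the first case whatever ended up in P2 is gone for good, since A and B were deleted too; in the second case nothing is lost and the compaction can simply run again.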
Also note that if the process simply crashes mid-compaction, the first startup is fine, since we still have the record. However, you could still hit this case if a startup cleans the record and then fails partway through: the next startup would have problems.