Open: irq0 opened this issue 2 years ago
Seems reasonable. Maybe a candidate for v0.23.0; otherwise, we'll push this to GA.
@irq0, how feasible is this for v0.23.0?
It needs design work before I can say for certain. Off the top of my head, I'd say better not. While it would increase robustness and confidence in the IO path, doing it right requires failure injection testing. An implementation also needs to be careful not to cause a performance regression.
Related issues that make sense to co-design:
- https://github.com/aquarist-labs/s3gw/issues/669: store the checksum there as well, or only there
- https://github.com/aquarist-labs/s3gw/issues/481: use this checksum for versions
Alright, let's re-evaluate for v0.25.0, and let's add those two as tasks for this one.
Having end-to-end checksums is nice during development and even nicer when the system encounters broken hardware.
A possible design would be to checksum every 1M, 4k, or whatever block size makes sense. Section the backend file into blocks and add a header with CRC info to each one. Alternatively, use a second, sparse file mapping offsets to headers, or SQLite.
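A minimal sketch of the first option (sectioning the backend file into blocks, each prefixed by a header carrying a CRC). It assumes a 4 KiB block and a hypothetical 8-byte header (payload length plus CRC32 from `zlib.crc32`). `encode_blocks`, `decode_blocks`, and the header layout are illustrative only, not s3gw's on-disk format; Python is used just to keep the sketch short.

```python
import struct
import zlib

# Hypothetical block size; the issue leaves it open (1M, 4k, ...).
BLOCK_SIZE = 4096
# Hypothetical header: 4-byte payload length + 4-byte CRC32, big-endian.
HEADER = struct.Struct(">II")


class ChecksumError(Exception):
    """Raised when a block's stored CRC does not match its payload."""


def encode_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Split data into blocks, prefixing each with a (length, crc32) header."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        out += HEADER.pack(len(chunk), zlib.crc32(chunk))
        out += chunk
    return bytes(out)


def decode_blocks(blob: bytes) -> bytes:
    """Reassemble the payload, verifying every block's CRC32 on the way."""
    out = bytearray()
    pos = 0
    index = 0
    while pos < len(blob):
        if pos + HEADER.size > len(blob):
            raise ChecksumError(f"truncated header at block {index}")
        length, crc = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        chunk = blob[pos:pos + length]
        # A short read or a CRC mismatch both mean the block is damaged.
        if len(chunk) != length or zlib.crc32(chunk) != crc:
            raise ChecksumError(f"checksum mismatch in block {index}")
        out += chunk
        pos += length
        index += 1
    return bytes(out)
```

Because every block except the last occupies the same on-disk size (`BLOCK_SIZE + HEADER.size`), a reader can seek straight to block `i` at `i * (BLOCK_SIZE + HEADER.size)` without scanning. The trade-off compared with a separate sparse file or SQLite is that headers are interleaved with the data, so the backend file is no longer a plain copy of the object. CRC32C is a common alternative with hardware support, but it is not in the Python standard library.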
Verify checksums on every operation that reads data from disk (GET, copy, etc.).
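A minimal sketch of verify-on-read, using the SQLite option for out-of-band per-block CRC32s. The table name `block_crc`, the helpers `open_index`, `store`, `read_range`, and `flip_bit`, and the 4 KiB block size are all hypothetical; this is not s3gw code. A ranged read re-checksums every block it touches, so a partial GET still catches corruption. `flip_bit` is a tiny failure-injection helper of the kind needed to test this path.

```python
import sqlite3
import tempfile
import zlib

# Hypothetical block size; the issue leaves it open (1M, 4k, ...).
BLOCK_SIZE = 4096


class ChecksumError(Exception):
    """Raised when data read from disk does not match its stored CRC32."""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical per-block checksum table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS block_crc ("
        " object_id TEXT, block INTEGER, crc INTEGER,"
        " PRIMARY KEY (object_id, block))"
    )
    return db


def store(db, object_id: str, f, data: bytes) -> None:
    """Write data to file f and record one CRC32 per block in SQLite."""
    f.seek(0)
    f.write(data)
    f.truncate()
    f.flush()
    rows = [
        (object_id, i // BLOCK_SIZE, zlib.crc32(data[i:i + BLOCK_SIZE]))
        for i in range(0, len(data), BLOCK_SIZE)
    ]
    with db:  # one transaction: replace the object's old checksums
        db.execute("DELETE FROM block_crc WHERE object_id = ?", (object_id,))
        db.executemany("INSERT INTO block_crc VALUES (?, ?, ?)", rows)


def read_range(db, object_id: str, f, offset: int, length: int) -> bytes:
    """Read [offset, offset + length) and verify every block it touches."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    f.seek(first * BLOCK_SIZE)
    raw = f.read((last - first + 1) * BLOCK_SIZE)
    expected = dict(db.execute(
        "SELECT block, crc FROM block_crc"
        " WHERE object_id = ? AND block BETWEEN ? AND ?",
        (object_id, first, last),
    ))
    for block in range(first, last + 1):
        start = (block - first) * BLOCK_SIZE
        chunk = raw[start:start + BLOCK_SIZE]
        if block not in expected or zlib.crc32(chunk) != expected[block]:
            raise ChecksumError(f"{object_id}: bad checksum in block {block}")
    skip = offset - first * BLOCK_SIZE
    return raw[skip:skip + length]


def flip_bit(f, offset: int) -> None:
    """Failure injection: flip the lowest bit of one byte on disk."""
    f.seek(offset)
    byte = f.read(1)[0]
    f.seek(offset)
    f.write(bytes([byte ^ 0x01]))
    f.flush()
```

For example, after `store(db, "obj", f, data)` and `flip_bit(f, 5000)`, a read confined to block 0 still succeeds, while any read touching block 1 raises `ChecksumError`. The cost is one extra CRC pass per touched block plus one indexed SQLite lookup per read, which is the performance side of the trade-off.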
Related: https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/
Tasks