
Epic: Layer checksums #2683

Open koivunej opened 1 year ago

koivunej commented 1 year ago

Follow-up to #2456, which started collecting index file metadata (just the size). Related to #987, but this issue is about file verification with hashing, not page checksums.

  • file checksums (at least, as a placeholder to do the format changes once) to later fill with the values, compatible with what we have in AWS S3

Originally posted by @SomeoneToIgnore in https://github.com/neondatabase/neon/issues/2456#issuecomment-1248384538

My ideas so far:

S3 supports SHA2-256, which we already have as a dependency through the sha2 crate.
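
A minimal sketch of producing such a file hash with the sha2 crate; the path is a placeholder, and note that S3 expects the digest base64-encoded when it is passed alongside an upload:

```rust
// Stream a layer file through Sha256 to get the kind of "file hash"
// discussed here. A sketch, not pageserver code.
use sha2::{Digest, Sha256};
use std::{fs::File, io};

fn sha256_of_file(path: &str) -> io::Result<[u8; 32]> {
    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
    // Sha256 implements io::Write, so the file streams through the hasher
    // without being loaded into memory all at once.
    io::copy(&mut file, &mut hasher)?;
    Ok(hasher.finalize().into())
}
```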

andresrsanchez commented 1 year ago

Hey @koivunej, are you working on this task, or can I try to tackle it?

Thanks!

koivunej commented 1 year ago

@andresrsanchez feel free to work on this. If there are any questions, feel free to ping me!

shanyp commented 1 year ago

@koivunej is this still relevant?

koivunej commented 1 year ago

This is very much relevant, but not easy.

LizardWizzard commented 1 year ago

Worth checking whether the S3 SDK now supports the new generation of the checksum-passing API, which allows checksums to be verified cheaply without reading the file twice. See the "Using trailing checksums" section of https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html.

If that is ready, we can avoid storing checksums in the index part and rely on the SDK/S3 interaction. There is a concern, though, that this is not supported by other S3-like storages; storing checksums in the index file is more portable.

LizardWizzard commented 1 year ago

Some more context: https://github.com/awslabs/aws-sdk-rust/discussions/539#discussioncomment-2784356

shanyp commented 1 year ago

@LizardWizzard once this is implemented by the SDK, could we just toggle checksum validation on?

koivunej commented 1 year ago

https://docs.rs/aws-sdk-s3/latest/aws_sdk_s3/operation/put_object/builders/struct.PutObjectFluentBuilder.html#method.checksum_sha256 already exists in the most recent release (which we may or may not be using yet).

Looking at the docs for get_object, it's not entirely clear to me how to request the hashes.
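
For what it's worth, a hedged sketch of both directions with aws-sdk-s3; bucket and key are placeholders, module paths vary between SDK versions, and ChecksumMode::Enabled on get_object appears to be the way to ask for the stored hash:

```rust
// Sketch only: upload with a precomputed SHA-256 and request the stored
// checksum on download. Bucket/key are placeholders.
use aws_sdk_s3::{primitives::ByteStream, types::ChecksumMode, Client};

async fn put_with_sha256(
    client: &Client,
    body: ByteStream,
    digest_b64: String, // base64-encoded SHA-256 of the body
) -> Result<(), aws_sdk_s3::Error> {
    client
        .put_object()
        .bucket("some-bucket")
        .key("tenants/.../layer-file")
        .body(body)
        // S3 verifies the received body against this value.
        .checksum_sha256(digest_b64)
        .send()
        .await?;
    Ok(())
}

async fn get_stored_sha256(client: &Client) -> Result<Option<String>, aws_sdk_s3::Error> {
    let resp = client
        .get_object()
        .bucket("some-bucket")
        .key("tenants/.../layer-file")
        // Ask S3 to include the stored checksum in the response metadata.
        .checksum_mode(ChecksumMode::Enabled)
        .send()
        .await?;
    Ok(resp.checksum_sha256().map(str::to_owned))
}
```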

What we'd need in pageserver is then:

  1. record the sha2-256 during creation of DeltaLayer and ImageLayer
    • this cannot be done completely the way I'd prefer with the current format, which has a summary at the beginning, so the file isn't written in a single pass
    • the presence of ephemeral files gives us a way to have integrity errors before we rewrite the L0 files
    • unsure if single-pass writing plus ephemeral-file integrity is trivially ensurable; it might need non-file hash constructions like merkle trees
  2. offer this "file hash" when uploading
  3. ask for this "file hash" when downloading a layer
  4. one-shot generation of missing file hashes, done so that S3 wins: we wouldn't refuse a local file that has been changed or modified on local disk, but would prefer the S3 version (see the sketch after this list)
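
A hypothetical sketch of the "S3 wins" rule in step 4; every name here is invented for illustration, not taken from the pageserver:

```rust
// Invented types for illustration only.
enum Reconcile {
    KeepLocal,
    PreferS3, // re-download rather than refuse the layer
}

fn reconcile(local_hash: [u8; 32], remote_hash: Option<[u8; 32]>) -> Reconcile {
    match remote_hash {
        // One-shot backfill: no hash recorded remotely yet, adopt the local one.
        None => Reconcile::KeepLocal,
        Some(remote) if remote == local_hash => Reconcile::KeepLocal,
        // Local file changed on disk: S3 wins instead of erroring out.
        Some(_) => Reconcile::PreferS3,
    }
}
```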

sha2-256 is the best option among what S3 supports. Not much has changed since my PR description.

Continued work from step 3 onwards should probably wait for #4745.

LizardWizzard commented 1 year ago

  1. record the sha2-256 during creation of DeltaLayer and ImageLayer
  2. offer this "file hash" when uploading
  3. ask for this "file hash" when downloading a layer

The idea here is slightly different: the SDK will calculate the checksum during upload and pass it in an HTTP trailer, so this is transparent from the calling code's point of view.

The next question is whether we should validate checksums when we read files, regardless of the S3 side, and how to combine the two if needed.
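
A sketch of what that transparent path could look like, assuming the SDK's checksum_algorithm option on put_object (bucket and key are placeholders):

```rust
// Sketch: let the SDK compute the checksum and send it as an HTTP trailer;
// the caller never precomputes a digest.
use aws_sdk_s3::{primitives::ByteStream, types::ChecksumAlgorithm, Client};

async fn put_sdk_checksummed(
    client: &Client,
    body: ByteStream,
) -> Result<(), aws_sdk_s3::Error> {
    client
        .put_object()
        .bucket("some-bucket") // placeholder
        .key("tenants/.../layer-file") // placeholder
        .body(body)
        // Naming only the algorithm makes the SDK hash the body while
        // streaming it and append the value as a trailing checksum, which
        // S3 checks against what it actually received.
        .checksum_algorithm(ChecksumAlgorithm::Sha256)
        .send()
        .await?;
    Ok(())
}
```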

LizardWizzard commented 1 year ago

once this is implemented by the SDK, could we just toggle checksum validation on?

As I see it, the answer is yes.

koivunej commented 8 months ago

The most recent idea for working with our current way of writing out layer files (write the contents first, then seek back and fill in the summary): crc32, which is also supported by S3.
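
As a minimal sketch of why crc32 fits this two-pass pattern, assuming the crc32fast crate (its Hasher::combine plays the role of the crc32_combine operation discussed below):

```rust
// Hash the body (written first) and the summary (back-filled afterwards) as
// separate streams, then combine them into the CRC of the final file layout.
fn two_pass_crc(summary: &[u8], body: &[u8]) -> u32 {
    let mut body_crc = crc32fast::Hasher::new();
    body_crc.update(body); // pass 1: contents

    let mut file_crc = crc32fast::Hasher::new();
    file_crc.update(summary); // pass 2: summary filled in at the start

    // Appends the body's CRC after the summary's: CRC(summary ++ body).
    file_crc.combine(&body_crc);
    file_crc.finalize()
}

fn main() {
    let (summary, body) = (b"SUMMARY".as_slice(), b"...layer contents...".as_slice());

    // Single-pass reference over the final on-disk order.
    let mut whole = crc32fast::Hasher::new();
    whole.update(summary);
    whole.update(body);

    assert_eq!(two_pass_crc(summary, body), whole.finalize());
}
```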

I've been protesting the addition of crc32, but aside from some custom merkle-tree-ish (blake3-style) contraptions, we cannot get a "file hash" (think "sha256sum file") while we write the file in two steps. crc32 would support this via crc32_combine, as in the sketch above. crc32 could work as "a stepping stone": it lets us verify what we wrote now, and eventually leads to the sha2-256 hash. The "stepping stone" could be either: