pulumi / pulumi

Pulumi - Infrastructure as Code in any programming language 🚀
https://www.pulumi.com
Apache License 2.0

Implement efficient checkpoint edits in the filestate backend #10068

Open t0yv0 opened 2 years ago

t0yv0 commented 2 years ago

Hello!

Issue details

Consider implementing efficient checkpoint edits in the filestate backend, improving on IO (and network bandwidth in the case of S3).

Currently, Pulumi programs using the filestate backend write a series of state checkpoints, each serialized in full with the gocloud.dev method func (b *Bucket) WriteAll:

WriteAll(s_1)
WriteAll(s_2)
...
WriteAll(s_N)

The suggestion is to exploit the fact that consecutive state checkpoints are very similar: s_2 = patch(s_1, diff(s_2, s_1)), so we can transmit and record diff(s_2, s_1) instead of s_2. This saves bandwidth and moves some work to the read phase, which has to reconstruct the state from the diffs.
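To make the diff/patch idea concrete, here is a minimal sketch in Go. It is not how Pulumi works today. The `Delta` type and the `Diff` and `Patch` functions are hypothetical names invented for illustration, and the diff is deliberately naive: it keeps only the common prefix and suffix of two checkpoints and records the bytes in between. A real implementation would likely use a proper binary or JSON diff.

```go
package main

import (
	"errors"
	"fmt"
)

// Delta is a hypothetical edit record: keep Prefix bytes from the start of the
// previous checkpoint, then emit Middle, then keep Suffix bytes from its end.
type Delta struct {
	Prefix int
	Suffix int
	Middle []byte
}

// Diff computes a Delta that turns prev into next by trimming the
// longest common prefix and suffix.
func Diff(prev, next []byte) Delta {
	limit := len(prev)
	if len(next) < limit {
		limit = len(next)
	}
	p := 0
	for p < limit && prev[p] == next[p] {
		p++
	}
	s := 0
	for s < limit-p && prev[len(prev)-1-s] == next[len(next)-1-s] {
		s++
	}
	// Copy the changed region so the Delta does not alias next.
	middle := append([]byte(nil), next[p:len(next)-s]...)
	return Delta{Prefix: p, Suffix: s, Middle: middle}
}

// Patch reconstructs the next checkpoint from prev and a Delta.
func Patch(prev []byte, d Delta) ([]byte, error) {
	if d.Prefix < 0 || d.Suffix < 0 || d.Prefix+d.Suffix > len(prev) {
		return nil, errors.New("delta does not apply to this checkpoint")
	}
	out := make([]byte, 0, d.Prefix+len(d.Middle)+d.Suffix)
	out = append(out, prev[:d.Prefix]...)
	out = append(out, d.Middle...)
	out = append(out, prev[len(prev)-d.Suffix:]...)
	return out, nil
}

func main() {
	s1 := []byte(`{"resources":["a","b"]}`)
	s2 := []byte(`{"resources":["a","b","c"]}`)

	d := Diff(s1, s2)
	restored, err := Patch(s1, d)
	if err != nil {
		panic(err)
	}
	fmt.Printf("full checkpoint: %d bytes, delta payload: %d bytes\n", len(s2), len(d.Middle))
	fmt.Printf("round trip ok: %v\n", string(restored) == string(s2))
}
```

The writer would then upload only the encoded `Delta` for each step, and the reader would replay the deltas on top of the last full checkpoint.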

The same improvement is being worked on for the httpstate backend in https://github.com/pulumi/pulumi/issues/3930 using a JSON PATCH API.

While it may be more difficult to do this for the filestate backend, since all the logic needs to reside on the client side, it should be theoretically possible. Both a plain file system and, e.g., S3 buckets certainly support "append" writes and even ranged reads/writes. If operating on ranges of a binary object cannot be worked out, multi-object encoding schemes can be devised instead.

If implementing this, care should be taken to preserve the ability to read state in the old format.

Affected area/feature

Frassle commented 1 year ago

Related https://github.com/pulumi/pulumi/issues/6074