If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)
Issue details
Consider implementing efficient checkpoint edits in the filestate backend, improving on IO (and network bandwidth in the case of S3).
Currently, Pulumi programs using the filestate backend serialize a series of state checkpoints using the gocloud.dev func (b *Bucket) WriteAll:
WriteAll(s_1)
WriteAll(s_2)
...
WriteAll(s_N)
The suggestion is to exploit the fact that these state checkpoints are very similar, so s_2 = patch(s_1, diff(s_2, s_1)), and we can transmit and record diff(s_2, s_1) instead of s_2. This saves bandwidth and moves some work to the read phase, which reconstructs the state from the diffs.
While it may be more difficult to do this for the filestate backend, since all the logic needs to reside on the client side, it should be theoretically possible. Both plain file systems and, e.g., S3 buckets certainly support "append" writes and even ranged reads/writes. If operating on ranges of a binary object cannot be worked out, multi-object encoding schemes can be devised instead.
If implementing this, care should be taken to preserve the ability to read state in the old format.
This same improvement is being solved for the httpstate backend in https://github.com/pulumi/pulumi/issues/3930 using a JSON Patch API.
Affected area/feature