zclconf / go-cty

A type system for dynamic values in Go applications
MIT License
338 stars 70 forks source link

msgpack: truncate long string prefix refinements #167

Closed kmoe closed 10 months ago

kmoe commented 10 months ago

Serialising an unknown value with a string prefix refinement with a sufficiently long prefix will result in an error on deserialisation, since cty limits the size of the refinements blob to 1024 bytes.

This PR is an implementation of the approach described in https://github.com/hashicorp/terraform/issues/33464#issuecomment-1625670378 for getting around this. Of course we would have to truncate the value while deserialising as well, otherwise the value does not round-trip correctly.

@apparentlymart I'm assuming that we don't want to impose a limit on prefix length when constructing the refinement in the first place (nor does the RefinementBuilder seem to allow for this, as far as I can tell). Silently truncating the prefix during serialisation seems like it suffices to preserve "approximations of refinements" under msgpack as documented, but if we really believe there is no proper use case for string prefix refinements greater than 256 bytes, can we let the consumer know more explicitly?

apparentlymart commented 10 months ago

Thanks for working on this!

The string prefix refinement is primarily intended for short prefixes like https:// or ami- or similar, so I find it a reasonable compromise to truncate longer prefixes during serialization to limit the amount of overhead refinements can create -- not all consumers of values will do anything with the refinements anyway, but every user of the msgpack serialization must still pay the cost of storing/transmitting them.

I considered handling this during serialization to be a pragmatic compromise because it means that applications which don't serialize their values can still benefit from the longer prefixes in the less common cases where they are relevant -- these strings were presumably already in RAM anyway, so that storage cost has already been paid -- and then (as you said) this becomes a reasonable "approximation" for applications that do use serialized values to transmit refinements, as Terraform does with its providers.

It is true that this is a situation where values will not round-trip losslessly. While that's non-ideal I think it's a reasonable compromise. It does mean, I suppose, that the TestRoundTrip test model will not work for this particular scenario, and so this would either need a new test specialized for this situation or a special rule in the TestRoundTrip test loop which considers an unknown string to have "round-tripped" correctly if the expected prefix has been preserved. The former (a new test) is probably the easier to implement, because then we can hand-write the expected value rather than having to describe a general rule for deriving it from arbitrary input.

kmoe commented 10 months ago

Thanks @apparentlymart, I've created a new test for this case demonstrating the truncation behaviour.

apparentlymart commented 10 months ago

This looks great to me, so I'm going to merge it now. Thanks!