Closed joshwlewis closed 2 months ago
If we want to implement Serialize
and Deserialize
, we probably should at least add some extra serde attributes for the optional field to avoid serialising them when they're None
like we do for libcnb-data types. My gut feeling is that adding both implementations is a little weird, but I see the practical use for them.
What are your thoughts about the additional attributes @joshwlewis?
I chose not to use Target
directly in the layer metadata for Python, since:
(a) leaking the internal libcnb types into the layer metadata means having to handle forwards-backwards compat for layer metadata migration if the libcnb type were ever to change
(b) it felt more explicit to include the precise fields needed for invalidation into the layer metadata rather than putting everything in there by default. for example, some layers might not be distro or arch-specific
Fwiw I'm fine with making Target
serialisable if we think it's reasonable to use in some cases - I just wondered if there's a potential "what is the best practice here and should libcnb be opinionated about it or not?" discussion to be had? :-)
Short: I'm leaning towards not wanting Serialize and Deserialize implemented. I think Clone, Debug, Eq and PartialEq make sense as mechanical properties. My worry with allowing these to be serialized directly is that maintainers must be aware that changes to that struct can cause cache misses and users would get no compile errors or warnings and would have to check the changelog. We even recently changed the format of Target https://github.com/heroku/libcnb.rs/pull/821.
Long:
Originally, I made a smaller alias of Target
called TargetId
, which I also made Serialize and Deserialize. After some discussion with Rune I ended up moving to the model that Ed used with Python. I support both of Ed's reasons why, but wanted to add another:
I've also experimented a bit with auto-generating the "changed" warning messages based on the serialized key https://github.com/heroku/buildpacks-ruby/pull/203/files#diff-1cedf1bfbafd5ca2b85e8261de347a2318ab3ea87abe49eeff117fb3f380faa9R150-R154. I think that approach works best when the desired key/value is stored at the top level instead of nested within a struct.
Another benefit to storing the data needed directly at the top level is migrating between metadata versions is fairly easy https://github.com/schneems/magic_migrate. However, if you need to start supporting multiple sub-migrations of associated structs, the picture gets much murkier.
Those are all from the buildpack maintainer angle. From the libcnb maintainer angle, I don't feel like Target
is fully "baked" and will likely see some iteration/movement in the future. It feels like an action at a distance to have modifications to this struct trigger an unknown amount of cache clearing in libcnb user's apps.
we probably should at least add some extra serde attributes for the optional field to avoid serialising them when they're None like we do for libcnb-data types.
Seems fine. Will do!
My gut feeling is that adding both implementations is a little weird, but I see the practical use for them.
I'm not following what's weird about both implementations. Care to elaborate?
leaking the internal libcnb types into the layer metadata means having to handle forwards-backwards compat for layer metadata migration if the libcnb type were ever to change
This is a strong argument. There's a fair chance the data format changes in the future, which would potentially invalidate otherwise valid caches. In the go use case, I don't care a whole lot though, since it's all intermediate compilation artifacts which are invalidated frequently anyway (go patch version changes, n number of uses).
it felt more explicit to include the precise fields needed for invalidation into the layer metadata rather than putting everything in there by default. for example, some layers might not be distro or arch-specific
I see the value of being explicit here, but does every layer in every buildpack need to reimplement this logic?
Ownership over the changed message i.e. "CPU architecture changed from X to Y" instead of "Target changed: X-Y-Z-Q to X2-Y2-Z2-Q2" is less nice.
The go buildpack doesn't explain why the build cache expired. It's supposed to be optional and ephemeral, so not a big deal.
In short, I think this makes sense for Go, and possibly other buildpacks that need wholesale invalidation on a target change. Do we want to force buildpack authors to write this themselves? My opinion is that this is a pretty low-cost convenience in some scenarios, and totally optional - buildpack authors can still choose their own cache expiration.
we probably should at least add some extra serde attributes for the optional field to avoid serialising them when they're None like we do for libcnb-data types.
It looks like we really only do that for Vec<_>
types elsewhere in libcnb-data
. As it is in this PR arch_variant
(which is Option<String>
) doesn't get serialized when None
, as expected.
Given the resistance I'm seeing, I'll close this. It's not a big deal to handle target invalidation manually for one buildpack.
I see the value of being explicit here, but does every layer in every buildpack need to reimplement this logic?
Brainstorming another path forward:
DistroVersionArch
where that is Serialize and Deserialize and can easily be built from a Target
. In this example, we promise that this struct has 3 properties and will they will not change. Then if it needs to change, we introduce another struct and deprecate that one so people get a warning or compile error. This buys us the ability to change Target
and makes it a little easier for buildpack maintainers who want to invalidate when the OS changes with as little code as possible. I picked those three values because they're what Ruby and Python both needed.
Target
, the struct is Debug so you could make your metadata take a String
and then store the Debug string of the Target
. I wouldn't personally recommend it, but hey...it's technically possible.
Buildpacks (like heroku/buildpacks-go) may wish to store
Target
in layer metadata, so that caches may be invalidated whenTarget
os
,arch
,distro
, etc. changes. To store and retrieveTarget
data in toml, we needDeserialize
andSerialize
. To detect changes between two targets,PartialEq
andEq
are also implemented.