Add Eq, PartialEq, Serialize, Deserialize to Target

joshwlewis commented 2 months ago

Buildpacks (like heroku/buildpacks-go) may wish to store Target in layer metadata, so that caches may be invalidated when Target os, arch, distro, etc. changes. To store and retrieve Target data in toml, we need Deserialize and Serialize. To detect changes between two targets, PartialEq and Eq are also implemented.

Malax commented 2 months ago

If we want to implement Serialize and Deserialize, we probably should at least add some extra serde attributes for the optional field to avoid serialising them when they're None like we do for libcnb-data types. My gut feeling is that adding both implementations is a little weird, but I see the practical use for them.

What are your thoughts about the additional attributes @joshwlewis?

edmorley commented 2 months ago

I chose not to use Target directly in the layer metadata for Python, since: (a) leaking the internal libcnb types into the layer metadata means having to handle forwards-backwards compat for layer metadata migration if the libcnb type were ever to change (b) it felt more explicit to include the precise fields needed for invalidation into the layer metadata rather than putting everything in there by default. for example, some layers might not be distro or arch-specific

edmorley commented 2 months ago

Fwiw I'm fine with making Target serialisable if we think it's reasonable to use in some cases - I just wondered if there's a potential "what is the best practice here and should libcnb be opinionated about it or not?" discussion to be had? :-)

schneems commented 2 months ago

Short: I'm leaning towards not wanting Serialize and Deserialize implemented. I think Clone, Debug, Eq and PartialEq make sense as mechanical properties. My worry with allowing these to be serialized directly is that maintainers must be aware that changes to that struct can cause cache misses and users would get no compile errors or warnings and would have to check the changelog. We even recently changed the format of Target https://github.com/heroku/libcnb.rs/pull/821.

Long: Originally, I made a smaller alias of Target called TargetId, which I also made Serialize and Deserialize. After some discussion with Rune I ended up moving to the model that Ed used with Python. I support both of Ed's reasons why, but wanted to add another:

Ownership over the changed message i.e. "CPU architecture changed from X to Y" instead of "Target changed: X-Y-Z-Q to X2-Y2-Z2-Q2" is less nice.

I've also experimented a bit with auto-generating the "changed" warning messages based on the serialized key https://github.com/heroku/buildpacks-ruby/pull/203/files#diff-1cedf1bfbafd5ca2b85e8261de347a2318ab3ea87abe49eeff117fb3f380faa9R150-R154. I think that approach works best when the desired key/value is stored at the top level instead of nested within a struct.

Another benefit to storing the data needed directly at the top level is migrating between metadata versions is fairly easy https://github.com/schneems/magic_migrate. However, if you need to start supporting multiple sub-migrations of associated structs, the picture gets much murkier.

Those are all from the buildpack maintainer angle. From the libcnb maintainer angle, I don't feel like Target is fully "baked" and will likely see some iteration/movement in the future. It feels like an action at a distance to have modifications to this struct trigger an unknown amount of cache clearing in libcnb user's apps.

joshwlewis commented 2 months ago

we probably should at least add some extra serde attributes for the optional field to avoid serialising them when they're None like we do for libcnb-data types.

Seems fine. Will do!

My gut feeling is that adding both implementations is a little weird, but I see the practical use for them.

I'm not following what's weird about both implementations. Care to elaborate?

leaking the internal libcnb types into the layer metadata means having to handle forwards-backwards compat for layer metadata migration if the libcnb type were ever to change

This is a strong argument. There's a fair chance the data format changes in the future, which would potentially invalidate otherwise valid caches. In the go use case, I don't care a whole lot though, since it's all intermediate compilation artifacts which are invalidated frequently anyway (go patch version changes, n number of uses).

it felt more explicit to include the precise fields needed for invalidation into the layer metadata rather than putting everything in there by default. for example, some layers might not be distro or arch-specific

I see the value of being explicit here, but does every layer in every buildpack need to reimplement this logic?

Ownership over the changed message i.e. "CPU architecture changed from X to Y" instead of "Target changed: X-Y-Z-Q to X2-Y2-Z2-Q2" is less nice.

The go buildpack doesn't explain why the build cache expired. It's supposed to be optional and ephemeral, so not a big deal.

In short, I think this makes sense for Go, and possibly other buildpacks that need wholesale invalidation on a target change. Do we want to force buildpack authors to write this themselves? My opinion is that this is a pretty low-cost convenience in some scenarios, and totally optional - buildpack authors can still choose their own cache expiration.

joshwlewis commented 2 months ago

we probably should at least add some extra serde attributes for the optional field to avoid serialising them when they're None like we do for libcnb-data types.

It looks like we really only do that for Vec<_> types elsewhere in libcnb-data. As it is in this PR arch_variant (which is Option<String>) doesn't get serialized when None, as expected.

joshwlewis commented 2 months ago

Given the resistance I'm seeing, I'll close this. It's not a big deal to handle target invalidation manually for one buildpack.

schneems commented 2 months ago

I see the value of being explicit here, but does every layer in every buildpack need to reimplement this logic?

Brainstorming another path forward:

Introduce a smaller struct with a more stable interface for instance DistroVersionArch where that is Serialize and Deserialize and can easily be built from a Target. In this example, we promise that this struct has 3 properties and will they will not change. Then if it needs to change, we introduce another struct and deprecate that one so people get a warning or compile error. This buys us the ability to change Target and makes it a little easier for buildpack maintainers who want to invalidate when the OS changes with as little code as possible.

I picked those three values because they're what Ruby and Python both needed.

If you really want it to be invalidated on ANY change to Target, the struct is Debug so you could make your metadata take a String and then store the Debug string of the Target. I wouldn't personally recommend it, but hey...it's technically possible.

heroku / libcnb.rs

Add Eq, PartialEq, Serialize, Deserialize to Target #823