Determine story around JSON message compatibility

So far, we have treated the JSON messages emitted by --message-format=json (docs) to be somewhat stable (with the caveats of adding new fields). However, that stability has been broken at times (see https://github.com/oli-obk/cargo_metadata/issues/240 as an example). We don't have a strategy for how such changes could be made. We should probably come up with some story and documentation for how we manage the compatibility.

This issue is to collect feedback and ideas on how this should be handled and documented.

Note that this somewhat transitively applies to rustc's diagnostic structure. There should probably be a consistent story with both that and cargo's own messages, which may involve consulting with the compiler team.

My thoughts/ideas:

Versioning doesn't work particularly well since introducing a new version that tooling doesn't recognize would cause them to break (and in the debuginfo case, would cause them to unnecessarily to break in most cases since that change only affected the rare projects that would use the string-based types).

We could potentially indicate that types of fields can change between versions, but otherwise meanings of existing fields wouldn't change. But that could be a challenge for deserializers like cargo_metadata since the only choice to stay compatible would be to deserialize to something like serde_json::Value, losing all structure. Or those tools could just accept that type changes would break their tooling.

We could potentially just document that this format is not stable at all, and shed responsibility. An option for tools like cargo_metadata is that they could just do semver-breaking changes periodically.

Another option is to document that the format is stable except in the presence of new opt-in features (like the debuginfo string example). This could make it challenging or impossible to make broader structural changes.

I think versioning can work if the versioning rules are clearly defined, and the format is documented. If I understand the problem correctly, right now there is only one format version, changing any metadata field breaks the version guarantees, because the version is always the same.

Tools and libraries that parse the metadata format can rely on the --format-version flag to pin the specific version they support, without getting broken if the metadata changes in newer versions. As long as any changes go together with a version bump, those tools should remain compatible with the version they request.

I personally like how Stripe versions their API to provide backwards compatibility guarantees without having to expose version numbers directly in their resources: https://stripe.com/blog/api-versioning. The way that this would work is probably different for Cargo, and it might require an RFC.

These would be the steps I'd take to create a compelling versioning story:

Commit to a stable format for version 1. The only version currently available should be documented and frozen from any changes. Any stable version should probably not include any feature that has not been stabilized yet.
Find a mechanism for creating new versions. This could be something like what Stripe uses, data migrations between versions, or something more simple, like an enum with different versions and values. This might be the most challenging part to get right, and it might deserve an RFC.
Any changes to the format uses Cargo's version as version number to create a new representation of the metadata. For example, if Cargo 1.80 introduces a change in the shape of the metadata, running cargo metadata --format-version 1.80.0 would give us that shape.
If the metadata command doesn't specify any version, default to the most recent version updated. Following the previous example, cargo metadata on Cargo 1.80.0 would serialize the most recent changes. When Cargo 1.81.0 comes along, if there has been no changes, cargo metadata displays the metadata for version 1.80.0.
Any changes in the metadata format get added to the changelog, so integrators that depend on this format can be informed about those changes.

The problem with using requested versions is trying to figure out how to map new concepts into the old version. Things like Cargo.toml evolve independently of the JSON interface. This issue was triggered by https://github.com/oli-obk/cargo_metadata/issues/240 where the type of a field changed. For example, let's say there was a project with:

[profile.dev]
debug = "line-tables-only"

Format version 1 specified that debuginfo was an integer. If cargo tried to keep compatibility with format version 1, what would it show in the debuginfo field? No integer value would be correct. Should it generate an error? That's not much different from just breaking the tool (and possibly worse depending on how the tool parses, since most tools don't care about this field, though an explicit error could be clearer for tools that just fail to parse).

The problem with using requested versions is trying to figure out how to map new concepts into the old version.

I understand that. From my point of view, the lack of versioning is what's allowing changing types in the first place. You cannot have a conversation about data design when nothing is really enforced. Making the versioning guidelines explicit will help to set expectations about keeping backwards compatibility.

I worked at Docker for a while, and we were very cautious about breaking api compatibility, and sometimes failed to keep the data format compatible. However, the explicit guidelines forced those conversations during feature design, instead of later.

In your example, if debug changes from integer to string, you'd be forced to bump the format version. The best time to have the conversation about backwards compatibility was when the change was propose, but if there are no guidelines or anything that reminds people of those guarantees, the expectation is low, or non-existent.

you'd be forced to bump the format version.

This is the part I'm not quite following. Can you clarify what that means?

My interpretation is:

A tool is passing --format-version=1 to indicate which format of JSON it is expecting.
A new cargo release is made that changes the type of a field, and that now requires --format-version=2 and generates an error for =1.
- This can be mitigated a little by only generating an error with =1 if the user is using the changes that modified the field type (like the new string-based values), but that still breaks tooling when a user opts into those new values.
The tool will be broken when used with the new release (since it is still passing in =1), and thus the new release is not backwards compatible.
- It might be a slightly nicer error message, but for some use cases that would be worse because some tools weren't looking at that field or care about it (and thus would needlessly break them).

but if there are no guidelines or anything that reminds people of those guarantees, the expectation is low, or non-existent.

We do have guidelines that say cargo is intended to be backwards compatible except where specified, but there were no automated tests or tooling to enforce it in this case and nobody recognized that this type snuck into the serialized format or recognized that it was a breaking change until it was too late. We also have versioning built-in to cargo metadata, but that also requires reviewers to recognize when a breaking change is made.

However, even if we did recognize it, I don't know what we would have done differently. None of the options seem particularly good to me. That is, they all seem to break tooling in one way or another.

The answer might be that the only option we have is to break backwards compatibility in some way. Then the question is, which approach causes the least breakage? Of the options I see, it seems like we stumbled on the course with the least amount of breakage (although we should have been more explicit about it in the docs, and proactive in updating the cargo_metadata crate). I'm not saying it was the best course, though. I'm just concerned about forcing tools to break via version declarations when they wouldn't otherwise be broken (when they don't care about the changes).

oh, yeah, let me clarify. I'm not saying that a previous version should break if a new version makes changes, I'm proposing the opposite, to try to keep backwards compatibility as much as possible. I agree with you that this requires an extra level of awareness from maintainers/reviewers.

Going back to your example. The initial type for profile.debug was an integer. Someone proposed to change it to strings. That'd allow to specify new debug options and be more clear about what they meant. In that case, a new version would be required because it's a breaking change. To keep backwards compatibility, you could check the format version and render either a string or a number depending on that version.

Running this command, cargo metadata --format-version=1 would render something like this:

{
  "profiles": { "dev": { "debug": 1 } }
}

Running this command, cargo metadata --format-version=2 would render something like this:

{
  "profiles": { "dev": { "debug":  "line-tables-only" } }
}

New values in that option might not make sense in the old format. You could either decide to give them meaning, or have a fallback value that means "unknown/undefined".

I'm just concerned about forcing tools to break via version declarations when they wouldn't otherwise be broken (when they don't care about the changes).

This is a very valid concern. In my experience, being explicit by using version numbers leads to least surprises and better backwards guarantees.

rust-lang / cargo

Determine story around JSON message compatibility #12377