Open ehuss opened 1 year ago
I think versioning can work if the versioning rules are clearly defined and the format is documented. If I understand the problem correctly, right now there is only one format version, so changing any metadata field breaks the version guarantees, because the version number never changes.
Tools and libraries that parse the metadata format can rely on the --format-version flag to pin the specific version they support, without breaking when the metadata changes in newer versions. As long as every change comes with a version bump, those tools should remain compatible with the version they request.
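As a rough illustration of what pinning looks like from the tool's side, here is a minimal sketch in Rust using only the standard library. The helper name pinned_metadata_command is hypothetical; the only Cargo surface it relies on is the cargo metadata --format-version flag mentioned above. It builds the command without running it, so the sketch does not depend on a Cargo project being present.

```rust
use std::process::Command;

/// Builds (but does not run) a `cargo metadata` invocation that pins the
/// JSON format version the calling tool was written against.
/// `pinned_metadata_command` is a hypothetical helper name for this sketch.
fn pinned_metadata_command(format_version: u32) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("metadata")
        .arg("--format-version")
        .arg(format_version.to_string());
    cmd
}

fn main() {
    let cmd = pinned_metadata_command(1);
    // Collect the arguments so the pinned invocation can be inspected.
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("cargo {}", args.join(" "));
}
```

A tool written this way states its expectation on every call, which is what makes a version bump meaningful to it.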
I personally like how Stripe versions their API to provide backwards compatibility guarantees without having to expose version numbers directly in their resources: https://stripe.com/blog/api-versioning. The way that this would work is probably different for Cargo, and it might require an RFC.
These would be the steps I'd take to create a compelling versioning story:
- cargo metadata --format-version 1.80.0 would give us that shape.
- cargo metadata on Cargo 1.80.0 would serialize the most recent changes. When Cargo 1.81.0 comes along, if there have been no changes, cargo metadata displays the metadata for version 1.80.0.

The problem with using requested versions is trying to figure out how to map new concepts into the old version. Things like Cargo.toml evolve independently of the JSON interface. This issue was triggered by https://github.com/oli-obk/cargo_metadata/issues/240 where the type of a field changed. For example, let's say there was a project with:
[profile.dev]
debug = "line-tables-only"
Format version 1 specified that debuginfo was an integer. If cargo tried to keep compatibility with format version 1, what would it show in the debuginfo field? No integer value would be correct. Should it generate an error? That's not much different from just breaking the tool (and possibly worse depending on how the tool parses, since most tools don't care about this field, though an explicit error could be clearer for tools that just fail to parse).
> The problem with using requested versions is trying to figure out how to map new concepts into the old version.
I understand that. From my point of view, the lack of versioning is what allows types to change in the first place. You cannot have a conversation about data design when nothing is really enforced. Making the versioning guidelines explicit will help set expectations about keeping backwards compatibility.
I worked at Docker for a while, and we were very cautious about breaking API compatibility, though we sometimes failed to keep the data format compatible. However, the explicit guidelines forced those conversations during feature design, instead of later.
In your example, if debug changes from integer to string, you'd be forced to bump the format version. The best time to have the conversation about backwards compatibility was when the change was proposed, but if there are no guidelines or anything that reminds people of those guarantees, the expectation is low, or non-existent.
> you'd be forced to bump the format version.
This is the part I'm not quite following. Can you clarify what that means?
My interpretation is:
--format-version=1 to indicate which format of JSON it is expecting.
--format-version=2 and generates an error for =1.
=1 if the user is using the changes that modified the field type (like the new string-based values), but that still breaks tooling when a user opts into those new values.
=1), and thus the new release is not backwards compatible.
> but if there are no guidelines or anything that reminds people of those guarantees, the expectation is low, or non-existent.
We do have guidelines that say cargo is intended to be backwards compatible except where specified, but there were no automated tests or tooling to enforce it in this case, and nobody recognized that this type had snuck into the serialized format, or that it was a breaking change, until it was too late. We also have versioning built into cargo metadata, but that also requires reviewers to recognize when a breaking change is made.
However, even if we did recognize it, I don't know what we would have done differently. None of the options seem particularly good to me. That is, they all seem to break tooling in one way or another.
The answer might be that the only option we have is to break backwards compatibility in some way. Then the question is, which approach causes the least breakage? Of the options I see, it seems like we stumbled on the course with the least amount of breakage (although we should have been more explicit about it in the docs, and proactive in updating the cargo_metadata crate). I'm not saying it was the best course, though. I'm just concerned about forcing tools to break via version declarations when they wouldn't otherwise be broken (when they don't care about the changes).
Oh, yeah, let me clarify. I'm not saying that a previous version should break if a new version makes changes. I'm proposing the opposite: to try to keep backwards compatibility as much as possible. I agree with you that this requires an extra level of awareness from maintainers/reviewers.
Going back to your example: the initial type for profile.debug was an integer. Someone proposed changing it to strings. That would allow specifying new debug options and make their meaning clearer. In that case, a new version would be required because it's a breaking change. To keep backwards compatibility, you could check the format version and render either a string or a number depending on that version.
Running this command, cargo metadata --format-version=1, would render something like this:
{
"profiles": { "dev": { "debug": 1 } }
}
Running this command, cargo metadata --format-version=2, would render something like this:
{
"profiles": { "dev": { "debug": "line-tables-only" } }
}
New values in that option might not make sense in the old format. You could either decide to give them meaning, or have a fallback value that means "unknown/undefined".
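To make the idea concrete, here is a minimal sketch in Rust of version-dependent rendering with a fallback. Everything in it is an assumption made for illustration, not Cargo's actual implementation: the DebugSetting enum, the render_debug function, the choice of JSON null as the "unknown/undefined" fallback, and the choice to render numeric levels as strings in version 2. It also assumes the named values contain no characters that need JSON escaping.

```rust
/// How a profile might express its debug setting after the change
/// (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum DebugSetting {
    /// The original numeric level, e.g. 0, 1 or 2.
    Level(u32),
    /// A newer named value, e.g. "line-tables-only".
    Named(String),
}

/// Renders the `debug` field as a JSON fragment for the requested format
/// version. Version 1 only knows integers, so named values fall back to
/// `null` (an assumed stand-in for "unknown/undefined"). Any other version
/// is treated as the newer, string-based format.
fn render_debug(setting: &DebugSetting, format_version: u32) -> String {
    match (format_version, setting) {
        (1, DebugSetting::Level(n)) => n.to_string(),
        (1, DebugSetting::Named(_)) => "null".to_string(),
        (_, DebugSetting::Level(n)) => format!("\"{}\"", n),
        (_, DebugSetting::Named(name)) => format!("\"{}\"", name),
    }
}

fn main() {
    let named = DebugSetting::Named("line-tables-only".to_string());
    println!("v1: {}", render_debug(&named, 1));
    println!("v2: {}", render_debug(&named, 2));
}
```

The alternative mentioned above, giving new values a meaning in the old format, would replace the null arm with a mapping from each named value to the closest integer level.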
> I'm just concerned about forcing tools to break via version declarations when they wouldn't otherwise be broken (when they don't care about the changes).
This is a very valid concern. In my experience, being explicit by using version numbers leads to the fewest surprises and better backwards-compatibility guarantees.
So far, we have treated the JSON messages emitted by --message-format=json (docs) as somewhat stable (with the caveat that new fields may be added). However, that stability has been broken at times (see https://github.com/oli-obk/cargo_metadata/issues/240 as an example). We don't have a strategy for how such changes could be made. We should probably come up with some story and documentation for how we manage the compatibility.
This issue is to collect feedback and ideas on how this should be handled and documented.
Note that this somewhat transitively applies to rustc's diagnostic structure. There should probably be a consistent story with both that and cargo's own messages, which may involve consulting with the compiler team.
My thoughts/ideas:
- Versioning doesn't work particularly well, since introducing a new version that tooling doesn't recognize would cause it to break (and in the debuginfo case, would cause it to break unnecessarily in most cases, since that change only affected the rare projects that use the string-based types).
- We could potentially indicate that types of fields can change between versions, but otherwise meanings of existing fields wouldn't change. But that could be a challenge for deserializers like cargo_metadata, since the only choice to stay compatible would be to deserialize to something like serde_json::Value, losing all structure. Or those tools could just accept that type changes would break their tooling.
- We could potentially just document that this format is not stable at all, and shed responsibility. An option for tools like cargo_metadata is that they could just do semver-breaking changes periodically.
- Another option is to document that the format is stable except in the presence of new opt-in features (like the debuginfo string example). This could make it challenging or impossible to make broader structural changes.
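For the second idea (field types may change), a consumer has a middle ground between a fixed integer type and a fully untyped value: a small sum type that accepts each known representation. The sketch below uses only the Rust standard library and a hypothetical parse_debuginfo helper that looks at a single raw JSON scalar. A real tool would use a JSON library instead (with serde, an untagged enum is the usual way to express this), and the sketch only helps if the consumer anticipated the second representation.

```rust
/// A debuginfo value that tolerates both the original integer form and
/// the later string form (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum DebugInfo {
    Level(u32),
    Named(String),
}

/// Parses one raw JSON scalar token, such as `2` or `"line-tables-only"`.
/// Returns `None` for anything that is neither an unsigned integer nor a
/// double-quoted string. Escape sequences inside strings are not handled.
fn parse_debuginfo(token: &str) -> Option<DebugInfo> {
    let token = token.trim();
    if let Ok(level) = token.parse::<u32>() {
        return Some(DebugInfo::Level(level));
    }
    let inner = token.strip_prefix('"')?.strip_suffix('"')?;
    Some(DebugInfo::Named(inner.to_string()))
}

fn main() {
    for raw in ["2", "\"line-tables-only\"", "true"] {
        println!("{} -> {:?}", raw, parse_debuginfo(raw));
    }
}
```

This keeps more structure than an untyped value would, at the cost of a semver-breaking change in the consuming crate each time a field gains a new representation.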