solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
13.36k stars 4.36k forks source link

Solana Releases Should Follow Strict Semantic Versioning #32503

Open samuelvanderwaal opened 1 year ago

samuelvanderwaal commented 1 year ago

Problem

Solana frequently ships versions of libraries used by developers that are not compatible with with Semantic Versioning which leads to numerous dependency issues in downstream libraries. One of the main causes of the dependency issues is the Rust package manager, Cargo, expects Semantic Versioning to be followed and defaults to pulling in new versions based on this dependency scheme. This leads to downstream packages breaking or being unable to upgrade if Solana does not follow the scheme.

Solana v1.16, in particular, has caused significant issues because it upgrades to a breaking version of borsh (0.9.3 --> 0.10.3). I assume there is some kind of social component in not wanting to upgrade major versions of Solana (v2.0, v3.0, etc.) but I would argue the lost developer adoption and development pain outweigh this nicety.

A few examples of issues: https://github.com/coral-xyz/anchor/issues/2511 https://solana.stackexchange.com/questions/6991/how-to-fix-error-the-trait-borshserialize-is-not-implemented-for-pubkey

A few examples of Twitter threads discussing dependency issues: https://twitter.com/redacted_noah/status/1679907169523597325?s=20 https://twitter.com/HeyAndyS/status/1678069298516901893?s=20 https://twitter.com/blockiosaurus/status/1675289236818350082?s=20 https://twitter.com/ngundotra/status/1679928743593140230?s=20 https://twitter.com/redacted_noah/status/1675889143040147457?s=20

Proposed Solution

Implement a process to strictly follow semantic versioning in the solana monorepo and program library releases. If Solana and other major significant ecosystem players, such as Metaplex and Anchor, all adhere to semantic versioning, this should greatly reduce the dependency issues the space is rife with.

In addition, it may make sense for Solana developers to work in concert with e.g. Anchor maintainers and Metaplex protocol team to coordinate upgrades when new versions hit mainnet so the ecosystem stays in sync. There's a number of nested and cyclic dependencies that make it difficult for a single team to upgrade (packages relying on Token Metadata which in turn relies on spl-token etc.).

ilya-bobyr commented 1 year ago

Personal opinion, not necessarily reflective of the rest of the Solana Labs team.

The breakage at hand is indeed unpleasant. But Semantic Versioning is not necessarily a solution to all the dependency problems. I think this piece is a nice summary: https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e There are mode detailed discussions that explain this problem. I could probably find links if someone is interested. Either way, I have not yet seen a system that really covers all cases, and is reasonably simple. At the same time, Rust packages are restricted to SemVer as cargo and rustc interpret it.

So let's pretend we did bump the whole SDK to 2.0. It was actually something I have personally suggested while discussing the borsh update to another co-worker.

I think, such a version change, it would have just created a different set of issues. And they are not necessarily better. Most crates express their dependency on the Solana as solana-abc = 1.14.x, meaning that solana-abc 1.16 is considered a valid replacement. So, if you have two crates in your dependency graph, one saying solana-abc = 1.14 and the other one saying solana-abc = 1.16, cargo will include only one instance of solana-abc, the most recent one - 1.16. But if we bump Solana SDK to 2.0, cargo will now include two instances of the solana-abc crate, both 1.14 and 2.0. rustc would treat types coming from these creates as distinct and incompatible.

Issues that talk about "not implemented BorshDeserialize" are examples of the case when just the borsh dependency got duplicated. One part of the code is expecting BorshDeserialize-0.9 while the other part providing BorshDeserialize-0.10 (or vise versa) [1]. I think it would be much worse if this had happened to all types in all Solana crates.

I am personally still trying to figure out how this should have been done, if at all possible, in a completely transparent way. One suggestion was to provide support for both borsh-0.9 and borsh-0.10. This is already implemented, but not merged yet. Looks promising. Though, I am not 100% sure there are absolutely no problems with this approach.

[1] To be fair, I put a little blame on rustc here. The error message is confusing. It says "BorshDeserialize is not implemented". While, in fact, it is implemented, just not the right version.

mschneider commented 1 year ago

i did complain many times about this issue in the last months, given there's a big hackathon about to start (august/september), this should be solved urgently.

people who try to build for the first time rn will simply get stuck with cargo compile errors. many developers that try solana often have limited rust experience and spending 8+ hours to fix dependencies to get a tutorial to work, doesn't really inspire confidence to start a larger project or even deploy to mainnet. it's the equivalent to a bug in the checkout feature, simply the worst kind of friction.

samuelvanderwaal commented 1 year ago

Personal opinion, not necessarily reflective of the rest of the Solana Labs team.

The breakage at hand is indeed unpleasant. But Semantic Versioning is not necessarily a solution to all the dependency problems. I think this piece is a nice summary: https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e There are mode detailed discussions that explain this problem. I could probably find links if someone is interested. Either way, I have not yet seen a system that really covers all cases, and is reasonably simple. At the same time, Rust packages are restricted to SemVer as cargo and rustc interpret it.

It's a fair point that Semantic Versioning, nor any solution, will cover all cases, but that's not the goal. The goal is to improve the current situation to the point where we are not driving away new developers due to high costs sorting out dependency issues. I have been turned away from doing side projects because of these issues, and I do this for a living, so I have a lot better chance of resolving the issues than someone brand new would.

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important. What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

So let's pretend we did bump the whole SDK to 2.0. It was actually something I have personally suggested while discussing the borsh update to another co-worker.

I think, such a version change, it would have just created a different set of issues. And they are not necessarily better. Most crates express their dependency on the Solana as solana-abc = 1.14.x, meaning that solana-abc 1.16 is considered a valid replacement. So, if you have two crates in your dependency graph, one saying solana-abc = 1.14 and the other one saying solana-abc = 1.16, cargo will include only one instance of solana-abc, the most recent one - 1.16. But if we bump Solana SDK to 2.0, cargo will now include two instances of the solana-abc crate, both 1.14 and 2.0. rustc would treat types coming from these creates as distinct and incompatible.

I don't follow this example. Cargo won't automatically include a major version if you use the default caret ^ dependency. It will look for anything from 1.14.x up to anything <2.0.0, so it won't pull in the major version automatically. If a dependency uses the major version Cargo will inform you that it can't find a dependency that meets the range you specified and you'll know you need to upgrade to 2.0.0 or use an older version of the dependency.

Now what Solana could do, is it could define its own system, say, x.0.0 releases are for runtime changes, 0.x.0 releases are breaking changes to the libs and 0.0.x for everything else. If you use that system consistently it would probably work and help fix some of the issues, but all it's doing is shifting SemVer schema to the right while following the same idea and now we all have to fight against the Cargo versioning defaults and against developer expectations since most of us are used to SemVer.

febo commented 1 year ago

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important. What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

Totally agree with this, any system is better than no system. As a minimum, it sets the expectations and rules when breaking changes can occur. It does not need to be perfect, just consistent.

ilya-bobyr commented 1 year ago

[...] The goal is to improve the current situation to the point where we are not driving away new developers due to high costs sorting out dependency issues. I have been turned away from doing side projects because of these issues, and I do this for a living, so I have a lot better chance of resolving the issues than someone brand new would.

Absolutely. I think we are all trying to make the situation better.

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important.

Note, that Solana "Backward Compatibility Policy" says that Solana is supposed to follow Semantic Versioning. So, if one says "Solana should adopt Semantic Versioning" - this is already the case.

I do think there is a different approach that could work better. Fundamentally, cargo and rustc both enforce Semantic Versioning to a degree. So any approach would have to include these constraints. @joncinque is working on a revised version of the "Backward Compatibility Policy", that would reflect our findings from this incident.

What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

It seems you assume that should we have increased the SDK version to 2.0, rather than 1.16, then the issue would not have happened. I do not believe this is the case. My understanding is that it would have caused way more issues, as I was trying to show in the part of my message that you quoted below.

It looks to me, that our understanding of how cargo and rustc behave with regard to Semantic Versioning rules are different. I suggest we clarify this first, and then the question of following or not following the Semantic Versioning rules for borsh upgrade would become clearer.

Note that I do think that we did fail the "minor version should be backward compatible" assumption. It is bad, and we are now looking for a better solution. @joncinque is working on a fix. But you can also see that the fix is far from trivial.


So let's pretend we did bump the whole SDK to 2.0. It was actually something I have personally suggested while discussing the borsh update to another co-worker. I think, such a version change, it would have just created a different set of issues. And they are not necessarily better. Most crates express their dependency on the Solana as solana-abc = 1.14.x, meaning that solana-abc 1.16 is considered a valid replacement. So, if you have two crates in your dependency graph, one saying solana-abc = 1.14 and the other one saying solana-abc = 1.16, cargo will include only one instance of solana-abc, the most recent one - 1.16. But if we bump Solana SDK to 2.0, cargo will now include two instances of the solana-abc crate, both 1.14 and 2.0. rustc would treat types coming from these creates as distinct and incompatible.

I don't follow this example. Cargo won't automatically include a major version if you use the default caret ^ dependency. It will look for anything from 1.14.x up to anything <2.0.0, so it won't pull in the major version automatically. If a dependency uses the major version Cargo will inform you that it can't find a dependency that meets the range you specified and you'll know you need to upgrade to 2.0.0 or use an older version of the dependency.

We are going to release a new version one way or another. And people are going to run cargo add solana-sdk. Which would take the latest version of solana-sdk from crates.io, be it 1.16 or 2.0. People will also run cargo add <popular-create> for other popular creates, such as anchor. And these crates will also depend on solana-sdk. What is important, is that these other crates will have an older version of solana-sdk in their dependency list. 1.14 today, something else in the future. This will always happen. And there is no way to avoid this discrepancy.

As you are saying, the default dependency constraints are "it is OK to bump minor, but major must match".

What we need to understand is what would cargo (and rustc) do, when major version is increased. When both solana-sdk-1.14 and solana-sdk-2.0 are present in the dependency list in the same application.

We can look at borsh, as 0.9 and 0.10 are considered incompatible. It is the same as upgrading from 1.14 to 2.0, from cargo and rustc standpoint [^1].

If two incompatible versions of a dependency are present in the dependency list, cargo will not fail. Instead, it will include both [^2]. (This is what npm and yarn also do by default, by the way.) From https://doc.rust-lang.org/cargo/reference/resolver.html#dependency-resolution:

If multiple packages have a common dependency with semver-incompatible versions, then Cargo will allow this, but will build two separate copies of the dependency.

The same page says, that this approach may lead to errors, if types are used as part of the crate public API.

So, it only works for crates used in the implementation part of the code. There is even an RFC 1977 to make this explicit and add additional safety checks. But it is not implemented at the moment.

This is exactly the problem people observed with borsh.

solana-sdk provides a lot of types that are used in the public APIs of the crates that use it. If we bump solana-sdk to 2.0 we are going to see this breakage for every type used in the public API of every other library that depends on solana-sdk and has not upgraded from 1.14 just yet.

Also, going through a major version bump for a fundamental dependency can be a very labor intense process. This page gives a nice overflow of the past experiences: https://github.com/dtolnay/semver-trick#coordinated-upgrades Going through that because of a borsh upgrade is not something we wanted to do.

We had the following options then:

  1. Do not upgrade borsh.
  2. Upgrade, but make sure we allow and support both 0.9 and 0.10 in the dependency graph. Not just either, but also both at the same time.
  3. Upgrade borsh with some breakage.

I do not know the exact reasoning for not going with option 1. But, I think, it is not really a sustainable long term solution anyway.

We considered option 2 and realized it was non-trivial. Now that we are actually implementing it, I would say, we actually underestimated the amount of work.

We decided option 3 is good enough, as borsh is only used in the implementation of the Pubkey serialization. We underestimated the breakage this will cause. As we implicitly upgraded all the borsh derivation clauses in the user apps. This is on us, and we are now making an effort to move to option 2.

[^1]: For version that start with 0. meaning of "major" and "minor" is defined differently. SemVer just says that 0. versions provide no API stability at all. cargo [deviates here](https://doc.rust-lang.org/cargo/reference/specifying- dependencies.html#specifying-dependencies-from-cratesio):

> This compatibility convention is different from SemVer in the way it treats versions before 1.0.0. While SemVer says there is no compatibility before 1.0.0, Cargo considers 0.x.y to be compatible with 0.x.z, where y ≥ z and x > 0.

[^2]: Now that we established that solana-sdk bump to 2.0 is the same as borsh bump to 0.10, we can actually try it out with borsh. borsh-cli lists ^0.9.3 as the borsh dependency. So it is easy to reproduce the problem:

    ❯ cargo init borsh-add-dependency/
         Created binary (application) package

    ❯ cd borsh-add-dependency/

    ❯ cargo add borsh
        Updating crates.io index
          Adding borsh v0.10.3 to dependencies.
                 Features:
                 + std
                 - bytes
                 - const-generics
                 - rc
        Updating crates.io index

    ❯ cargo add borsh-cli
        Updating crates.io index
          Adding borsh-cli v0.1.3 to dependencies.
        Updating crates.io index

    ❯ cargo tree
    borsh-add-dependency v0.1.0 (/home/ilya/works/tests/borsh-add-dependency)
    ├── borsh v0.10.3
    [...]
    └── borsh-cli v0.1.3
        [...]
        ├── borsh v0.9.3
        [...]
No errors, and `cargo tree` shows that we have both `0.9` and `0.10` in use.

Now what Solana could do, is it could define its own system, say, x.0.0 releases are for runtime changes, 0.x.0 releases are breaking changes to the libs and 0.0.x for everything else. If you use that system consistently it would probably work and help fix some of the issues, but all it's doing is shifting SemVer schema to the right while following the same idea and now we all have to fight against the Cargo versioning defaults and against developer expectations since most of us are used to SemVer.

I do not think it is an option, as both cargo and rustc do follow SemVer rules. One thing we may consider is: what does it mean for a minor version to be compatible? The article I linked in my first response, in my mind, is spot on, when it is talking about an oversimplification that one may assume, based on the SimVer guarantees:

SemVer is a false promise that appeals to many developers — the promise of pain-free, don't-have-to-think-about-it, updates to dependencies. But it simply isn't true. Node doesn't follow SemVer, Rails doesn't do it, Python doesn't do it, Ruby doesn't do it, jQuery doesn't (really) do it, even npm doesn't follow SemVer.

I think, in practice, Solana SDK will have to stay at version 1 forever. And we are going to make breaking changes in minor releases. Not abrutpty and suddenly, as the current one. But slowly, first deprecating certain APIs with a deprecated attribute. And providing new versions as alternatives, allowing all dependencies time to upgrade, before we actually do a backward incompatible change.

ilya-bobyr commented 1 year ago

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important. What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

Totally agree with this, any system is better than no system. As a minimum, it sets the expectations and rules when breaking changes can occur. It does not need to be perfect, just consistent.

It sounds as if we currently do not have any system in place. Which is not true: https://docs.solana.com/developing/backwards-compatibility

We underestimated the scale of the breakage. So, in my mind, another important finding is not that we do not have a policy, but rather that it is not verified automatically. And manual verification does not always work.

We already run tests for a few downstream projects as part of our CI. We should add more projects to this list. In particular, anchor should be there.

As well as a simple example application. So that new users are always working with something that is constantly tested.

We still have not decided the priority and the ownership, but it is something we are discussing internally.

acheroncrypto commented 1 year ago

Dependency issues should be the last thing people need to think about but it has been the biggest problem recently.

SemVer

But Semantic Versioning is not necessarily a solution to all the dependency problems. I think this piece is a nice summary: https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e

The main idea of the post you've linked is that SemVer is good for machines but not for humans, summarized by the author:

But Semantic Versioning (henceforth, SemVer), as specified at http://semver.org/, changes this to prioritize a mechanistic understanding of a codebase over a human one. Any "breaking" change to the software must be accompanied with a new major version number. It's alright for robots, but bad for us.

The author then goes to suggest a list of subjective reasons of why some version bumps are more imported than others due to some arbitrary reasons such as the % of users who are affected by the change(as if that can be known accurately). A breaking change in a public API is a breaking change in a public API.

While I agree that SemVer is not perfect, I think there is a fundamental misundertanding of when and why to use SemVer. SemVer is not about how the authors feel about the change, it's for consumers of the library to know which version to use without breaking their existing code.

SDK

It seems you assume that should we have increased the SDK version to 2.0, rather than 1.16, then the issue would not have happened. I do not believe this is the case. My understanding is that it would have caused way more issues, as I was trying to show in the part of my message that you quoted below.

On the day of the 1.16.0 release, pretty much every library that I can think of in the Solana ecosystem including Anchor, Metaplex, Pyth... broke. Why? Because everyone depends on solana-program crate and when it bumps from 1.14 to 1.16, that's a SemVer compatible bump and it causes cargo to pick the latest version on new installations.

Given this approach broke the builds of the entire ecosystem and therefore caused everyone in the ecosystem some kind of compilation issue, I'm curious about the how the other approach would cause "way more issues".

It looks to me, that our understanding of how cargo and rustc behave with regard to Semantic Versioning rules are different. I suggest we clarify this first, and then the question of following or not following the Semantic Versioning rules for borsh upgrade would become clearer.

I'll compare 1.16.0 vs 2.0.0(correct me if I'm wrong):

1.16.0

2.0.0

I think, in practice, Solana SDK will have to stay at version 1 forever. And we are going to make breaking changes in minor releases.

I appreciate the honesty, and removing statements such as

Solana software releases follow semantic versioning, more details below.

from the Backwards Compatibility Policy in https://github.com/solana-labs/solana/pull/32655 is a good step to avoid SemVer related confusion.

For ecosystem crates, this means that we'd have to either use a max version or ~ when specifying Solana crate versions to let cargo know that it should never pull SemVer compatible versions, which brings us to exactly where we've started.

samuelvanderwaal commented 1 year ago

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important. What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

Totally agree with this, any system is better than no system. As a minimum, it sets the expectations and rules when breaking changes can occur. It does not need to be perfect, just consistent.

It sounds as if we currently do not have any system in place. Which is not true: https://docs.solana.com/developing/backwards-compatibility

Software for a MINOR version release will be compatible across all software on the same MAJOR version.

You're not even following your own system linked as proven by 1.14 --> 1.16. That's the crux of the nature of the complaint. Your system is effectively SemVer but you consistently don't follow it.

ilya-bobyr commented 1 year ago

I am not trying to justify the breakage that happened with the 1.16 release. My main point is: it is not an option for us to mark a new release as 2.0. As it would have broken all new setups for a very long time.

We are working to fix 1.16 to be compatible with 1.14. Fix has been merged, and backported to 1.16. So, I assume, we are going to release 1.16.7 with the fix some time soon.

The main idea of the post you've linked is that SemVer is good for machines but not for humans [...]

But Semantic Versioning (henceforth, SemVer), as specified at http://semver.org, changes this to prioritize a mechanistic understanding of a codebase over a human one. Any "breaking" change to the software must be accompanied with a new major version number. It's alright for robots, but bad for us.

The author then goes to suggest a list of subjective reasons of why some version bumps are more imported than others due to some arbitrary reasons such as the % of users who are affected by the change (as if that can be known accurately). A breaking change in a public API is a breaking change in a public API.

Maybe I should not have included the link. I was hoping to save some time, but instead it is turning into a separate discussion.

What I had in mind is that there are certain complexities that do happen in practice, that semantic versioning may not able to address. The percentage of the affected users is a reasonable criterion to consider, if any option you choose will create problems for someone.

XKCD has a comic about the same problem: https://xkcd.com/1172/

SDK

It seems you assume that should we have increased the SDK version to 2.0, rather than 1.16, then the issue would not have happened. I do not believe this is the case. My understanding is that it would have caused way more issues, as I was trying to show in the part of my message that you quoted below.

On the day of the 1.16.0 release, pretty much every library that I can think of in the Solana ecosystem including Anchor, Metaplex, Pyth... broke. Why? Because everyone depends on solana-program crate and when it bumps from 1.14 to 1.16, that's a SemVer compatible bump and it causes cargo to pick the latest version on new installations.

Yes, as I have written above, we do want 1.14 and 1.16 to be compatible. At least mostly. We did not anticipate this level of breakage.

Given this approach broke the builds of the entire ecosystem and therefore caused everyone in the ecosystem some kind of compilation issue, I'm curious about the how the other approach would cause "way more issues".

There is some confusing here. What do you mean by "this approach"?

An upgrade to 2.0 would have broken every new setup. Until every library upgrades. And we would not be able to easily fix it. Quite unlike the fix to 1.16 that is about to be released.

It looks to me, that our understanding of how cargo and rustc behave with regard to Semantic Versioning rules are different. I suggest we clarify this first, and then the question of following or not following the Semantic Versioning rules for borsh upgrade would become clearer.

I'll compare 1.16.0 vs 2.0.0(correct me if I'm wrong):

1.16.0

2.0.0

  • New installs of existing libraries don't break because cargo will not automatically pull the latest Solana version.

  • cargo update will not pull in the latest version for existing libraries which is exactly the reason why SemVer is useful.

  • This might cause issues if, as you've said, people add cargo add <solana-crate> and then tries to add an older project who depend on <solana-crate> ^1. It is not even guaranteed that this would cause any problems meanwhile the other approach 100% causes problems. And the solution for this problem would just be to match the version with <solana-crate> = "*" which is much better than giving people a growing list of dependencies that they need to downgrade in order to start building their programs.

When you are saying, "It is not even guaranteed that this would cause any problems" - this contradicts the example I've provided. I have shared a list of commands that you can follow to see it breaking.

You are right that cargo update would not upgrade across major version by default, but new setups will be 100% broken. And if you think about the path we would have to follow to fix it - it would be a very hard one. But again, we do not want to break anybody. Upgrade to 2.0 is guaranteed to be broken, by design of SemVer and cargo/rustc treatment of the compatibility rules. Upgrade to 1.16 was supposed to be mostly backward compatible. We underestimated the breakage and are working to fix it.

As a side note, I do not think that asking people to provide explicit version numbers is a very good approach. It would be great if we can avoid that.

[...]

from the Backwards Compatibility Policy in #32655 is a good step to avoid SemVer related confusion.

For ecosystem crates, this means that we'd have to either use a max version or ~ when specifying Solana crate versions to let cargo know that it should never pull SemVer compatible versions, which brings us to exactly where we've started.

Not sure if I follow. Could you unpack this? Maybe explain a specific scenario you have in mind step by step?

The idea is that everyone will use default version constraints, by default. Why are you saying "it should never pull SemVer compatible versions"?

We do want crates that depend on the SDK to be able to use newer versions of the SDK. If crates will start specifying explicit versions, cargo will treat them as incompatible, and tings will break right there. cargo makes the decision on the compatibility of a dependency not based on the specific version number, but based on the way the dependency version is specified. In other words, if two libraries would specify solana-sdk = "=1.16.6" and solana-sdk = "=1.16.7" respectively, cargo will include two copies of the solana-sdk and rustc will treat them as incompatible.

I also do not understand what do you mean by "which brings us to exactly where we've started".

ilya-bobyr commented 1 year ago

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important. What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

Totally agree with this, any system is better than no system. As a minimum, it sets the expectations and rules when breaking changes can occur. It does not need to be perfect, just consistent.

It sounds as if we currently do not have any system in place. Which is not true: docs.solana.com/developing/backwards-compatibility

Software for a MINOR version release will be compatible across all software on the same MAJOR version.

You're not even following your own system linked as proven by 1.14 --> 1.16. That's the crux of the nature of the complaint. Your system is effectively SemVer but you consistently don't follow it.

I would like to disagree here. While I do not have statistics on the frequency of breakages, it is confusing to say that a breakage somehow means that we are not following the compatibility rules. Should we not be following these rules, why would we be fixing the 1.16 breakage? We are fixing it exactly because we are trying to make 1.14 and 1.16 compatible.

It does not mean that we can not improve our testing process. To reduce the chance of a breakage.

acheroncrypto commented 1 year ago

TL;DR

Doing the same thing and expecting different results.


This is a long post, I've tried to reply to your comments as best as I could and this is probably going to be my last comment on the issue because it doesn't look like it will get anywhere.

The main idea of the post you've linked is that SemVer is good for machines but not for humans [...]

But Semantic Versioning (henceforth, SemVer), as specified at http://semver.org, changes this to prioritize a mechanistic understanding of a codebase over a human one. Any "breaking" change to the software must be accompanied with a new major version number. It's alright for robots, but bad for us.

The author then goes to suggest a list of subjective reasons of why some version bumps are more imported than others due to some arbitrary reasons such as the % of users who are affected by the change (as if that can be known accurately). A breaking change in a public API is a breaking change in a public API.

Maybe I should not have included the link. I was hoping to save some time, but instead it is turning into a separate discussion.

What I had in mind is that there are certain complexities that do happen in practice, that semantic versioning may not able to address.

I definitely agree with you that SemVer may not be able to address everything, but I also have no doubt in my mind that if solana-program actually properly followed SemVer, dependency issues would stop being the biggest issue(for most people) in the ecosystem.

The percentage of the affected users is a reasonable criterion to consider, if any option you choose will create problems for someone.

What you are saying is hard to disagree with, but it's not the point. The point is following SemVer is an objective process that doesn't require certain people to guess what percentage of the users will get affected by this breaking change. If you are confident in your ability to estimate such numbers, I can only applaud your skills but history shows that hasn't been the case for Solana.

XKCD has a comic about the same problem: https://xkcd.com/1172/

Again, SemVer is not perfect and I'm trying to understand your viewpoint but I'm having a hard time seeing a logical argument.

SDK

It seems you assume that should we have increased the SDK version to 2.0, rather than 1.16, then the issue would not have happened. I do not believe this is the case. My understanding is that it would have caused way more issues, as I was trying to show in the part of my message that you quoted below.

On the day of the 1.16.0 release, pretty much every library that I can think of in the Solana ecosystem including Anchor, Metaplex, Pyth... broke. Why? Because everyone depends on solana-program crate and when it bumps from 1.14 to 1.16, that's a SemVer compatible bump and it causes cargo to pick the latest version on new installations.

Yes, as I have written above, we do want 1.14 and 1.16 to be compatible. At least mostly. We did not anticipate this level of breakage.

This is the point about guessing numbers, there is no need to anticipate or guess.

Given this approach broke the builds of the entire ecosystem and therefore caused everyone in the ecosystem some kind of compilation issue, I'm curious about the how the other approach would cause "way more issues".

There is some confusing here. What do you mean by "this approach"?

"this approach" is what Solana has been doing for a long time, which is to include breaking changes randomly in a SemVer compatible version(not only with minor releases but even with patch releases). It is proven by years of data to be unreliable yet it is still being defended.

An upgrade to 2.0 would have broken every new setup. Until every library upgrades. And we would not be able to easily fix it. Quite unlike the fix to 1.16 that is about to be released.

No it would NOT have broken every new setup, e.g. anchor init would not be broken. Solana making breaking changes and not properly versioning them is what breaks the ecosystem crates. I'll give examples later in the post.

It looks to me, that our understanding of how cargo and rustc behave with regard to Semantic Versioning rules are different. I suggest we clarify this first, and then the question of following or not following the Semantic Versioning rules for borsh upgrade would become clearer.

I'll compare 1.16.0 vs 2.0.0(correct me if I'm wrong):

1.16.0

2.0.0

  • New installs of existing libraries don't break because cargo will not automatically pull the latest Solana version.
  • cargo update will not pull in the latest version for existing libraries which is exactly the reason why SemVer is useful.
  • This might cause issues if, as you've said, people add cargo add <solana-crate> and then tries to add an older project who depend on <solana-crate> ^1. It is not even guaranteed that this would cause any problems meanwhile the other approach 100% causes problems. And the solution for this problem would just be to match the version with <solana-crate> = "*" which is much better than giving people a growing list of dependencies that they need to downgrade in order to start building their programs.

When you are saying, "It is not even guaranteed that this would cause any problems" - this contradicts the example I've provided. I have shared a list of commands that you can follow to see it breaking.

As I agreed in my post that it could cause some issues, I repeat, it's not guaranteed that it would cause problems. Just because there are different versions of a crate as a transitive dependency, does not mean you are automatically going to have issues. And again, this would be a problem if the user follows your exact steps. For example, if they did the same thing in reverse order i.e. added the old dependency first and then cargo add solana-program, it wouldn't have any issues. And again, even in the case of a problem, the fix would just be to include solana-program = "*" or even run cargo update to make the versions match.

You are right that cargo update would not upgrade across major version by default, but new setups will be 100% broken. And if you think about the path we would have to follow to fix it - it would be a very hard one. But again, we do not want to break anybody.

Where do you get the new setups will be 100% broken? Since you've shown interest in estimating the percentage of people this change would affect, this 100% breaking rate would assume the only way people to new installations is the exact steps you've provided(in order), which is provably incorrect. Instead, most people initialize new projects by running anchor init and they don't ever have to add solana-program to their Cargo.toml since anchor-lang exports solana-program. I'll get back to this comment after giving examples on what actually happened.

Upgrade to 2.0 is guaranteed to be broken, by design of SemVer and cargo/rustc treatment of the compatibility rules.

You are right, upgrade to 2.0 is guaranteed to be broken because this is actually what SemVer says. Since you've bumped a major version, it's by definition a breaking release and cargo will not pull the new version for existing libraries. What happened instead was the release had a breaking change, but because it wasn't properly versioned, it leads cargo to pull the latest version and try to compile and inevitably fail due to the breaking change(surprise).

Upgrade to 1.16 was supposed to be mostly backward compatible. We underestimated the breakage and are working to fix it.

Though this conversation has only been about 1.16, this is because it is the latest issue that broke almost every library. The dependency issues have been ongoing for a long time on pretty much every minor release.

As a side note, I do not think that asking people to provide explicit version numbers is a very good approach. It would be great if we can avoid that.

There is no other option as long as you Solana keeps telling cargo that this version is compatible with the previous version and it should update everyone's dependencies but it's not actually compatible, we will have to keep telling people to downgrade this dependency and even provide lock files for them to fix their problem.

[...] from the Backwards Compatibility Policy in #32655 is a good step to avoid SemVer related confusion. For ecosystem crates, this means that we'd have to either use a max version or ~ when specifying Solana crate versions to let cargo know that it should never pull SemVer compatible versions, which brings us to exactly where we've started.

Not sure if I follow. Could you unpack this? Maybe explain a specific scenario you have in mind step by step?

The idea is that everyone will use default version constraints, by default. Why are you saying "it should never pull SemVer compatible versions"?

We do want crates that depend on the SDK to be able to use newer versions of the SDK. If crates will start specifying explicit versions, cargo will treat them as incompatible, and tings will break right there. cargo makes the decision on the compatibility of a dependency not based on the specific version number, but based on the way the dependency version is specified. In other words, if two libraries would specify solana-sdk = "=1.16.6" and solana-sdk = "=1.16.7" respectively, cargo will include two copies of the solana-sdk and rustc will treat them as incompatible.

I also do not understand what do you mean by "which brings us to exactly where we've started".

Specifying a maximum version for Solana has been a common solution for Solana's inability to follow SemVer for many of the ecosystem crates in the past, and from the looks of it, they will have to continue to do so.

Strict solana-program version requirement examples

Why do they do this? Because the moment they remove these constraints, the next minor Solana release(sometimes even a patch release) breaks their library.

What happened on the latest minor Solana release

anchor-lang

https://github.com/coral-xyz/anchor/blob/3b45144787a0493caa8fc07de35e8cfd5fe98543/lang/Cargo.toml#L40

[dependencies]
anchor-lang = "0.27.0"
# ❌ Error because did not use max version for `solana-program`
mpl-token-metadata

https://github.com/metaplex-foundation/metaplex-program-library/blob/9fc2a13b5a66d407f0b9ab31eb7b161b78f28ef0/token-metadata/program/Cargo.toml#L20

[dependencies]
mpl-token-metadata = "1.12.0"
# ❌ Error because did not use max version for `solana-program`

Why did the most used libraries in the ecosystem break out of nowhere even when they are the only dependency? Not because Solana had a breaking change, it's because Solana did not follow SemVer.

pyth-sdk-solana

https://github.com/pyth-network/pyth-sdk-rs/blob/79d7d0a1362955bdb5eafa691c59b4c9fed30ca1/pyth-sdk-solana/Cargo.toml#L14

[dependencies]
pyth-sdk-solana = "0.7.2"
# ✅ Success because it specified max `solana-program` version of `<1.15`

The only logical thing for the ecosystem libraries to do is to go back to setting explicit max version for Solana because you never know whether the next update is going to break your library or not. And this is exactly what they've done(again).

Exactly where we've started(again)

As you can see from the examples, this problem has a long history and did not magically appear in 1.14 -> 1.16, hence the comment "which brings us to exactly where we've started".


You are right that cargo update would not upgrade across major version by default, but new setups will be 100% broken.

Going back to this comment after giving examples on what actually happened instead of hypothetical steps, new setups will NOT be 100% broken, in fact it's quite the opposite. People would have to go out their way to follow exactly the steps you've given(in order) to have a problem, meanwhile the current approach broke everything regardless of what the user does.

I can't emphasise enough that this is not 1.16 specific. I get the sense that because it's impossible for solana-program to break itself, and SPL gets updated as soon as there is a problem in CI, the people who make these decisions are simply unaware of the problems they cause to the ecosystem crates by not following SemVer.

Solution

With this approach, the problems that happened many times in the past(as seen in the examples) can be drastically improved, e.g. anchor init which is how the majority of new projects are crated won't randomly break because of a new Solana release. Otherwise, unfortunately this is doing the same thing and expecting different results.

samuelvanderwaal commented 1 year ago

Semantic Versioning should significantly improve the situation from what it is now. However, if you have a better system to suggest, that's fine too. Consistency and communication are probably what are most important. What seems to happen though, is people don't like Semantic Versioning so then they don't follow any system and that leads to the kinds of issues we're seeing today.

Totally agree with this, any system is better than no system. As a minimum, it sets the expectations and rules when breaking changes can occur. It does not need to be perfect, just consistent.

It sounds as if we currently do not have any system in place. Which is not true: docs.solana.com/developing/backwards-compatibility

Software for a MINOR version release will be compatible across all software on the same MAJOR version. You're not even following your own system linked as proven by 1.14 --> 1.16. That's the crux of the nature of the complaint. Your system is effectively SemVer but you consistently don't follow it.

I would like to disagree here. While I do not have statistics on the frequency of breakages, it is confusing to say that a breakage somehow means that we are not following the compatibility rules. Should we not be following these rules, why would we be fixing the 1.16 breakage? We are fixing it exactly because we are trying to make 1.14 and 1.16 compatible.

It does not mean that we can not improve our testing process. To reduce the chance of a breakage.

I know that Solana is working hard to remedy the v1.16 breakage and I really appreciate the efforts so I definitely want to acknowledge that. I also see that Solana is updating the Backwards Compatibility Policy to more accurately reflect the way releases are actually made, which is good, even though I would prefer proper SemVer as I think it reduces the need for extra communications to developers.

I feel like our conversation here may be starting to move into territory where it's unproductive but I would like to explain that I didn't open this issue only because of v1.16. Solana breaking things has been a persistent issue in my time in the ecosystem and judging from community reactions I'm not the only one who feels that way. I haven't posted specific examples in this thread because 1) it felt unnecessary and 2) I don't want specific developers to think I'm targeting their code when it's more of a systemic issue. Finally, it's worth noting that I have been responsible for some breaking changes in minor versions in my tenure at Metaplex so I have definitely been part of the problem. This is part of why I'm making efforts to be part of the solution as well.

However, if you don't think there have been a significant number of breakages I recommend you do some or all of the following:

1) Have your dev rel team gather examples from the community 2) Look at the number of yanked versions of Solana packages on crates.io 3) Do keyword searches on Stack Exchange and Twitter 4) Look through change logs to see what breaking changes have been made in what versions and see if it follows your policy as it was formerly written

I think that will give you a pretty fair understanding of the depth of frustration the community has experienced on this issue. Again, I have been part of the wider ecosystem problem. This is not meant to be accusatory in nature, just an explanation for where the frustration comes from and that this isn't solely about v1.16.

mschneider commented 1 year ago

I think, in practice, Solana SDK will have to stay at version 1 forever. And we are going to make breaking changes in minor releases.

Would be great to understand why you need to keep the major version fixed "forever". Can you please explain?

I'm sure you have a good reason, but it also means that 3rd party developers will treat your minor version as majors in that case and use <= or ~ to pin them, then other developers will need to fork their packages to update dependencies. This is the problem we have been trying to solve for the last months. And everyone agrees, it should be solved. Let's get to the bottom of it.

danenbm commented 1 year ago

I think, in practice, Solana SDK will have to stay at version 1 forever.

The main substantive issue I understand with releasing a 2.x is having both 1.x and 2.x in the dependency graph and thus different types with with the same name. This issue is described in the cargo reference here, and was discussed in the now-closed docs PR with an example here.

Could @ilya-bobyr , @joncinque , or someone else explain why the Semver trick cannot be used to re-export types in the event of a 2.x release? My initial assumption is that there would be too many types to deal with.

joncinque commented 1 year ago

explain why the Semver trick cannot be used to re-export types in the event of a 2.x release? My initial assumption is that there would be too many types to deal with.

We'd have to do that for sure, but before we can do that, we need people to trust our minor versions and allow all of them in their toml files. The semver trick helps, but it isn't a perfect solution, since you cause immediate breakage between 1.X and 1.X+1. You're counting on everyone using the default cargo dependency declaration so that they can pick up the newer minor version right away.

More likely, we'll develop a whole new SDK for program-runtime v2 and never need to break anything again.

mschneider commented 1 year ago

The main substantive issue I understand with releasing a 2.x is having both 1.x and 2.x in the dependency graph and thus different types with with the same name. This issue is described in the cargo reference here, and was discussed in the now-closed docs PR with an example here.

If you build a bot that communicates with dapps that were built using different anchor versions you end up with duplicate types for solana-sdk. Status quo, this issue is already prevalent (outside of trivial examples of course). Everyone building backend services for defi has seen it.

Not wanting to use major upgrades to avoid versioned symbols on a major upgrade, leads to versioned symbols on a minor upgrade, I hope that makes somewhat intuitively sense. Given that everyone is experiencing the issue you are trying to protect from, I think it's adequate to reconsider that protection policy.

Any other reasons to keep the major version fixed "forever"?

gitmalong commented 1 year ago

Are there any Cargo workarounds? I am getting frustrated as it becomes almost impossible for me to update apps that rely on many third party Solana programs / libs.

samuelvanderwaal commented 1 year ago

Are there any Cargo workarounds? I am getting frustrated as it becomes almost impossible for me to update apps that rely on many third party Solana programs / libs.

You can do things like upgrade specific versions of packages within the lock file with: cargo update -p <package> --precise <version>, that might get you around the issue depending on what specific one you're encountering.

wh173-c47 commented 8 months ago

Just coming here, feeling forced to add a +1

Like who does this ?! Almost breaking changes on all minors, and not even checked fixes in detail (we agree this would be the funniest or the saddest depending on how you take it)...

Pretty much the worse dev ecosystem so far... Would love to come with a magic solution, but pretty much all examples are messed up / broken, nothing gets updated, funniest one is "Solana Cookbook", not sure you can cook anything with that.. So at some point are you willing to open the gates to adoption and attract new talents or not...

mikemaccana commented 7 months ago

Mike from Solana Foundation here - our training material currently must include content with a manual workaround to this issue (see slide 95) when we show new Solana developers onchain development. This isn't the only problem that exists out of the box, however post the next release of Anchor (thanks to the great work in 0.30 and some future PRs we've made to 0.31), it will be the only one remaining. My understanding is that Semver would ensure cargo installs a compatible version of the solana-program crate and ensure Anchor - the introduction to Solana onchain development for most developers - works out of the box. So yes please semver!