Closed frol closed 4 years ago
I agree that using master
for stable releases is awkward. Personally, when I visit any given open source github repo, I expect master to be the bleeding edge of development. One of the first things I look for is recent activity. I want to know if the project is active or not. If I see that there hasn't been a commit in a few months, I immediately become wary that the project is not actively supported or developed.
I support the idea of using master
as the place for active development and creating separate branches for our releases as nightly, beta, and stable. Production releases should also likely have specific branches and tags which allow anyone to check out a specific product release version (same for nightly and beta, too).
Obviously, we should strive for stable releases to not include regressions or backwards incompatibilities. Master should attempt to adhere to this policy as much as possible. Using master
for active development will place the right kind of pressure on contributors to not break things simply because master will likely be the first thing anyone checks out.
I'd like to see the conversation here expand to include the naming convention we'd like to use for these branches and releases.
Bo, Bowen, and I have been working through our release process which, today, is essentially rebuilding and lightly testing staging then merging it into master. In our live/offline discussions, happening in person, we are proposing the following:
For the next release (i.e., next week), we propose switching tip development (i.e., the default branch of nearcore) to be master
and actually use master
in that way.
Today, the default branch in github is indeed master
, but any developer wanting to do active work that is not ready for testnet/beta/production will switch to the staging
branch after doing a checkout. Once they are ready to check in/merge, their changes are checked into staging
where we ideally would be testing everything, and then every once in a long while we merge those changes into master
. This process is what will change.
Before this can happen, we will need to wait until master
is stable using our latest code from staging
. Once this happens, we will then create beta
and stable
branches. We will populate those branches with our master
branch, therefore starting fresh.
There are several steps to make this real. Here's just a few off the top of my head:
nearprotocol/nearcore
to use the stable
branch which will always work with a default network that has been tested.devnet
for development efforts. This will have a lower stability requirement than our current testnet
. This will be the default network for the master
branch.staging
to beta
). We should add a new network called 'betanet'. stable
branch network will eventually be mainnet
once we reach that level of stability; however, today, until mainnet
is ready for launch, since our users are used to pointing at testnet
, we propose we keep using testnet
until mainnet
is ready for production. We shouldn't claim stability for a network that isn't stable yet. ~/.near
directory. This should likely discussed in another issue, but it should happen quickly to make sure the proper networks are used out of the box after checkout.As I understand there is some applayer part of transition:
render.yml
)ci-
networks for running integration tests@ailisp wonder if you want to look into these? Might be good idea to have someone other than me do it to increase the bus factor.
So here are few questions + thoughts to this and #2174:
master
to join TestNet / MainNet?
master
is the most recent stable / MainNet release. And we only update it when we hard fork our network (current approach). Merging to master is major release. This should be tested ~14 days of replaying transactions, bombarding, security reviews, etc.master
has latest code, but we maintain backward compatibility. Meaning we have CI that spins up new node, connects to the stable TestNet (that replicates MainNet), replays the chain and runs suite of tests. Without passing this, we can not merge into master. This should probably take ~1 day at least of testing.master
with less strict rules on merging. We can ban running start_mainnet.py
and start_testnet.py
in the master
branch (they bork and say "You are on a wrong branch, switch to blah
branch"). Note, that if we don't do that, the risk is that someone will run slightly different protocol that will get them slashed on MainNet. Moving from master to beta can take ~1 day of testing toward beta network and ~14 days of testing on beta network replaying production and stress testing loads to stable.Previously we were targeting to go with (1) I think @damons you suggesting to go with (3) -- I don't mind just want to make sure we understand all things coming with it.
E.g. we need to make sure then that even if someone tries to run binary from master
toward stable / and later MainNet - it should always fail with helpful error.
For example state without trie #2050 - even with somewhat scoped change, it still changes some APIs across other usages. Or even something simple as 170 lines biasable randomness PR #2177 is might be possible but would require flags on the data structure to ignore new field (because otherwise that's a hard fork of a feature) and 16 files of flags with prob multiple places in each file. I'm not even talking about the unbiasable randomness PR that is 1.5k+ lines across probably all chain components.
We have discussed upgradability as well, and @nearmax was suggesting to use different versions of the same cargo package in the same binary and change which one is used via runtime. I'm not 100% sure how that would work, but we can explore that as an alternative to feature flags and also allow us to upgrade network without downtime.
@damons In your latest table I'm not 100% sure if master
is latest pushed code or latest "beta" and dev
is latest pushed code?
Not exactly sure from your comment also the flow from branch to branch, I was expecting something like this:
master
- latest pushed code, can contain hard fork from current staging
and stable
, NEVER runs with TestNet and MainNet networks (refuses to do so to prevent accidental slashing and network splits). Runs suite of tests in background after PR submitted, and if there are regressions - we rollback that PR / PRs. Suite of tests, include spinning up large network of nodes, replaying production load of transactions and also stress testing with set of contracts & transactions.master
to staging
network, which is our TestNet. This is binary that runs with TestNet. This has people testing it continuously: validators, developers, us, both because this is "free" network and all of our tooling supports it well. Additional testing / security reviews happen at this point as well. Tooling updates also "stage" here and test that everything works.stable
, which is MainNet release. This includes updates recorded in the spec release 1 month prior and any other relevant bug fixes and changes.stable
, but require additional process of approving & defining how that will be tested.
- (3) If no, we can have bleeding edge in master with less strict rules on merging. We can ban running start_mainnet.py and start_testnet.py in the master branch (they bork and say "You are on a wrong branch, switch to blah branch").
To prevent accidents we can introduce protocol_version
field in the genesis file (and if some feature changes protocol, protocol_version
must be incremented). On a network level we have a check already witch prevents nodes with a different genesis to connect to each other.
@nearmax how hard would it be to make macro magic that has annotations like this:
struct Version(u64); // could be something more sophisticated, as long as it can be compared
#[derive(BorshSerialize, BorshDeserialize)]
struct Approval {
pub parent_hash: CryptoHash,
pub reference_hash: Option<CryptoHash>,
#[borsh_version_ge(8)] // <-- *something like this*
pub some_new_field: CryptoHash,
}
And then have serialize
/ deserialize
take a version as an argument?
As I understand there is some applayer part of transition:
* Deploy new networks including corresponding web stuff (this might be good time to move all config to `render.yml`) * Have corresponding `ci-` networks for running integration tests * Update default configs in create-near-app and near-shell
@ailisp wonder if you want to look into these? Might be good idea to have someone other than me do it to increase the bus factor.
Sure, after switch to master for development I can help setup these
@ilblackdragon
Runs suite of tests in background after PR submitted, and if there are regressions - we rollback that PR / PRs
Current issue: we have many failed nightly tests, I think need them all green for a few days to consider both code and tests stable enough, otherwise too many rollback Potential issue: rollback more than one PR possibly block several team members, I'd argue it's better to wait background test of previous merge success, before merge another PR
- (3) If no, we can have bleeding edge in master with less strict rules on merging. We can ban running start_mainnet.py and start_testnet.py in the master branch (they bork and say "You are on a wrong branch, switch to blah branch").
To prevent accidents we can introduce
protocol_version
field in the genesis file (and if some feature changes protocol,protocol_version
must be incremented). On a network level we have a check already witch prevents nodes with a different genesis to connect to each other.
Good point. I think we have protocol_version, but didn't increase it for a while
- (3) If no, we can have bleeding edge in
master
with less strict rules on merging. We can ban runningstart_mainnet.py
andstart_testnet.py
in themaster
branch (they bork and say "You are on a wrong branch, switch toblah
branch"). Note, that if we don't do that, the risk is that someone will run slightly different protocol that will get them slashed on MainNet. Moving from master to beta can take ~1 day of testing toward beta network and ~14 days of testing on beta network replaying production and stress testing loads to stable.Previously we were targeting to go with (1) I think @damons you suggesting to go with (3) -- I don't mind just want to make sure we understand all things coming with it.
Yes. (3) is what I'm proposing.
@damons In your latest table I'm not 100% sure if
master
is latest pushed code or latest "beta" anddev
is latest pushed code?Not exactly sure from your comment also the flow from branch to branch, I was expecting something like this:
master
- latest pushed code, can contain hard fork from currentstaging
andstable
, NEVER runs with TestNet and MainNet networks (refuses to do so to prevent accidental slashing and network splits). Runs suite of tests in background after PR submitted, and if there are regressions - we rollback that PR / PRs. Suite of tests, include spinning up large network of nodes, replaying production load of transactions and also stress testing with set of contracts & transactions.- every week we push upgrade latest fully tested
master
tostaging
network, which is our TestNet. This is binary that runs with TestNet. This has people testing it continuously: validators, developers, us, both because this is "free" network and all of our tooling supports it well. Additional testing / security reviews happen at this point as well. Tooling updates also "stage" here and test that everything works.- every month we have release into
stable
, which is MainNet release. This includes updates recorded in the spec release 1 month prior and any other relevant bug fixes and changes.- additional hot fixes and security fixes can be fast tracked into
stable
, but require additional process of approving & defining how that will be tested.
Yes, with a few slight modifications:
master
is lastest pushed code. We're creating a new network for development work: devnet
. Should be done by EOD tomorrow (Wed. Feb 26th).
beta
branch is code promoted weekly (Wednesdays) from master
. We are creating a betanet
for this branch. We will no longer use staging_testnet
. EOD tomorrow.
stable
branch is code promoted monthly from beta
branch. Stable uses testnet
until mainnet
is launched. Stable will then begin using mainnet
after launch.
Documentation and scripts are being updated so that anyone who wants to run a node will receive instructions to checkout the stable
branch first before doing a build or running a node. Today, the documentation instructs anyone wanting to run a node to checkout the staging
branch first (which often fails today, See: #2044 ). We are updating the documentation to instruct them to use stable
instead, which will point to testnet
per above. See: #2192.
master is lastest pushed code. We're creating a new network for development work: devnet. Should be done by EOD tomorrow (Wed. Feb 26th). beta branch is code promoted weekly (Wednesdays) from master. We are creating a betanet for this branch. We will no longer use staging_testnet. EOD tomorrow. stable branch is code promoted monthly from beta branch. Stable uses testnet until mainnet is launched. Stable will then begin using mainnet after launch.
Sounds good to me. We have stable&master branch now. Tomorrow will create beta branch and first release from that :)
Beta branch is already created by the way.
@ailisp @bowenwang1996 we just had discussion about this (see more details https://commonwealth.im/near/proposal/discussion/360-consistency-between-downloaded-state-github-branches-and-the-actual-networks) and there are few questions that popped up:
beta
, to state dump, to updated genesis state to everyone in the network upgraded to the new code and new participants join the same network without issues. Same for stable
.
beta
is qualified, we should run all "nightly" tests and they should pass. If there is regression, we should be fall back into latest stable "nightly" commit. Didn't find where this is spec-ed out.
@ilbackdragon these are essential questions, but I feel that they do not belong to this particular issue. Let’s have another tracking issue for the raised concerns
@ilblackdragon I thought we will run devnet and betanet internally. If we allow external people to join it is much harder to automate things. For testnet release yes there needs to be some coordination. I somehow forgot about this and will think more about it. For how releases qualify, please checkout https://commonwealth.im/near/proposal/discussion/344-release-process
Closing this issue as @frol is right. Let's keep discussion to commonwealth.
Currently, we keep
master
as our stable branch, so a user who wants to play with nearcore node, can join testnet without a need to learn about other branches. However, we have been very aggressive about evolving our protocol (i.e. adding new features and experiments, which touched the whole codebase, e.g. Nightshade, DoomSlug), and obviously, these efforts were not completed in a single day, so naturally, git conflicts would occur, so we decided to usestaging
branch, where we would merge code that worked good enough for development but had naturally been lacking backward compatibility and stress-testing. Thus, we ended up with the process of forking fromstaging
and releasing tomaster
fromstaging
once in a while.We had never intended to stick to this process forever, and as we are working on the proper release process right now (https://github.com/nearprotocol/nearcore/projects/3), we should eventually arrive into the state where we can fork from master and create separate branches for our releases (nightly / beta / stable).
NOTE: The current story around master and staging has already confused our external contributors: https://github.com/nearprotocol/nearcore/pull/2153, so we should not let this happen in the future.