jagpreetsinghsasan commented 2 years ago

Description

As a developer, I want to upgrade the existing Hyperledger Fabric 1.4.x deployment (done using Hyperledger Bevel) to Hyperledger Fabric 2.2.x deployment

Before you start reading the actual issue description :)

It's gonna be a loooooooong hands-on experience. 🙂😁👍

This spike is a must for listing the automation GH issues that will be needed once we verify that the spike works! (What is written down here will definitely not be 100% accurate, because of extra variables like Kubernetes, Ambassador and others.)

I will try my best to describe each step in precise detail here, but you can refer to the official guide at any time.

Here it goes!

The general pattern for upgrading an entity is

Backup the data
Upgrade the binary
Restart the entity

So, let's divide the entire flow into two segments (backup and upgrade) and work on them accordingly.

Backup of ledgers, MSPs, etc.

THIS NEEDS MORE RESEARCH: The data that needs to be backed up should already be present on a PVC; if not, we need to migrate that data to a PVC. (If migration is needed, the shell script that has to run in the pod container shall also be created, so that the upcoming automation stories can work with it.)

The couchDB data directory
The ledger data, which defaults to /var/hyperledger/production/orderer for an orderer node and /var/hyperledger/production for a peer node. (Note: the order matters. The CouchDB backup shall happen first and the peer ledger data backup second, because if the state database ends up at a lower block height than the ledger, the peer can reconstruct the state DB from the ledger.)
The MSP directory of each orderer and peer node
For Fabric CA server, it is the fabric-ca-server.db file
A script that lists all installed chaincodes with their names and current versions (this is needed at the time of the chaincode upgrade)
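The backup order for a peer could be sketched as below. This is only a minimal sketch, assuming the data directories are reachable as local paths (for example inside the pod, or on a node where the PVC is mounted). The names backup_dir, backup_peer and BACKUP_ROOT are hypothetical, and /opt/couchdb/data is an assumed CouchDB data path that has to be checked against the actual deployment.

```shell
#!/usr/bin/env bash
# Hypothetical backup helpers. Only the two ledger defaults come from the
# issue text; every other path and name here is an assumption.

BACKUP_ROOT="${BACKUP_ROOT:-/tmp/fabric-backup}"

# backup_dir <label> <source-dir>
# Archives <source-dir> into $BACKUP_ROOT/<label>.tar.gz.
backup_dir() {
  local label="$1" src="$2"
  if [ ! -d "$src" ]; then
    echo "skip: $src does not exist" >&2
    return 1
  fi
  mkdir -p "$BACKUP_ROOT"
  tar -czf "$BACKUP_ROOT/$label.tar.gz" -C "$src" .
}

# backup_peer
# CouchDB first, then the peer ledger, so the state DB snapshot is never
# ahead of the ledger snapshot.
backup_peer() {
  backup_dir couchdb "${COUCHDB_DATA:-/opt/couchdb/data}" &&
    backup_dir peer-ledger "${PEER_LEDGER:-/var/hyperledger/production}"
}
```

The MSP directories and fabric-ca-server.db could be archived with the same backup_dir helper once their paths in the pods are confirmed.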

Fabric entity upgrade (the order of upgrade shall be retained, and the upgrade shall start only when the above backups are ready in the PVC)

PRIOR FIXES REQUIRED: The entities such as Orderer, Peer, CA server, CA client and CouchDB shall have value files with their image versions mentioned. If not, we need to make those changes to the 1.4.x charts first. The basic idea is that, once the prerequisite data is backed up, a git push of the new image version should be all that is needed, and flux shall deploy the new image on the existing PVCs.

Upgrading Orderer nodes

The orderers should be upgraded one at a time (in a rolling fashion)

Push the orderer value file with the new orderer image. (Hopefully flux redeploys the orderer with the new image while reusing the same PVC. If it does, good. If not, we need to figure out another way of updating the orderer image, maybe by editing the deployment configuration or something of that sort. For the time being, here and in the scenarios below, we assume that flux deploys the new image once the image change in the value file is pushed.)
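The value file change itself could be as small as the sketch below. It assumes the value file references the image as "name:tag" on a single line; the function name set_image_tag, the file name and the 2.2.2 tag are hypothetical, and the real key layout of the Bevel value files has to be checked first.

```shell
#!/usr/bin/env bash
# set_image_tag <values-file> <image-name> <new-tag>
# Rewrites "<image-name>:<old-tag>" to "<image-name>:<new-tag>" in place.
# Assumption: the tag ends at a quote, a space or the end of the line.
set_image_tag() {
  local file="$1" image="$2" tag="$3" tmp
  tmp="$(mktemp)"
  sed "s|\(${image}\):[^\"' ]*|\1:${tag}|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite via cat to keep the file's permissions
  rm -f "$tmp"
}

# Example (hypothetical file name and tag):
#   set_image_tag orderer1-values.yaml hyperledger/fabric-orderer 2.2.2
#   git commit -am "upgrade orderer1 image" && git push
```

For the rolling upgrade, one orderer value file would be pushed at a time, and the next one only after the previous orderer is confirmed healthy (for example with kubectl rollout status on that orderer's workload, whose kind and name depend on the chart).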

Upgrading peer nodes along with their couchDB

As we have both of these entities in the same pod, the upgrade shall happen something like this

Update the value file with the new couchDB image version and peer node version
Remove the peer chaincode containers (they will be on one of the Kubernetes nodes, so a script is expected to find and remove them): CC_CONTAINERS=$(docker ps | grep dev-$PEER_CONTAINER | awk '{print $1}'); if [ -n "$CC_CONTAINERS" ] ; then docker rm -f $CC_CONTAINERS ; fi
Remove the peer chaincode images (again on one of the Kubernetes nodes, so a script is expected to find and remove them): CC_IMAGES=$(docker images | grep dev-$PEER | awk '{print $1}'); if [ -n "$CC_IMAGES" ] ; then docker rmi -f $CC_IMAGES ; fi
Add one line to the chart before pushing the modified value file: just before the node starts, the databases must be upgraded to the 2.2.x format with peer node upgrade-dbs; after that, the node shall start with the usual peer node start command
Push the above modified peer chart value file
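The peer steps above could be wrapped into small functions like the following. This is a sketch under the same assumption as the issue text, namely that the Kubernetes nodes run Docker and the chaincode containers and images carry the dev-<peer name> prefix; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# remove_cc_containers <peer-name>
# Removes chaincode containers whose name contains dev-<peer-name>.
remove_cc_containers() {
  local ids
  ids="$(docker ps | grep "dev-$1" | awk '{print $1}')"
  if [ -n "$ids" ]; then
    # Intentionally unquoted: one argument per container ID.
    docker rm -f $ids
  fi
}

# remove_cc_images <peer-name>
# Removes chaincode images whose repository name contains dev-<peer-name>.
remove_cc_images() {
  local images
  images="$(docker images | grep "dev-$1" | awk '{print $1}')"
  if [ -n "$images" ]; then
    # Intentionally unquoted: one argument per image.
    docker rmi -f $images
  fi
}

# start_upgraded_peer
# What the peer container command could look like after the chart change:
# upgrade the database format first, then start the node.
start_upgraded_peer() {
  peer node upgrade-dbs && peer node start
}
```

The cleanup functions have to run on every Kubernetes node that may host chaincode containers for that peer, since the issue text only says they are "on one of the kubernetes nodes".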

Upgrading the CA server

Push the CA server value file with the new image version and hopefully flux deploys it 🙂 (the ca server db file, fabric-ca-server.db, should be present, mentioned above in the backup section)

Upgrading the CA client

Push the CA Tools value file with the new image version and hopefully flux deploys it 🙂

Channel upgrade (the most complicated part to be tested for automation) 😄

We need to enable the new chaincode lifecycle process. In short, this will reuse almost all of the code written for updating the channel config (just with new JSON blocks added to the existing channel config).

NOTE: Before upgrading the channel config, we need to make sure that the orderer endpoints are defined in both the system channel and all application channels (this can be checked by fetching the channel config blocks, decoding them and checking whether the orderer endpoints exist). If the orderer endpoints do not exist in a channel, that channel's config must be updated first to add them. The process for such an update is exactly how we have updated the channel config in playbooks such as the add orderer playbook (https://github.com/hyperledger/bevel/blob/main/platforms/hyperledger-fabric/configuration/add-orderer-organization.yaml), the add peer playbook (https://github.com/hyperledger/bevel/blob/main/platforms/hyperledger-fabric/configuration/add-peer.yaml), etc.
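The endpoint check could be sketched as below. It assumes jq is available and that org-level endpoints live under values.Endpoints.value.addresses of each orderer organization in the decoded config; the function name orgs_missing_endpoints and the variables ORDERER_ADDR, ORDERER_CA and CHANNEL are hypothetical.

```shell
#!/usr/bin/env bash
# Fetch and decode the channel config first (run where the peer CLI and
# configtxlator are available; variable names are placeholders):
#   peer channel fetch config config_block.pb -o "$ORDERER_ADDR" -c "$CHANNEL" \
#     --tls --cafile "$ORDERER_CA"
#   configtxlator proto_decode --input config_block.pb --type common.Block \
#     --output config_block.json
#   jq '.data.data[0].payload.data.config' config_block.json > config.json

# orgs_missing_endpoints <decoded-config.json>
# Prints the name of every orderer org that has no org-level endpoint
# addresses. Empty output means nothing has to be added.
orgs_missing_endpoints() {
  jq -r '
    .channel_group.groups.Orderer.groups
    | to_entries[]
    | select((.value.values.Endpoints.value.addresses // []) | length == 0)
    | .key
  ' "$1"
}
```

The same check would be run once for the system channel and once for each application channel.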

Upgrade the channel capabilities: In short, fetch the required channel config block, decode it, modify the config with the new capabilities, re-encode it, compute the config update (the difference between the original and the modified config) and submit the update transaction. (I didn't write out the exact steps because they are clearly described in the guide referenced below, and we have already done this in several places in our code.) Refer to this guide for the actual modifications required in the system channel and in the application channels.
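The fetch, decode, modify, re-encode, diff and submit sequence could look roughly like this. It is a sketch assuming the peer CLI, configtxlator and jq are available in the working container; update_channel_config, ORDERER_ADDR and ORDERER_CA are hypothetical names, and the jq filter that carries the actual capability change is passed in by the caller.

```shell
#!/usr/bin/env bash
# update_channel_config <channel> <jq-filter>
# Runs in a subshell so that "set -e" stops the sequence at the first
# failing step without affecting the caller. Works in the current directory.
update_channel_config() (
  set -e
  ch="$1"
  filter="$2"

  # 1. Fetch the latest config block and decode it to JSON.
  peer channel fetch config config_block.pb -o "$ORDERER_ADDR" -c "$ch" \
    --tls --cafile "$ORDERER_CA"
  configtxlator proto_decode --input config_block.pb --type common.Block \
    --output config_block.json
  jq '.data.data[0].payload.data.config' config_block.json > config.json

  # 2. Apply the modification (for example, the new capabilities).
  jq "$filter" config.json > modified_config.json

  # 3. Re-encode both versions and compute the config update.
  configtxlator proto_encode --input config.json --type common.Config \
    --output config.pb
  configtxlator proto_encode --input modified_config.json --type common.Config \
    --output modified_config.pb
  configtxlator compute_update --channel_id "$ch" --original config.pb \
    --updated modified_config.pb --output config_update.pb

  # 4. Wrap the update in an envelope.
  configtxlator proto_decode --input config_update.pb --type common.ConfigUpdate \
    --output config_update.json
  echo '{"payload":{"header":{"channel_header":{"channel_id":"'"$ch"'","type":2}},"data":{"config_update":'"$(cat config_update.json)"'}}}' \
    | jq . > config_update_in_envelope.json
  configtxlator proto_encode --input config_update_in_envelope.json \
    --type common.Envelope --output config_update_in_envelope.pb

  # 5. Submit. Depending on the channel policies, other org admins may
  #    have to sign the envelope first (peer channel signconfigtx -f ...).
  peer channel update -f config_update_in_envelope.pb -c "$ch" \
    -o "$ORDERER_ADDR" --tls --cafile "$ORDERER_CA"
)
```

A call such as update_channel_config mychannel '.channel_group.groups.Orderer.values.Capabilities.value.capabilities = {"V2_0": {}}' would be one possible usage; the exact capability blocks for each channel group should be taken from the referenced guide rather than from this example.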

Chaincode upgrade

Once the channels are upgraded, the chaincodes on them can be upgraded by running the chaincode upgrade command with a version newer than the current one. NOTE: The previous chaincode version shall not be assumed to be 1.0, as the organizations might have upgraded it since the initial deployment. So a script to find out the current chaincode version is needed, as mentioned in the BACKUP section.
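A version lookup could be as simple as parsing the output of peer chaincode list. The sketch below assumes the 1.4.x output has one line per chaincode of the form "Name: <name>, Version: <version>, Path: ..."; that format and the function name parse_cc_versions are assumptions to verify against a real peer.

```shell
#!/usr/bin/env bash
# parse_cc_versions
# Reads `peer chaincode list` output on stdin and prints one
# "<name> <version>" pair per chaincode. Lines that do not match the
# assumed "Name: ..., Version: ..., ..." shape are ignored.
parse_cc_versions() {
  sed -n 's/^Name: \([^,]*\), Version: \([^,]*\),.*$/\1 \2/p'
}

# Example (CHANNEL is a placeholder):
#   peer chaincode list --instantiated -C "$CHANNEL" | parse_cc_versions
#   peer chaincode list --installed | parse_cc_versions
```

Storing this output next to the other backups gives the upgrade step a reliable starting version for each chaincode.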

jagpreetsinghsasan commented 2 years ago

Two more important things to keep in mind while working on this big issue.

After each entity upgrade, we shall also test that the entity is working; otherwise, going back to find out what went wrong will be tons of work.
If more steps are required for certain entities, or if something is found to be incorrect, the comment section of this issue can be used to point that out, so that the corrected knowledge can later be used to create the automation GH issues.

sownak commented 2 years ago

Thanks @jagpreetsinghsasan for this detailed summary. I think automating the whole thing would not be necessary as this is not something an operator would do multiple times. If we can create a detailed Guide for readthedocs, that should complete this task.

spike(fabric): Manual HL Fabric upgrade from 1.4.x to 2.2.x deployed using HL Bevel #1845