hyperledger / firefly

Hyperledger FireFly is the first open source Supernode: a complete stack for enterprises to build and scale secure Web3 applications. The FireFly API for digital assets, data flows, and blockchain transactions makes it radically faster to build production-ready apps on popular chains and protocols.
https://hyperledger.github.io/firefly
Apache License 2.0

Hardening for 1.3 Release #1470

Closed SamMayWork closed 7 months ago

SamMayWork commented 9 months ago

Had an awesome chat with @nguyer and came up with what feels like the remaining items required for hardening so that we have confidence in FireFly 1.3.

Pre-requisite: We have an RC for 1.3. Realistically, it looks like we might be a couple of weeks away from that because a number of in-flight PRs require review from specific people.

High-level items:


Migration 1.2.2 -> 1.3 RC

One of the largest items in this release is the change from having a single event stream per plugin (even across multiple namespaces) to having a single event stream per namespace in the network. There has been a whole bunch of testing from branches to verify that the migration should work as expected, but there are gaps in the coverage and it's worth re-testing anyway.

In discussion w/ @nguyer we think the areas to be covered here are:

Existing testing has covered:

Script for contract migration
Contract migration can be performed using this script (note: this only works on stacks created using the FireFly `ff` CLI).

```bash
#!/bin/bash

REQUIRED_TOOLS=("jq" "yq")
for TOOL in "${REQUIRED_TOOLS[@]}"
do
  if ! [ -x "$(command -v "${TOOL}")" ]; then
    echo "Error: ${TOOL} is not installed." >&2
    exit 1
  fi
done

STACK_NAME="${1}"
CONFIG_FILE_LOCATION="${2}"
FIREFLY_REPO_LOCATION="${3}"

if [ -z "$STACK_NAME" ]; then
  echo "Error: Name of the stack to migrate was not provided." >&2
  # TODO: Best-effort try to find out if the name of the stack exists
  exit 1
fi

if [ -z "$CONFIG_FILE_LOCATION" ]; then
  echo "Error: Folder containing FireFly configuration files was not provided." >&2
  # TODO: Test if the configuration files exist
  exit 1
fi

if [ -z "$FIREFLY_REPO_LOCATION" ]; then
  echo "Error: FireFly core repository location was not provided." >&2
  # TODO: Test if the repository exists
  exit 1
fi

echo ""
echo "-----------------------------------------------"
echo "----- FireFly contract migration starting -----"
echo "-----------------------------------------------"
echo ""
echo "Target stack name: ${STACK_NAME}"
echo "Configuration hosted in: ${CONFIG_FILE_LOCATION}"
echo "FireFly core repository location: ${FIREFLY_REPO_LOCATION}"
echo ""

# Ask the first node which multiparty contract it is currently using
getCurrentContractAddress () {
  ADDRESS=$(curl --silent -X 'GET' \
    'http://127.0.0.1:5000/api/v1/status' \
    -H 'accept: application/json' \
    -H 'Request-Timeout: 2m0s' | jq -r '.multiparty.contract.active.location.address')
  echo "${ADDRESS}"
}

TMP_DIR=/tmp/firefly-contract-upgrade
mkdir -p "$TMP_DIR"

echo -ne "🖊️ Compiling the multi-party contract...\r"
solc --overwrite --evm-version paris --bin "${FIREFLY_REPO_LOCATION}/smart_contracts/ethereum/solidity_firefly/contracts/Firefly.sol" -o "$TMP_DIR" >/dev/null 2>&1
solc --overwrite --evm-version paris --abi "${FIREFLY_REPO_LOCATION}/smart_contracts/ethereum/solidity_firefly/contracts/Firefly.sol" -o "$TMP_DIR" >/dev/null 2>&1
echo -e "✅ Compiling the multi-party contract (Done)\r"

CURRENT_CONTRACT_ADDRESS=$(getCurrentContractAddress)
CONTRACT_BIN=$(cat "$TMP_DIR/Firefly.bin")
CONTRACT_ABI=$(cat "$TMP_DIR/Firefly.abi")

PAYLOAD=$(jq -n \
  --arg bin "${CONTRACT_BIN}" \
  --argjson abi "${CONTRACT_ABI}" \
  '{contract: $bin, definition: $abi, input: []}')

echo -ne "🚚 Deploying the contract...\r"
curl --silent -X POST -H "content-type: application/json" -d "${PAYLOAD}" \
  'http://localhost:5000/api/v1/namespaces/default/contracts/deploy?confirm=true' > "$TMP_DIR/deploy-operation.json"
echo -e "✅ Deployed the new contract\r"

echo -ne "🔍 Extracting transaction and address information...\r"
TRANSACTION_ID=$(jq -r '.output.headers.requestId' "$TMP_DIR/deploy-operation.json")
CONTRACT_ADDRESS=$(jq -r '.output.contractLocation.address' "$TMP_DIR/deploy-operation.json")
echo -e "✅ Transaction ID is ${TRANSACTION_ID} and address is ${CONTRACT_ADDRESS}\r"

echo -ne "🔍 Getting the transaction receipt...\r"
curl --silent -X 'GET' "http://localhost:5102/transactions/${TRANSACTION_ID}" \
  -H 'accept: application/json' \
  -H 'Request-Timeout: 0s' > "$TMP_DIR/transaction-receipt.json"
echo -e "✅ Got the transaction receipt\r"

echo -ne "#️⃣ Getting the block number...\r"
BLOCK_NUMBER=$(jq -r '.receipt.blockNumber' "$TMP_DIR/transaction-receipt.json")
echo -e "✅ Contract deployed in block ${BLOCK_NUMBER}\r"

echo -ne "⬆️ Updating stack $STACK_NAME with the new contract...\r"
SAVEIFS=$IFS
IFS=$'\n'
files=$(find ~/.firefly/stacks/"$STACK_NAME"/runtime/config | grep -E 'firefly_core_[0-9].yml')
files=($files)
for file in "${files[@]}"
do
  CURRENT_NAMESPACE=$(yq '.namespaces.default' "$file")

  # jq > yq: convert the YAML to JSON so the edit can be done with jq
  FILE_COPY="${file}.json"
  yq --output-format json "${file}" > "${FILE_COPY}"

  NEW_CONTRACT_ENTRY=$(jq -n \
    --arg blocknumber "${BLOCK_NUMBER}" \
    --arg contractaddress "${CONTRACT_ADDRESS}" \
    '{firstEvent: $blocknumber, location: { address: $contractaddress }, options: {}}')

  # Append the new contract to the multiparty contract list of the default namespace
  EDITED_FILE=$(jq \
    --arg NAMESPACE "${CURRENT_NAMESPACE}" \
    --argjson NEW_CONTRACT_ENTRY "${NEW_CONTRACT_ENTRY}" \
    '(.namespaces.predefined[] | select(.name == $NAMESPACE) | .multiparty.contract) |= .+[$NEW_CONTRACT_ENTRY]' "${FILE_COPY}")

  echo "$EDITED_FILE" | yq -P > "${file}"
  echo "✅ Updated ${file}"
  rm "${FILE_COPY}"
done
IFS=$SAVEIFS
echo -e "✅ Stack ${STACK_NAME} updated!\r"

echo -ne "🔫 Restarting FireFly docker containers...\r"
SAVEIFS=$IFS
IFS=$'\n'
containers=$(docker container ls | grep -E 'firefly_core')
containers=($containers)
for container in "${containers[@]}"
do
  ID=$(echo "$container" | awk '{print $1}')
  docker restart "$ID" >/dev/null 2>&1
done
IFS=$SAVEIFS
echo -e "✅ Containers restarted\r"

echo -ne "⏱️ Waiting for the new containers to come up\r"
sleep 10
echo -e "✅ They're probably up by now...\r"

echo -ne "🖊️ Terminating use of current contract...\r"
curl --silent -X 'POST' \
  'http://127.0.0.1:5000/api/v1/network/action' \
  -H 'accept: application/json' \
  -H 'Request-Timeout: 2m0s' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "terminate" }' >/dev/null 2>&1
echo -e "✅ Terminated use of current contract\r"

echo -ne "⏱️ Waiting before verifying the new contract is in use\r"
sleep 10
DISCOVERED_CONTRACT_ADDRESS=$(getCurrentContractAddress)
echo -e "✅ Got the current contract\r"

echo ""
echo "-----------------------------------------------"
echo "----- FireFly contract migration is done! -----"
echo "-----------------------------------------------"
echo ""
echo "Old contract location: ${CURRENT_CONTRACT_ADDRESS}"
echo "New contract location: ${DISCOVERED_CONTRACT_ADDRESS}"
```

Major functionality check

There's a draft PR at https://github.com/hyperledger/firefly/pull/1461 with draft release notes covering the major features being added in this release. As part of hardening, we should go through them with the release candidate and check that all of the major functional areas have been covered by some testing.


Performance testing

Once we have a new RC available, we can start performance regression testing against the previous release. Some very preliminary testing has been done under https://github.com/hyperledger/firefly/issues/1465, so we should be able to reuse the configuration from that testing.

SamMayWork commented 9 months ago

RC1 - Migration Testing


Some pre-RC testing has already been done in this space, so the aim of this testing is to double-check the existing testing and to cover other permutations of FireFly configurations that have not yet been checked. From the original comment we know we don't have coverage for migration testing in these areas:

Additionally, we'll also need to do migration testing in the areas we have covered pre-RC:

Using this issue to track permutations of testing...

General steps to migrate a FireFly stack
- Start with a freshly built CLI from source (to ensure you have the latest commits)
- Create and run a normal stack
- Run E2E tests (using tests from the commit your stack is based on)
- Build new images using commits from the release you want to move to
- Update the Docker Compose Override file with the new images
- Upgrade the batch pin contract (see below)
- Restart the containers
- Run the E2E tests (using tests from the commit you've moved to)
- Verify that all data from the previous run of the E2E is still available

To upgrade the batch pin contract:

- Deploy the new contract
- Update the config file for each node with the new address and block number
- Restart the nodes
- `POST` to `/network/action` with payload `{"type": "terminate"}`
- Verify with a `GET /status` call that the new contract is in use

...or alternatively use the [script from the original comment](https://github.com/hyperledger/firefly/issues/1470#issue-2147099073) to automate this process.
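The last step, verifying that the new contract is in use, can be polled rather than checked after a fixed `sleep`. The following is a minimal sketch, not part of the original script: `retry_until` and `get_active_contract_address` are hypothetical helper names, and the URL and `jq` path are simply copied from `getCurrentContractAddress` in the script from the original comment, so they assume an `ff` CLI stack whose first node listens on port 5000.

```bash
#!/bin/bash

# retry_until ATTEMPTS DELAY_SECONDS EXPECTED COMMAND [ARGS...]
# Runs COMMAND until its stdout equals EXPECTED, or ATTEMPTS is exhausted.
# Returns 0 on a match and 1 otherwise. (Hypothetical helper.)
retry_until() {
  local attempts="$1" delay="$2" expected="$3"
  shift 3
  local i actual
  for ((i = 1; i <= attempts; i++)); do
    # A failing command is treated the same as a non-matching answer
    actual="$("$@" 2>/dev/null)" || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Same query as getCurrentContractAddress in the migration script.
# Assumption: node 0 of the stack is reachable on 127.0.0.1:5000.
get_active_contract_address() {
  curl --silent 'http://127.0.0.1:5000/api/v1/status' \
    -H 'accept: application/json' \
    | jq -r '.multiparty.contract.active.location.address'
}
```

For example, `retry_until 30 2 "$CONTRACT_ADDRESS" get_active_contract_address || echo "new contract not active yet" >&2` would check every 2 seconds for up to a minute, where `CONTRACT_ADDRESS` is the address returned by the deploy step.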
Very quick and dirty migration script
Will contribute this formally into a script somewhere when it's not thrown-together hackiness, but this script does a semi-automatic migration.

```sh
#!/bin/bash
export STACK_NAME=migration
export CREATE_STACK=false

ff init migration
ff start migration
sleep 10

cd ./firefly
make e2e
cd ..

ff stop migration
cat >>~/.firefly/stacks/migration <
```

| Multi-party | DB Provider | Blockchain Connector | Tokens Connector | Passed? | Tested by |
|---|---|---|---|---|---|
| N | PostgreSQL | EVMConnect | None | | @nguyer |
| Y | PostgreSQL | EVMConnect | None | | @nguyer |
| N | PostgreSQL | EVMConnect | ERC20/721 | | @nguyer |
| Y | PostgreSQL | EVMConnect | ERC20/721 | | @nguyer |
| Y | PostgreSQL | EVMConnect | ERC1155 | | @nguyer |
| N | PostgreSQL | EVMConnect | ERC1155 | | @nguyer |
| N | PostgreSQL | Fabconnect | None | | @nguyer |
| Y | PostgreSQL | Fabconnect | None | | @nguyer |

SamMayWork commented 9 months ago

RC1 - Functionality Check


SamMayWork commented 9 months ago

RC1 - Performance Testing


Will start looking at the performance of RC1 soon, but in the meantime I've kicked off a test of 1.2.2 to gather some performance metrics as a reference point. I'll put the configuration and results below.

1.2.2 Release Commit 1.3-rc1 Release Commit

Reference performance testing for 1.2.2
```
nohup ./start.sh &> ffperf.log &
```

core-config.yml

```
log:
  level: debug
broadcast:
  batch:
    size: 200
    timeout: 1s
privatemessaging:
  batch:
    size: 200
    timeout: 1s
message:
  writer:
    count: 5
download:
  worker:
    count: 100
publicstorage:
  ipfs:
    api:
      requestTimeout: 2s
    gateway:
      requestTimeout: 2s
```

ethconnect.yml

```
rest:
  rest-gateway:
    maxTXWaitTime: 120
    maxInFlight: 200
    alwaysManageNonce: true
    attemptGapFill: true
    sendConcurrency: 3
    gasEstimationFactor: 2.0
confirmations:
  required: 5
debug:
  port: 6000
```

instances.yml

```
stackJSONPath: /home/ubuntu/.firefly/stacks/1-2-2-perf-test/stack.json

wsConfig:
  wsPath: /ws
  readBufferSize: 16000
  writeBufferSize: 16000
  initialDelay: 250ms
  maximumDelay: 30s
  initialConnectAttempts: 5
  heartbeatInterval: 5s

instances:
  - name: long-run
    tests: [{"name": "msg_broadcast", "workers":50},{"name": "msg_private", "workers":50},{"name": "blob_broadcast", "workers":30},{"name": "blob_private", "workers":30},{"name": "custom_ethereum_contract", "workers":20},{"name": "token_mint", "workers":10}]
    length: 500h
    sender: 0
    recipient: 1
    messageOptions:
      longMessage: false
    tokenOptions:
      tokenType: fungible
    contractOptions: {"address": "0xfe1a8867fc460fe5696cb316b2649788b74ec46d"}
```

FireFly git commit:

```
d0fb82d64cfeb2848b0a32a6bc286d5b9ade87ea
```

nguyer commented 8 months ago

Steps that I am following for validating migration scenarios:

Multiparty Tests

Tokens Tests

EnriqueL8 commented 8 months ago

RC4 - Performance Testing

Have started an RC4 performance test with the options below.

Reference Performance test options

```
nohup ./start.sh &> ffperf.log &
```

core-config.yml

```
log:
  level: debug
broadcast:
  batch:
    size: 200
    timeout: 1s
privatemessaging:
  batch:
    size: 200
    timeout: 1s
message:
  writer:
    count: 5
download:
  worker:
    count: 100
publicstorage:
  ipfs:
    api:
      requestTimeout: 2s
    gateway:
      requestTimeout: 2s
```

ethconnect.yml

```
rest:
  rest-gateway:
    maxTXWaitTime: 120
    maxInFlight: 200
    alwaysManageNonce: true
    attemptGapFill: true
    sendConcurrency: 3
    gasEstimationFactor: 2.0
confirmations:
  required: 5
debug:
  port: 6000
```

instances.yml

```
stackJSONPath: /home/ubuntu/.firefly/stacks/enrique-test/stack.json

wsConfig:
  wsPath: /ws
  readBufferSize: 16000
  writeBufferSize: 16000
  initialDelay: 250ms
  maximumDelay: 30s
  initialConnectAttempts: 5
  heartbeatInterval: 5s

instances:
  - name: long-run
    tests: [{"name": "msg_broadcast", "workers":50},{"name": "msg_private", "workers":50},{"name": "blob_broadcast", "workers":30},{"name": "blob_private", "workers":30},{"name": "custom_ethereum_contract", "workers":20},{"name": "token_mint", "workers":10}]
    length: 500h
    sender: 0
    recipient: 1
    messageOptions:
      longMessage: false
    tokenOptions:
      tokenType: fungible
    contractOptions: {"address": "0xf4e5e921cf78de3c503623bb91230c4e54cf91cb"}
```

FireFly git commit:

```
577e8c47680c6230209a74829921a9c427766af8
```

EnriqueL8 commented 7 months ago

Run Report RC4

- Started: 10/04/24
- Duration: ~5 hours
- Git commit: https://github.com/hyperledger/firefly/commit/49410c52653e143e8a17bd9ab58ba2423f564714

Node Configuration

- 2 FireFly nodes on one virtual server (EC2 m4.xlarge)
- Entire FireFly stack is local to the server (i.e. both blockchains, Postgres databases, etc.)
- Single geth node with 2 instances of ethconnect
- Maximum time to confirm before considering failure = 1 minute

Reference Performance test options

core-config.yml

```
log:
  level: debug
broadcast:
  batch:
    size: 200
    timeout: 1s
privatemessaging:
  batch:
    size: 200
    timeout: 1s
message:
  writer:
    count: 5
download:
  worker:
    count: 100
publicstorage:
  ipfs:
    api:
      requestTimeout: 2s
    gateway:
      requestTimeout: 2s
```

ethconnect.yml

```
rest:
  rest-gateway:
    maxTXWaitTime: 120
    maxInFlight: 200
    alwaysManageNonce: true
    attemptGapFill: true
    sendConcurrency: 3
    gasEstimationFactor: 2.0
debug:
  port: 6000
```

instances.yml

```
stackJSONPath: /home/ubuntu/.firefly/stacks/latest/stack.json

wsConfig:
  wsPath: /ws
  readBufferSize: 16000
  writeBufferSize: 16000
  initialDelay: 250ms
  maximumDelay: 30s
  initialConnectAttempts: 5
  heartbeatInterval: 5s

instances:
  - name: long-run
    tests: [{"name": "msg_broadcast", "workers":50},{"name": "msg_private", "workers":50},{"name": "blob_broadcast", "workers":30},{"name": "blob_private", "workers":30},{"name": "custom_ethereum_contract", "workers":20},{"name": "token_mint", "workers":10}]
    length: 500h
    sender: 0
    recipient: 1
    messageOptions:
      longMessage: false
    tokenOptions:
      tokenType: fungible
    contractOptions: {"address": "0x528adc5c826721ba6a40342ad5918a3499f9663c"}
```

FireFly git commit:

```
49410c52653e143e8a17bd9ab58ba2423f564714
```

NOTE: confirmations set to 0

Results

- Broadcast messages: 199,448
- Private messages: 235,242
- Token mints: 18,443
- Transactions: 89,198
- No errors

Summary result:

INFO[2024-04-09T16:32:43.924] Shutdown summary:
INFO[2024-04-09T16:32:43.924]  - Prometheus metric sent_mints_total        = 18447.000000
INFO[2024-04-09T16:32:43.924]  - Prometheus metric sent_mint_errors_total  = 0.000000
INFO[2024-04-09T16:32:43.924]  - Prometheus metric mint_token_balance      = 0.000000
INFO[2024-04-09T16:32:43.924]  - Prometheus metric received_events_total   = 1097482.000000
INFO[2024-04-09T16:32:43.924]  - Prometheus metric incomplete_events_total = 0.000000
INFO[2024-04-09T16:32:43.924]  - Prometheus metric delinquent_msgs_total    = 0.000000
INFO[2024-04-09T16:32:43.924]  - Prometheus metric actions_submitted_total = 530299.000000
INFO[2024-04-09T16:32:43.924]  - Test duration: 4h58m57.296804144s
INFO[2024-04-09T16:32:43.924]  - Measured actions: 1097105
INFO[2024-04-09T16:32:43.924]  - Measured send TPS: 61.167763
INFO[2024-04-09T16:32:43.924]  - Measured throughput: 61.163341
INFO[2024-04-09T16:32:43.924]  - Measured send duration: min: 11.556354ms, max: 1.485517803s, avg: 151ms
INFO[2024-04-09T16:32:43.924]  - Measured event receiving duration: min: 2.030708026s, max: 1m4.571629481s, avg: 6.414s
INFO[2024-04-09T16:32:43.924]  - Measured total duration: min: 2.030708026s, max: 1m4.571629481s, avg: 6.414s
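As a sanity check on the summary above, the "Measured throughput" figure appears to be consistent with "Measured actions" divided by "Test duration". That relationship is inferred from the numbers in the log, not taken from the perf tool's source. The sketch below recomputes it with `awk`; `duration_to_seconds` and `throughput` are hypothetical helper names, and the parser only handles the `h`, `m` and `s` units that appear in the test duration (not `ms`).

```bash
#!/bin/bash

# Convert a Go-style duration such as 4h58m57.296804144s into seconds.
# Only the h, m and s units are handled (assumption: no ms/us/ns parts).
duration_to_seconds() {
  awk -v d="$1" 'BEGIN {
    total = 0
    while (match(d, /^[0-9.]+[hms]/)) {
      unit = substr(d, RLENGTH, 1)          # trailing unit letter
      val = substr(d, 1, RLENGTH - 1) + 0   # numeric part
      if (unit == "h") total += val * 3600
      else if (unit == "m") total += val * 60
      else total += val
      d = substr(d, RLENGTH + 1)            # drop the consumed token
    }
    printf "%.6f\n", total
  }'
}

# throughput ACTIONS DURATION: actions per second, to two decimal places.
throughput() {
  awk -v actions="$1" -v secs="$(duration_to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", actions / secs }'
}
```

Running `throughput 1097105 4h58m57.296804144s` prints `61.16`, which agrees with the reported throughput of 61.163341 to two decimal places.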

Grafana results:

[Images: Pasted Graphic 19, Pasted Graphic 20, image]

(Note I modified the Grafana dashboard to add the transfer submitted to the broadcast submitted ahead of https://github.com/hyperledger/firefly/pull/1490 getting merged)

I find the heatmap not particularly useful, so this is a view with histograms showing how long confirmation takes on average:

[Image: histogram view]

Compared to the testing from 1.2, the numbers from this run look slightly better.

I did notice that the TPS and the time to confirm grow over time.

EnriqueL8 commented 7 months ago

Run Report RC2

- Started: 24/05/24
- Duration: ~27 hours
- Git commit: https://github.com/hyperledger/firefly/commit/b2f86880a109d17751c3481ea1a72c9a2e94dd28

Node Configuration

- 2 FireFly nodes on one virtual server (EC2 m4.xlarge)
- Entire FireFly stack is local to the server (i.e. both blockchains, Postgres databases, etc.)
- Single geth node with 2 instances of ethconnect
- Maximum time to confirm before considering failure = 1 minute

Reference Performance test options

core-config.yml

```
log:
  level: debug
broadcast:
  batch:
    size: 200
    timeout: 1s
privatemessaging:
  batch:
    size: 200
    timeout: 1s
message:
  writer:
    count: 5
download:
  worker:
    count: 100
publicstorage:
  ipfs:
    api:
      requestTimeout: 2s
    gateway:
      requestTimeout: 2s
```

ethconnect.yml

```
rest:
  rest-gateway:
    maxTXWaitTime: 120
    maxInFlight: 200
    alwaysManageNonce: true
    attemptGapFill: true
    sendConcurrency: 3
    gasEstimationFactor: 2.0
debug:
  port: 6000
```

instances.yml

```
stackJSONPath: /home/ubuntu/.firefly/stacks/rc2-latest/stack.json

wsConfig:
  wsPath: /ws
  readBufferSize: 16000
  writeBufferSize: 16000
  initialDelay: 250ms
  maximumDelay: 30s
  initialConnectAttempts: 5
  heartbeatInterval: 5s

instances:
  - name: long-run
    tests: [{"name": "msg_broadcast", "workers":50},{"name": "msg_private", "workers":50},{"name": "blob_broadcast", "workers":30},{"name": "blob_private", "workers":30},{"name": "custom_ethereum_contract", "workers":20},{"name": "token_mint", "workers":10}]
    length: 500h
    sender: 0
    recipient: 1
    messageOptions:
      longMessage: false
    tokenOptions:
      tokenType: fungible
    contractOptions: {"address": "0x4c05f4e749304da29017e4eb3e5e2a4aaa84e637"}
    subscriptionOptions:
      batch: true
      batchTimeout: 250ms
      readAhead: 50
```

FireFly git commit:

```
b2f86880a109d17751c3481ea1a72c9a2e94dd28
```

NOTE: confirmations set to 0

Results

- Broadcast messages: 1,000,039
- Private messages: 1,424,155
- Token mints: 96,353
- Transactions: 540K
- No errors

Summary result:

INFO[2024-04-25T13:57:39.019] Shutdown summary:
INFO[2024-04-25T13:57:39.020]  - Prometheus metric sent_mints_total        = 96373.000000
INFO[2024-04-25T13:57:39.020]  - Prometheus metric sent_mint_errors_total  = 0.000000
INFO[2024-04-25T13:57:39.020]  - Prometheus metric mint_token_balance      = 0.000000
INFO[2024-04-25T13:57:39.020]  - Prometheus metric received_events_total   = 6008344.000000
INFO[2024-04-25T13:57:39.020]  - Prometheus metric incomplete_events_total = 0.000000
INFO[2024-04-25T13:57:39.020]  - Prometheus metric delinquent_msgs_total    = 0.000000
INFO[2024-04-25T13:57:39.020]  - Prometheus metric actions_submitted_total = 2907814.000000
INFO[2024-04-25T13:57:39.020]  - Test duration: 27h32m18.966742004s
INFO[2024-04-25T13:57:39.020]  - Measured actions: 6008025
INFO[2024-04-25T13:57:39.020]  - Measured send TPS: 60.604479
INFO[2024-04-25T13:57:39.020]  - Measured throughput: 60.602054
INFO[2024-04-25T13:57:39.020]  - Measured send duration: min: 8.687266ms, max: 6.013594408s, avg: 174ms
INFO[2024-04-25T13:57:39.021]  - Measured event receiving duration: min: 2.007866751s, max: 54.132299131s, avg: 6.473s
INFO[2024-04-25T13:57:39.021]  - Measured total duration: min: 2.007866751s, max: 54.132299131s, avg: 6.473s

Grafana results:

[Images: three Grafana screenshots]

I find the heatmap not particularly useful, so this is a view with histograms showing how long confirmation takes on average:

[Images: two histogram views]

nguyer commented 7 months ago

This is great! Thank you so much for all the work on this, @EnriqueL8