Open gsstoykov opened 1 month ago
I also tried the same flow with the C++ SDK's NodeCreateTransaction, followed by npm run solo -- node add-execute --input-dir context. The node pod appears to be created correctly, and the setup and start steps pass as well, but the run ends with the following log:
npm run solo -- node add-execute --input-dir context
> @hashgraph/solo@0.31.0 solo
> NODE_OPTIONS=--experimental-vm-modules node --no-deprecation solo.mjs node add-execute --input-dir context
******************************* Solo *********************************************
Version : 0.31.0
Kubernetes Context : kind-solo-e2e
Kubernetes Cluster : kind-solo-e2e
Kubernetes Namespace : solo-e2e
**********************************************************************************
✔ Initialize [0.1s]
✔ Identify existing network nodes
✔ Check network pod: node1
✔ Load context data
✔ Download generated files from an existing node [0.4s]
✔ Prepare staging directory
✔ Copy Gossip keys to staging
✔ Copy gRPC TLS keys to staging
✔ Copy node keys to secrets
✔ Copy TLS keys
✔ Node: node1
✔ Copy Gossip keys
✔ Node: node2
✔ Copy Gossip keys
✔ Check network nodes are frozen [6s]
✔ Check network pod: node1 - status FREEZE_COMPLETE, attempt: 0/120 [6s]
✔ Get node logs and configs [8s]
✔ Deploy new network node [2s]
✔ Kill nodes to pick up updated configMaps
✔ Check node pods are running [30s]
✔ Check Node: node1
✔ Check Node: node2 [30s]
✔ Fetch platform software into all network nodes [5s]
✔ Update node: node1 [ platformVersion = v0.54.0-alpha.4 ] [5s]
✔ Update node: node2 [ platformVersion = v0.54.0-alpha.4 ] [5s]
✔ Download last state from an existing node [0.4s]
✔ Upload last saved state to new network node [0.4s]
✔ Setup new network node [0.1s]
✔ Node: node1 [0.1s]
✔ Set file permissions [0.1s]
✔ Node: node2
✔ Set file permissions
✔ Start network nodes [0.1s]
✔ Start node: node1
✔ Start node: node2
↓ Enable port forwarding for JVM debugger
❯ Check all nodes are ACTIVE
✔ Check network pod: node1 - status ACTIVE, attempt: 16/120 [24s]
✖ node 'node2' is not ACTIVE[ attempt = 120/120 ]
◼ Check all node proxies are ACTIVE
◼ Stake new node
◼ Trigger stake weight calculate
◼ Finalize
*********************************** ERROR *****************************************
Error in setting up nodes: node 'node2' is not ACTIVE[ attempt = 120/120 ]
***********************************************************************************
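Since the failure is intermittent, it may help to extract the failing node and its attempt count from saved CLI output when comparing runs. The following is a minimal Python sketch, assuming the error line keeps the exact shape shown in the log above. The helper name find_inactive_nodes is hypothetical and is not part of Solo.

```python
import re

# Matches lines such as: ✖ node 'node2' is not ACTIVE[ attempt = 120/120 ]
# The pattern is inferred from the log above (Solo 0.31.0) and is an assumption
# for any other version.
INACTIVE_RE = re.compile(r"node '([^']+)' is not ACTIVE\[ attempt = (\d+)/(\d+) \]")


def find_inactive_nodes(cli_output: str) -> dict[str, tuple[int, int]]:
    """Return {node_name: (attempt, max_attempts)} for nodes reported as not ACTIVE."""
    result = {}
    for match in INACTIVE_RE.finditer(cli_output):
        node, attempt, max_attempts = match.groups()
        # The same node can appear twice (task line and error banner); keep one entry.
        result[node] = (int(attempt), int(max_attempts))
    return result
```

Running this over the captured output of several attempts would show whether it is always the newly added node that fails to become ACTIVE.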
We discovered that there is currently an issue in platform/services with NodeCreateTransaction. After the node has been added and one of the nodes goes into teach mode for the newly added node, the teacher gets JVM out-of-memory errors after it finishes teaching and reconnects to the network. I'm not sure of the exact amount, but Nathan quoted 22GB of memory (I'm not sure what this 22GB refers to). I think you might be able to work around this by setting the JVM memory settings very high, but we haven't configured Solo to do that by default.
We have disabled our E2E tests involving solo node add until this is resolved in a patch. I'm reaching out to find an issue that we can use to track this.
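To check whether a given failed run matches the out-of-memory behaviour described above, the teacher node's log can be scanned for the JVM's standard java.lang.OutOfMemoryError message. This is a minimal Python sketch, assuming the node's log has already been copied to a local file. The thread does not say which log file carries the error, so the path is left to the caller, and the helper name find_oom_lines is hypothetical.

```python
def find_oom_lines(log_text: str) -> list[str]:
    """Return the log lines that mention a JVM OutOfMemoryError."""
    # "java.lang.OutOfMemoryError" is the exception class name the JVM prints;
    # where it shows up in a Solo node's logs is an assumption, not confirmed here.
    return [line for line in log_text.splitlines() if "java.lang.OutOfMemoryError" in line]
```

An empty result for the teacher node would suggest the run failed for a different reason, such as the error from issue 727.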
To Reproduce
Initialisation steps from https://github.com/hashgraph/solo/issues/727 and:
Describe the bug
I've also seen failures with the error from https://github.com/hashgraph/solo/issues/727.
Describe the expected behavior
The node is added and functioning. The failure does not happen every time, but the behaviour is still too inconsistent for testing on our side.
Whole JUnit/CLI Logs
Additional Context
No response