threefoldtech / tfgrid-sdk-go

Apache License 2.0

🐞 [Bug]: failed to deploy network on node 14 #1257

Open samaradel opened 2 weeks ago

samaradel commented 2 weeks ago

What happened?

While following TC1492, I couldn't deploy a VM.

Which network(s) did you face the problem on?

Dev

Relevant log output

FTL error="failed to deploy network on node 14: could not reach node 14: context deadline exceeded"

Eslam-Nawara commented 1 week ago

I tried deploying a VM on node 14 on devnet and was able to deploy it and retrieve it normally:

10:56AM INF starting peer session=tf-12380 twin=4653
10:56AM INF deploying network
10:56AM INF deploying vm
10:56AM INF vm planetary ip: 300:e9c4:9048:57cf:cb9a:bb4d:24eb:a0f5
10:56AM INF vm mycelium ip: 4ca:44dc:369f:1549:ff0f:ca5e:1fde:7885
➜  grid-cli git:(development) go run main.go get vm test
10:57AM INF starting peer session=tf-12544 twin=4653
10:57AM INF vm:
{
        "Name": "test",
        "NodeID": 14,
        "SolutionType": "vm/test",
        "SolutionProvider": null,
        "NetworkName": "testnetwork",
        "Disks": [],
        "Zdbs": [],
        "Vms": [
                {
                        "name": "test",
                        "node": 14,
                        .
                        .
                        .
                        .
                        "planetary_ip": "300:e9c4:9048:57cf:cb9a:bb4d:24eb:a0f5",
                        "mycelium_ip": "4ca:44dc:369f:1549:ff0f:ca5e:1fde:7885",
                        "console_url": "10.20.2.1:20002"
                }
        ],
        "VmsLight": [],
        "QSFS": [],
        "Volumes": [],
        "NodeDeploymentID": {
                "14": 171062
        },
        "ContractID": 171062,
        "IPrange": "10.20.2.0/24"
}
➜  grid-cli git:(development)

Eslam-Nawara commented 1 week ago

After the fix, I was able to deploy a VM on node 14 without needing to add ygg (note the empty planetary_ip in the output):

➜  grid-cli git:(development) ✗ go run main.go deploy vm --name test2 --ssh ~/.ssh/id_rsa.pub --node 14
1:00PM INF starting peer session=tf-29157 twin=4653
1:00PM INF deploying network
1:00PM INF deploying vm
1:00PM INF vm mycelium ip: 447:b236:8c3:c2f5:ff0f:2ad3:ddcd:22f8
➜  grid-cli git:(development) ✗ go run main.go get vm test2
1:01PM INF starting peer session=tf-29338 twin=4653
1:01PM INF vm:
{
        "Name": "test2",
        "NodeID": 14,
        "SolutionType": "vm/test2",
        "SolutionProvider": null,
        "NetworkName": "test2network",
        "Disks": [],
        "Zdbs": [],
        "Vms": [
                {
                        "name": "test2",
                        "node": 14,
                        .
                        .
                        .
                        "planetary_ip": "",
                        "mycelium_ip": "447:b236:8c3:c2f5:ff0f:2ad3:ddcd:22f8",
                        "console_url": "10.20.2.1:20002"
                }
        ],
        "VmsLight": [],
        "QSFS": [],
        "Volumes": [],
        "NodeDeploymentID": {
                "14": 171091
        },
        "ContractID": 171091,
        "IPrange": "10.20.2.0/24"
}
➜  grid-cli git:(development) ✗

samaradel commented 1 week ago

Why doesn't it work with tfcmd?

[image]

Eslam-Nawara commented 1 week ago

@samaradel what version are you using?

samaradel commented 1 week ago

v0.16.0

Eslam-Nawara commented 1 week ago

I was able to reproduce the issue: I tried to deploy a VM with --cpu 2 --memory 4 --disk 10 and got the same error, with the connection timing out. [image]

I then tried to deploy the same VM with the same resources on a specific node, 171, and it deployed successfully, so I suspect that node 259 specifically has a problem. [image]

Eslam-Nawara commented 1 week ago

The node is now responsive, and I was able to deploy a VM on it with no problems. [image]

samaradel commented 1 week ago

Tested, and it works. Thanks :)