Open petterroea opened 6 months ago
It's probably due to using the wrong IP for the node. Could you let TF wait until the bootstrap times out, or set a small timeout and gather the error? Without an error there's not much info to proceed.
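A minimal sketch of what "set a small timeout" could look like, assuming the `talos_machine_bootstrap` resource accepts a `timeouts` attribute with a `create` duration (check the provider docs for the version in use; the `2m` value is only an illustration):

```hcl
resource "talos_machine_bootstrap" "bootstrap" {
  # ... existing arguments (client_configuration, node, endpoint) stay unchanged ...

  # Assumption: `timeouts.create` is supported by this provider version.
  # A short value makes a failing bootstrap report its error after about
  # two minutes instead of the default ten.
  timeouts = {
    create = "2m"
  }
}
```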
Hello, thank you for your prompt response.
I let TF wait for the bootstrap to time out, here is the result:
talos_machine_bootstrap.bootstrap: Still creating... [9m40s elapsed]
2024-04-01T09:00:02.198Z [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/siderolabs/talos\"] (close)"
2024-04-01T09:00:02.199Z [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/siderolabs/talos\"] (close)" is waiting for "talos_machine_bootstrap.bootstrap"
2024-04-01T09:00:02.765Z [DEBUG] provider.terraform-provider-talos_v0.2.0: 2024/04/01 09:00:02 [TRACE] Waiting 10s before next try
2024-04-01T09:00:07.200Z [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/siderolabs/talos\"] (close)" is waiting for "talos_machine_bootstrap.bootstrap"
2024-04-01T09:00:07.200Z [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/siderolabs/talos\"] (close)"
talos_machine_bootstrap.bootstrap: Still creating... [9m50s elapsed]
2024-04-01T09:00:12.201Z [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/siderolabs/talos\"] (close)"
2024-04-01T09:00:12.201Z [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/siderolabs/talos\"] (close)" is waiting for "talos_machine_bootstrap.bootstrap"
2024-04-01T09:00:12.774Z [DEBUG] provider.terraform-provider-talos_v0.2.0: 2024/04/01 09:00:12 [TRACE] Waiting 10s before next try
2024-04-01T09:00:17.206Z [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/siderolabs/talos\"] (close)" is waiting for "talos_machine_bootstrap.bootstrap"
2024-04-01T09:00:17.206Z [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/siderolabs/talos\"] (close)"
2024-04-01T09:00:21.294Z [DEBUG] provider.terraform-provider-talos_v0.2.0: 2024/04/01 09:00:21 [WARN] WaitForState timeout after 10m0s
2024-04-01T09:00:21.294Z [DEBUG] provider.terraform-provider-talos_v0.2.0: 2024/04/01 09:00:21 [WARN] WaitForState starting 30s refresh grace period
2024-04-01T09:00:21.294Z [DEBUG] provider.terraform-provider-talos_v0.2.0: 2024/04/01 09:00:21 [ERROR] Context cancelation detected, abandoning grace period
2024-04-01T09:00:21.294Z [DEBUG] provider.terraform-provider-talos_v0.2.0: Called provider defined Resource Create: tf_provider_addr=registry.terraform.io/siderolabs/talos tf_req_id=a8d8ce18-ae04-db56-bf56-8fffe59fc345 tf_resource_type=talos_machine_bootstrap tf_rpc=ApplyResourceChange @module=sdk.framework @caller=github.com/hashicorp/terraform-plugin-framework@v1.2.0/internal/fwserver/server_createresource.go:98 timestamp=2024-04-01T09:00:21.294Z
2024-04-01T09:00:21.294Z [TRACE] provider.terraform-provider-talos_v0.2.0: Received downstream response: tf_proto_version=6.3 diagnostic_error_count=1 tf_req_id=a8d8ce18-ae04-db56-bf56-8fffe59fc345 tf_req_duration_ms=600008 @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov6/internal/tf6serverlogging/downstream_request.go:37 diagnostic_warning_count=0 tf_provider_addr=registry.terraform.io/siderolabs/talos @module=sdk.proto tf_resource_type=talos_machine_bootstrap tf_rpc=ApplyResourceChange timestamp=2024-04-01T09:00:21.294Z
2024-04-01T09:00:21.294Z [ERROR] provider.terraform-provider-talos_v0.2.0: Response contains error diagnostic: diagnostic_severity=ERROR tf_provider_addr=registry.terraform.io/siderolabs/talos @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov6/internal/diag/diagnostics.go:55 @module=sdk.proto diagnostic_summary="Error bootstrapping node" diagnostic_detail="rpc error: code = Canceled desc = grpc: the client connection is closing" tf_proto_version=6.3 tf_req_id=a8d8ce18-ae04-db56-bf56-8fffe59fc345 tf_rpc=ApplyResourceChange tf_resource_type=talos_machine_bootstrap timestamp=2024-04-01T09:00:21.294Z
2024-04-01T09:00:21.295Z [TRACE] provider.terraform-provider-talos_v0.2.0: Served request: @module=sdk.proto tf_resource_type=talos_machine_bootstrap tf_proto_version=6.3 tf_provider_addr=registry.terraform.io/siderolabs/talos tf_req_id=a8d8ce18-ae04-db56-bf56-8fffe59fc345 tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov6/tf6server/server.go:829 timestamp=2024-04-01T09:00:21.294Z
2024-04-01T09:00:21.296Z [TRACE] maybeTainted: talos_machine_bootstrap.bootstrap encountered an error during creation, so it is now marked as tainted
2024-04-01T09:00:21.296Z [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/siderolabs/talos" is in the global cache
2024-04-01T09:00:21.296Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for talos_machine_bootstrap.bootstrap
2024-04-01T09:00:21.296Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for talos_machine_bootstrap.bootstrap
2024-04-01T09:00:21.296Z [TRACE] evalApplyProvisioners: talos_machine_bootstrap.bootstrap is tainted, so skipping provisioning
2024-04-01T09:00:21.296Z [TRACE] maybeTainted: talos_machine_bootstrap.bootstrap was already tainted, so nothing to do
2024-04-01T09:00:21.296Z [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/siderolabs/talos" is in the global cache
2024-04-01T09:00:21.296Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for talos_machine_bootstrap.bootstrap
2024-04-01T09:00:21.296Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for talos_machine_bootstrap.bootstrap
2024-04-01T09:00:21.297Z [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2024-04-01T09:00:21.302Z [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 363
2024-04-01T09:00:21.302Z [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2024-04-01T09:00:21.335Z [ERROR] vertex "talos_machine_bootstrap.bootstrap" error: Error bootstrapping node
2024-04-01T09:00:21.335Z [TRACE] vertex "talos_machine_bootstrap.bootstrap": visit complete, with errors
2024-04-01T09:00:21.335Z [TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/siderolabs/talos\"] (close)" errored, so skipping
2024-04-01T09:00:21.335Z [TRACE] dag/walk: upstream of "root" errored, so skipping
2024-04-01T09:00:21.335Z [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
2024-04-01T09:00:21.341Z [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 364
2024-04-01T09:00:21.341Z [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
It is worth noting that if I bootstrap manually, the node in question changes state from booting to running immediately, and a lot of console output appears right away. This never happens when bootstrapping through Terraform.
endpoint_ip should be correct. I am able to ping it from the same box I run terraform on, and this is what the talos monitor shows me:
Not sure why the logs say this:
evalApplyProvisioners: talos_machine_bootstrap.bootstrap is tainted, so skipping provisioning
Could you disable debug logs and just share the normal output?
Maybe also try setting the node attribute to the same value as endpoint_ip.
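As a rough illustration of that suggestion, the bootstrap resource could take both values from one place so they cannot drift apart. The local name `bootstrap_ip` and the address are placeholders, not taken from the reporter's configuration; the resource and secrets names follow the ones visible in the logs:

```hcl
locals {
  # Placeholder: IP of the first control-plane node.
  bootstrap_ip = "10.0.5.60"
}

resource "talos_machine_bootstrap" "bootstrap" {
  client_configuration = talos_machine_secrets.secrets.client_configuration

  # `node` is the machine the bootstrap request targets; `endpoint` is the
  # address the Talos API client connects to. Here both point at the same host.
  node     = local.bootstrap_ip
  endpoint = local.bootstrap_ip
}
```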
@frezbo Just setting node to 10.0.5.60, right? Will run with normal output.
yes
Running.
Thank you for waiting:
talos_machine_secrets.secrets: Creating...
talos_machine_secrets.secrets: Creation complete after 0s [id=machine_secrets]
data.talos_client_configuration.cc: Reading...
data.talos_client_configuration.cc: Read complete after 0s [id=homecluster]
data.talos_machine_configuration.machine_configurations["1"]: Reading...
data.talos_machine_configuration.machine_configurations["0"]: Reading...
data.talos_machine_configuration.machine_configurations["2"]: Reading...
data.talos_machine_configuration.machine_configurations["1"]: Read complete after 0s [id=homecluster]
data.talos_machine_configuration.machine_configurations["0"]: Read complete after 0s [id=homecluster]
data.talos_machine_configuration.machine_configurations["2"]: Read complete after 0s [id=homecluster]
proxmox_vm_qemu.talos-control["2"]: Creating...
proxmox_vm_qemu.talos-control["1"]: Creating...
proxmox_vm_qemu.talos-control["0"]: Creating...
proxmox_vm_qemu.talos-control["2"]: Still creating... [10s elapsed]
proxmox_vm_qemu.talos-control["1"]: Still creating... [10s elapsed]
proxmox_vm_qemu.talos-control["0"]: Still creating... [10s elapsed]
proxmox_vm_qemu.talos-control["1"]: Creation complete after 11s [id=sanae/qemu/104]
proxmox_vm_qemu.talos-control["2"]: Creation complete after 11s [id=yumi/qemu/102]
proxmox_vm_qemu.talos-control["0"]: Creation complete after 11s [id=yumi/qemu/103]
talos_machine_configuration_apply.control_planes["1"]: Creating...
talos_machine_configuration_apply.control_planes["0"]: Creating...
talos_machine_configuration_apply.control_planes["2"]: Creating...
talos_machine_configuration_apply.control_planes["2"]: Still creating... [10s elapsed]
talos_machine_configuration_apply.control_planes["1"]: Still creating... [10s elapsed]
talos_machine_configuration_apply.control_planes["0"]: Still creating... [10s elapsed]
talos_machine_configuration_apply.control_planes["0"]: Still creating... [20s elapsed]
talos_machine_configuration_apply.control_planes["1"]: Still creating... [20s elapsed]
talos_machine_configuration_apply.control_planes["2"]: Still creating... [20s elapsed]
talos_machine_configuration_apply.control_planes["1"]: Still creating... [30s elapsed]
talos_machine_configuration_apply.control_planes["2"]: Still creating... [30s elapsed]
talos_machine_configuration_apply.control_planes["0"]: Still creating... [30s elapsed]
talos_machine_configuration_apply.control_planes["1"]: Creation complete after 32s [id=machine_configuration_apply]
talos_machine_configuration_apply.control_planes["0"]: Creation complete after 32s [id=machine_configuration_apply]
talos_machine_configuration_apply.control_planes["2"]: Creation complete after 32s [id=machine_configuration_apply]
talos_machine_bootstrap.bootstrap: Creating...
talos_machine_bootstrap.bootstrap: Still creating... [10s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [20s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [1m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [1m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [1m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [1m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [1m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [1m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [2m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [2m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [2m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [2m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [2m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [2m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [3m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [3m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [3m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [3m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [3m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [3m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [4m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [4m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [4m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [4m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [4m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [4m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [5m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [5m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [5m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [5m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [5m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [5m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [6m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [6m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [6m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [6m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [6m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [6m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [7m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [7m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [7m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [7m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [7m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [7m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [8m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [8m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [8m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [8m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [8m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [8m51s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [9m1s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [9m11s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [9m21s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [9m31s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [9m41s elapsed]
talos_machine_bootstrap.bootstrap: Still creating... [9m51s elapsed]
╷
│ Warning: Qemu Guest Agent support is disabled from proxmox config.
│
│ with proxmox_vm_qemu.talos-control["1"],
│ on control-plane.tf line 2, in resource "proxmox_vm_qemu" "talos-control":
│ 2: resource "proxmox_vm_qemu" "talos-control" {
│
│ Qemu Guest Agent support is required to make communications with the VM
│
│ (and 2 more similar warnings elsewhere)
╵
╷
│ Error: Error bootstrapping node
│
│ with talos_machine_bootstrap.bootstrap,
│ on talos_machine_config.tf line 45, in resource "talos_machine_bootstrap" "bootstrap":
│ 45: resource "talos_machine_bootstrap" "bootstrap" {
│
│ rpc error: code = Canceled desc = grpc: the client connection is closing
╵
Here is the relevant plan diff:
  # talos_machine_configuration_apply.control_planes["0"] will be created
  + resource "talos_machine_configuration_apply" "control_planes" {
      + apply_mode                  = "auto"
      + client_configuration        = (known after apply)
      + endpoint                    = "10.0.5.60"
      + id                          = (known after apply)
      + machine_configuration       = (sensitive value)
      + machine_configuration_input = (sensitive value)
      + node                        = "10.0.5.60"
    }

  # talos_machine_configuration_apply.control_planes["1"] will be created
  + resource "talos_machine_configuration_apply" "control_planes" {
      + apply_mode                  = "auto"
      + client_configuration        = (known after apply)
      + endpoint                    = "10.0.5.61"
      + id                          = (known after apply)
      + machine_configuration       = (sensitive value)
      + machine_configuration_input = (sensitive value)
      + node                        = "10.0.5.61"
    }

  # talos_machine_configuration_apply.control_planes["2"] will be created
  + resource "talos_machine_configuration_apply" "control_planes" {
      + apply_mode                  = "auto"
      + client_configuration        = (known after apply)
      + endpoint                    = "10.0.5.62"
      + id                          = (known after apply)
      + machine_configuration       = (sensitive value)
      + machine_configuration_input = (sensitive value)
      + node                        = "10.0.5.62"
    }

  # talos_machine_secrets.secrets will be created
  + resource "talos_machine_secrets" "secrets" {
      + client_configuration = (known after apply)
      + id                   = (known after apply)
      + machine_secrets      = (known after apply)
      + talos_version        = "v1.4"
    }
Have you verified that time is synchronized? I encountered this issue when deploying my cluster in WSL. The WSL date/time was out of sync with Windows or the actual time. See: https://www.talos.dev/v1.7/talos-guides/configuration/time-sync/
@vdupain Time should be synchronized, but I will verify!
Hello
When using this terraform provider, I am unable to bootstrap my cluster. However, if I output the generated talosconfig and use talosctl manually, I am able to bootstrap with no problem. I am running bare-metal on proxmox.
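For anyone reproducing the manual path, exposing the generated talosconfig might look roughly like the sketch below. It assumes the data source is named `cc`, as it appears in the apply output in this thread, and that it exposes a `talos_config` attribute; the output name `talosconfig` is arbitrary:

```hcl
# Sketch: export the generated client configuration so it can be written to a
# file and passed to talosctl for a manual bootstrap.
output "talosconfig" {
  value     = data.talos_client_configuration.cc.talos_config
  sensitive = true # the config embeds client credentials
}
```

The value can then be written to a file with `terraform output -raw talosconfig` and handed to `talosctl` through its `--talosconfig` flag.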
Terraform stuff:
And some variables:
Logs from attempting to bootstrap:
What steps can I take from here to help troubleshoot? This has already cost me a day, so I need to get on with running my cluster by bootstrapping manually, but I'd love to help fix this issue.