terraform-lxd / terraform-provider-lxd

LXD Resource provider for Terraform
https://registry.terraform.io/providers/terraform-lxd/lxd/latest/docs
Mozilla Public License 2.0

LXD remote doesn't accept certificate during token authentication with error tls: bad certificate #503

Open kevindu opened 1 month ago

kevindu commented 1 month ago

Setup: the Terraform LXD provider runs inside the Docker executor of a GitLab runner. The executor uses an Alpine Linux image, and a token is used to authenticate with the LXD remote.

First, setting generate_client_certificates to true to generate the certificate automatically ended with the following error.

Unable to create server client for remote "cambridge-mgmt": Unable to authenticate with remote server: not authorized

Monitoring the LXD remote shows the following errors.

`
location: none
metadata:
  context: {}
  level: info
  message: 'http: TLS handshake error from 10.255.77.9:41274: remote error: tls: bad certificate'
timestamp: "2024-07-23T15:48:31.575405614Z"
type: logging

location: none
metadata:
  context: {}
  level: info
  message: 'http: TLS handshake error from 10.255.77.9:41276: remote error: tls: bad certificate'
timestamp: "2024-07-23T15:48:31.578354958Z"
type: logging
`

Checked the generated certificate manually and found the subject alternative name (SAN) missing, as below.

-----BEGIN CERTIFICATE-----
MIIB/jCCAYSgAwIBAgIQFYCVcQlvIKBs/oFfbV7vizAKBggqhkjOPQQDAzBHMQww
CgYDVQQKEwNMWEQxNzA1BgNVBAMMLnJvb3RAcnVubmVyLXhjZWMzenNweS1wcm9q
ZWN0LTQ5My1jb25jdXJyZW50LTAwHhcNMjQwNzIyMjMzODEwWhcNMzQwNzIwMjMz
ODEwWjBHMQwwCgYDVQQKEwNMWEQxNzA1BgNVBAMMLnJvb3RAcnVubmVyLXhjZWMz
enNweS1wcm9qZWN0LTQ5My1jb25jdXJyZW50LTAwdjAQBgcqhkjOPQIBBgUrgQQA
IgNiAAStPWmUG5YDd1oZe2ug2qIecFHFLruxw1RB+9o6TTAr7mFSbTcTFcrHNbld
9P7OVfW13zkfb+w9F5ORUuvJX8LwZZIk38RQVpB1BLNY4wmknRmC/rFqsD50ag66
b9ZBWlijNTAzMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDAjAM
BgNVHRMBAf8EAjAAMAoGCCqGSM49BAMDA2gAMGUCMQDNWMsZeiyb/MFrnAN6uZIb
GTocoKq0IgG+3yfkjEbqSMDXFGiE5K4iY2OL0Pt5mxACMAZjeYqlCoEe+6J9FIZc
NuSDSes4Y9PUJoKHYu0ErOgIUUEh1y/TlTvW2rkNKc0tYg==
-----END CERTIFICATE-----
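For anyone wanting to repeat the manual check, the certificate above can be parsed with Go's standard library to list the extensions it carries. This is a minimal sketch, not the provider's or the LXD client's own code: the names `summarize`, `certSummary`, and `issueCertPEM` are hypothetical and introduced only for illustration. Keep in mind that SAN entries are normally what a client checks on a server certificate, so a missing SAN on a client certificate may not be the cause of the failure.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// issueCertPEM is the client certificate pasted in this report.
const issueCertPEM = `-----BEGIN CERTIFICATE-----
MIIB/jCCAYSgAwIBAgIQFYCVcQlvIKBs/oFfbV7vizAKBggqhkjOPQQDAzBHMQww
CgYDVQQKEwNMWEQxNzA1BgNVBAMMLnJvb3RAcnVubmVyLXhjZWMzenNweS1wcm9q
ZWN0LTQ5My1jb25jdXJyZW50LTAwHhcNMjQwNzIyMjMzODEwWhcNMzQwNzIwMjMz
ODEwWjBHMQwwCgYDVQQKEwNMWEQxNzA1BgNVBAMMLnJvb3RAcnVubmVyLXhjZWMz
enNweS1wcm9qZWN0LTQ5My1jb25jdXJyZW50LTAwdjAQBgcqhkjOPQIBBgUrgQQA
IgNiAAStPWmUG5YDd1oZe2ug2qIecFHFLruxw1RB+9o6TTAr7mFSbTcTFcrHNbld
9P7OVfW13zkfb+w9F5ORUuvJX8LwZZIk38RQVpB1BLNY4wmknRmC/rFqsD50ag66
b9ZBWlijNTAzMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDAjAM
BgNVHRMBAf8EAjAAMAoGCCqGSM49BAMDA2gAMGUCMQDNWMsZeiyb/MFrnAN6uZIb
GTocoKq0IgG+3yfkjEbqSMDXFGiE5K4iY2OL0Pt5mxACMAZjeYqlCoEe+6J9FIZc
NuSDSes4Y9PUJoKHYu0ErOgIUUEh1y/TlTvW2rkNKc0tYg==
-----END CERTIFICATE-----
`

// certSummary holds the fields relevant to this report (hypothetical type).
type certSummary struct {
	Subject    string
	SANCount   int  // DNS names + IP addresses + email addresses + URIs
	ClientAuth bool // true if the clientAuth extended key usage is present
	IsCA       bool
}

// summarize decodes the first PEM block and extracts SAN and key usage details.
func summarize(pemBytes []byte) (certSummary, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return certSummary{}, errors.New("no CERTIFICATE PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return certSummary{}, err
	}
	s := certSummary{
		Subject: cert.Subject.String(),
		SANCount: len(cert.DNSNames) + len(cert.IPAddresses) +
			len(cert.EmailAddresses) + len(cert.URIs),
		IsCA: cert.IsCA,
	}
	for _, usage := range cert.ExtKeyUsage {
		if usage == x509.ExtKeyUsageClientAuth {
			s.ClientAuth = true
		}
	}
	return s, nil
}

func main() {
	s, err := summarize([]byte(issueCertPEM))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("subject=%q san_entries=%d client_auth=%t is_ca=%t\n",
		s.Subject, s.SANCount, s.ClientAuth, s.IsCA)
}
```

Running the same function against a manually generated client.crt would show whether the two certificates differ in any of these fields.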

I suspected the missing SAN was the issue, although I cannot see why it is missing, so I turned off the generate_client_certificates flag and generated the private key and certificate manually, placing them as client.crt and client.key under the ~/.config/lxc directory. However, this still failed with the same error.

As a workaround, I have given up using the token to authenticate. Instead, adding the same certificate directly to the LXD remote via the CLI works properly.

Once the certificate is added as above, the LXD provider inside the GitLab runner's Docker executor works properly. So the issue seems to occur only when adding the certificate to the LXD remote via token. Could anyone suggest how to further understand the real reason behind "bad certificate"? At the moment, neither the LXD provider nor the LXD remote server provides any useful debug information.

I guess that when you add the certificate directly on the server via the CLI, it is not validated in the same way as when it is added via token authentication, because I found that any certificate rejected as bad during token authentication could still be added to the LXD server via the CLI successfully.
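One possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: in Go's crypto/tls, an error prefixed with "remote error" reports an alert received from the peer, so the server log lines above suggest that the connecting client sent the bad certificate alert. It may therefore help to look at the certificate the server presents on 10.255.77.11:8443. The program below completes a handshake without verifying the server certificate and prints its SHA-256 fingerprint. The function names `probeServerCert` and `fingerprintSHA256` are hypothetical, and the sketch assumes the server completes a handshake when no client certificate is offered.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"net"
	"os"
	"time"
)

// fingerprintSHA256 returns the hex-encoded SHA-256 digest of DER bytes.
func fingerprintSHA256(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// probeServerCert connects to addr, skips server certificate verification
// (diagnostic use only), and returns the fingerprint of the leaf certificate.
func probeServerCert(addr string, timeout time.Duration) (string, error) {
	dialer := &net.Dialer{Timeout: timeout}
	conf := &tls.Config{InsecureSkipVerify: true} // never use for real traffic
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, conf)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", fmt.Errorf("no certificate presented by %s", addr)
	}
	return fingerprintSHA256(certs[0].Raw), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <host:port>, for example 10.255.77.11:8443")
		return
	}
	fp, err := probeServerCert(os.Args[1], 5*time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, "probe failed:", err)
		os.Exit(1)
	}
	fmt.Println("server certificate SHA-256:", fp)
}
```

Running this from inside the Alpine executor and from a machine where token authentication works would show whether both see the same server certificate.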

Thanks.

MusicDin commented 1 month ago

Hi, could you share the provider's terraform configuration (provider "lxd" {...})?

kevindu commented 1 month ago

> Hi, could you share the provider's terraform configuration (provider "lxd" {...})?

`
provider "lxd" {
  generate_client_certificates = false
  accept_remote_certificate    = true

  remote {
    name    = "cambridge-mgmt"
    address = "10.255.77.11"
    port    = "8443"
    default = true
    token   = var.LXD_TOKEN
  }
}
`

MusicDin commented 1 month ago

Hmm, client certificates are generated by the LXD client, not the Terraform provider. Which LXD version is being used?

kevindu commented 1 month ago

> Hmm, client certificates are generated by the LXD client, not the Terraform provider. Which LXD version is being used?

Do you mean the certificate should be generated by the LXD client instead of the Terraform LXD provider? Or were you just commenting on the Terraform provider code above?

Initially I kept generate_client_certificates = true without installing any LXD client and found that a certificate was generated, so I assume the Terraform LXD provider can generate a certificate without requiring any LXD client.

Later, I installed an LXD client (apk add lxd lxd-client lxcfs dbus) in the Alpine Docker image, but the generated certificate was the same.

Finally, I set generate_client_certificates = false, generated the certificate manually via openssl, and placed it under ~/.config/lxc, but Terraform still returned the same bad certificate error.

MusicDin commented 1 month ago

> Do you mean the certificate should be generated by the LXD client instead of the Terraform LXD provider? Or were you just commenting on the Terraform provider code above?

I just meant that the Terraform provider relies on the (Golang) LXD client for generating client certificates. Therefore, this could be an issue with the LXD client.

I've tried reproducing this issue but without success (although on Ubuntu, not Alpine; I will try that as well).

Which LXD version is being used? I suppose LXD is not used within Snap, since ~/.config/lxc is being used - or is configDir overwritten?

kevindu commented 1 week ago

Sorry for the late response.

> I just meant that the Terraform provider relies on the (Golang) LXD client for generating client certificates. Therefore, this could be an issue with the LXD client.

In the Alpine image I don't believe any LXD client exists, but a certificate was still generated, which is weird.

> Which LXD version is being used? I suppose LXD is not used within Snap, since ~/.config/lxc is being used - or is configDir overwritten?

I tried both setups, without and with LXD installed (manual install: apk add lxd lxd-client lxcfs dbus; sorry, I don't have a record of the LXD version), but the issue remained the same.

We have worked around this by inserting the certificate directly instead of using token authentication, so it's no longer a blocker for us. However, something isn't quite right in non-Ubuntu/non-Snap environments.

We did consider using an Ubuntu Docker image instead, but some other dependencies have been built into the Alpine image, so it's not an easy move.

Thanks for your support.