ExpatUK opened this issue 1 week ago
So it seems this does function, but the zvol/disk device can only be defined under lxd_instance, not lxd_profile.
Hi, could you share the configuration of the entire lxd_profile / lxd_instance?
Sure.
lxd_profile:
resource "lxd_profile" "int-dev-zvol" {
remote = var.lxd_remote
name = "int-dev-zvol"
config = {
"user.user-data" = <<-EOF
#cloud-config
users:
- name: ${var.lxd_user}
no-log-init: true
ssh-authorized_keys:
- ${file(var.ssh_pub_key)}
ssh_pwauth: True
EOF
"boot.autostart" = true
"security.privileged" = true
"security.nesting" = true
"security.syscalls.intercept.mknod" = true
"security.syscalls.intercept.setxattr" = true
"security.syscalls.intercept.sysinfo" = true
"linux.kernel_modules" = var.lxd_kmods
"limits.cpu" = "4"
"limits.memory" = "4GiB"
"limits.memory.swap" = false
"limits.memory.enforce" = "hard"
"security.protection.delete" = false
#"zfs.block_mode" = true
}
lifecycle {
ignore_changes = all
}
}
lxd_instance:
resource "lxd_instance" "int-dev-zvol" {
for_each = var.instances.zvol
name = each.key
image = var.lxd_image
remote = var.lxd_remote
target = each.value.target
profiles = ["default", lxd_profile.int-dev-zvol.name]
type = each.value.type != null ? each.value.type : null
config = {
"user.access_interface" = each.value.type == "virtual-machine" ? var.lxd_vm_interface : var.lxd_container_interface
}
device {
name = var.lxd_container_interface
type = "nic"
properties = {
"nictype" = var.lxd_nic_type
"parent" = var.lxd_host_interface
}
}
device {
name = "root"
type = "disk"
properties = {
"path" = "/"
"pool" = "local"
"size" = "200GiB"
"initial.zfs.block_mode" = "true"
}
}
connection {
type = "ssh"
user = var.lxd_user
private_key = file(var.ssh_private_key)
host = self.ipv4_address
timeout = var.ssh_timeout
}
provisioner "remote-exec" {
inline = ["echo 'SSH connection success'"]
}
}
It does not work in the profile's config block because initial.* options are disk device options. Here is an example where initial.zfs.block_mode is set on the root disk device within a profile int-dev-zvol:
resource "lxd_storage_pool" "zfs" {
name = "zfs"
driver = "zfs"
}
resource "lxd_profile" "int-dev-zvol" {
name = "int-dev-zvol"
device {
name = "root"
type = "disk"
properties = {
path = "/"
pool = "zfs"
"initial.zfs.block_mode" = true
}
}
}
To verify:

$ lxc profile show int-dev-zvol
name: int-dev-zvol
description: ""
config: {}
devices:
  root:
    initial.zfs.block_mode: "true"
    path: /
    pool: zfs
    type: disk
used_by: []
$ lxc launch images:alpine/edge test -p int-dev-zvol
$ lxc storage volume show zfs container/test
name: test
description: ""
type: container
pool: zfs
content_type: filesystem
project: default
location: none
created_at: 2024-09-19T14:42:22.437599342Z
config:
  block.filesystem: ext4
  block.mount_options: discard
  volatile.uuid: c52cbe59-0f91-49d3-89fa-619946e68e0d
  zfs.block_mode: "true" # <---
used_by:
- /1.0/instances/test
Interesting. It was with that exact device block that I received the error when using it in lxd_profile.
As it is working (and we do not wish to manage lxd_storage_pools within terraform), I consider this resolved - but it would be nice to have a working example of a secondary block device that could be mounted within the container, i.e. /var/lib/docker.
> we do not wish to manage lxd_storage_pools within terraform

The initial.* settings may prove useful in that case, as they allow configuring new instances with different volume settings without changing those already applied on the storage pool.
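Roughly, the contrast looks like this (the names here are made up, and the pool resource is shown only to illustrate the pool-wide alternative via the zfs pool key volume.zfs.block_mode, which you would rather avoid managing in Terraform):

# Pool-wide default: every new volume created on this pool becomes block backed.
resource "lxd_storage_pool" "zfs" {
  name   = "zfs"
  driver = "zfs"
  config = {
    "volume.zfs.block_mode" = true
  }
}

# Per-instance setting: only the root disk of instances using this profile is
# block backed; the pool's own defaults stay untouched.
resource "lxd_profile" "blockmode" {
  name = "blockmode"

  device {
    name = "root"
    type = "disk"
    properties = {
      path                     = "/"
      pool                     = "zfs" # may also reference a pre-existing pool by name
      "initial.zfs.block_mode" = true
    }
  }
}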
> it would be nice to have a working example of a secondary block device that could be mounted within the container

Of course, great idea.
Would you find something like the following example useful in the lxd_volume section?
# Create storage pool.
resource "lxd_storage_pool" "zfs" {
  name   = "zfs"
  driver = "zfs"
}

# Create custom volume.
resource "lxd_volume" "inst-vol" {
  name = "second-vol"
  pool = lxd_storage_pool.zfs.name
  config = {
    size = "5GiB"
  }
}

# Create an instance with attached custom volume.
resource "lxd_instance" "inst" {
  name  = "inst"
  image = "ubuntu:22.04"

  # Attach additional volume.
  device {
    name = "vol-01"
    type = "disk"
    properties = {
      path   = "/var/lib/docker"
      pool   = lxd_storage_pool.zfs.name
      source = lxd_volume.inst-vol.name
    }
  }
}
We should also emphasize the restrictions that apply when attaching custom volumes to containers and/or VMs (https://documentation.ubuntu.com/lxd/en/latest/howto/storage_volumes/#attach-the-volume-to-an-instance).
Am I correct in thinking that this would only work if the storage pool is imported (or, in our clustered environment, managed after creation) within TF?
I'm not sure I understand the question. Attaching custom volumes should work both with pre-existing storage pools and with storage pools created via TF.
Sorry, I should have clarified.
Our current lxd infrastructure works as is without importing or managing the storage pools within terraform.
As far as I understand it, in order to mount this secondary volume to /var/lib/docker, we would be required to import or create a storage pool by defining an lxd_storage_pool block?
Ahh, got you. You can use an existing storage pool (not managed by Terraform) by referencing it by its name:
# Create storage pool.
lxc storage create tmp zfs

# Create custom volume.
resource "lxd_volume" "inst-vol" {
  name = "second-vol"
  pool = "tmp" # <-- Reference storage pool by name.
  config = {
    size = "5GiB"
  }
}

# Create an instance with attached custom volume.
resource "lxd_instance" "inst" {
  name  = "inst"
  image = "ubuntu:22.04"

  # Attach additional volume.
  device {
    name = "vol-01"
    type = "disk"
    properties = {
      path   = "/var/lib/docker"
      pool   = "tmp" # <-- Reference storage pool by name.
      source = lxd_volume.inst-vol.name
    }
  }
}
Perfect. I'll give this a try tomorrow and let you know the result. Thank you!
Unfortunately, using that example results in this error:
lxd_volume.dockerzvol: Creating...
lxd_profile.int-dev-zvol: Creating...
lxd_profile.int-dev-zvol: Creation complete after 3s [name=int-dev-zvol]
lxd_volume.dockerzvol: Still creating... [10s elapsed]
lxd_volume.dockerzvol: Creation complete after 11s [name=dockerzvol]
lxd_instance.int-dev-zvol["int-dev-zvol01"]: Creating...
╷
│ Error: Failed to create instance "int-dev-zvol01"
│
│ with lxd_instance.int-dev-zvol["int-dev-zvol01"],
│ on main.tf line 9, in resource "lxd_instance" "int-dev-zvol":
│ 9: resource "lxd_instance" "int-dev-zvol" {
│
│ Failed instance creation: Failed creating instance record: Failed initialising instance: Failed add validation for device
│ "dockerzvol": Failed loading custom volume: Storage volume not found
╵
main.tf:
resource "lxd_volume" "dockerzvol" {
name = "dockerzvol"
pool = "local"
config = {
size = "200GiB"
}
}
resource "lxd_instance" "int-dev-zvol" {
for_each = var.instances.zvol
name = each.key
image = var.lxd_image
remote = var.lxd_remote
target = each.value.target
profiles = ["default", lxd_profile.int-dev-zvol.name]
type = each.value.type != null ? each.value.type : null
config = {
"user.access_interface" = each.value.type == "virtual-machine" ? var.lxd_vm_interface : var.lxd_container_interface
}
device {
name = var.lxd_container_interface
type = "nic"
properties = {
"nictype" = var.lxd_nic_type
"parent" = var.lxd_host_interface
}
}
device {
name = "dockerzvol"
type = "disk"
properties = {
"path" = "/var/lib/docker"
"pool" = "local"
"source" = lxd_volume.dockerzvol.name
}
}
connection {
type = "ssh"
user = var.lxd_user
private_key = file(var.ssh_private_key)
host = self.ipv4_address
timeout = var.ssh_timeout
}
provisioner "remote-exec" {
inline = ["echo 'SSH connection success'"]
}
}
Hmm, I've tried both with VMs and containers with the following example:
lxc storage create local zfs

resource "lxd_volume" "dockerzvol" {
  name = "dockerzvol"
  pool = "local"
  config = {
    size = "5GiB"
  }
}

resource "lxd_instance" "int-dev-zvol" {
  name     = "tf-test"
  image    = "images:alpine/edge"
  profiles = ["default"]
  type     = "container" # "virtual-machine"

  config = {
    "security.secureboot" = false
  }

  device {
    name = "dockerzvol"
    type = "disk"
    properties = {
      "path"   = "/var/lib/docker"
      "pool"   = "local"
      "source" = lxd_volume.dockerzvol.name
    }
  }
}
Can you provide me with the LXD version and provider version?
What is the output of lxc storage show local?
Can you also confirm that the custom storage volume is created (lxc storage volume show local dockerzvol)? The error indicates that the storage volume cannot be found.
If you are running in clustered mode, can you set the target on both the volume and the instance, pointing to a specific node?
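For example, something roughly like this (the member name host04 is only a placeholder):

# Rough sketch: pin both the custom volume and the instance to the same cluster member.
resource "lxd_volume" "dockerzvol" {
  name   = "dockerzvol"
  pool   = "local"
  target = "host04" # pin the volume to a specific cluster member
  config = {
    size = "5GiB"
  }
}

resource "lxd_instance" "int-dev-zvol" {
  name   = "tf-test"
  image  = "images:alpine/edge"
  target = "host04" # same cluster member as the volume

  device {
    name = "dockerzvol"
    type = "disk"
    properties = {
      "path"   = "/var/lib/docker"
      "pool"   = "local"
      "source" = lxd_volume.dockerzvol.name
    }
  }
}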
Versions: LXD 5.19, LXD provider 2.3.0, Terraform 1.5.3
lxc storage show local:
config: {}
description: ""
name: local
driver: zfs
used_by:
- /1.0/images/27d9d0f4f2ef30dc2f955dbba1823eb879ba02d27a39ff1deb107ba9246dad22?target=host01
- /1.0/images/27d9d0f4f2ef30dc2f955dbba1823eb879ba02d27a39ff1deb107ba9246dad22?target=host02
- /1.0/images/27d9d0f4f2ef30dc2f955dbba1823eb879ba02d27a39ff1deb107ba9246dad22?target=host03
- /1.0/images/27d9d0f4f2ef30dc2f955dbba1823eb879ba02d27a39ff1deb107ba9246dad22?target=host04
- /1.0/instances/int-dev-swarm01
...
- /1.0/instances/tools03
- /1.0/profiles/ext-dev-swarm
- /1.0/profiles/ext-dev-worker
- /1.0/profiles/default
- /1.0/profiles/int-dev-lab
- /1.0/profiles/int-prd-lb
- /1.0/storage-pools/local/volumes/image/427083a0078160c5e51234b7a6a160a772eb562edae4e72b7ebe2b53d5a2e59?target=host01
- /1.0/storage-pools/local/volumes/image/427083a0078160c5e51234b7a6a160a772eb562edae4e72b7ebe2b53d5a2e59?target=host02
- /1.0/storage-pools/local/volumes/image/427083a0078160c5e51234b7a6a160a772eb562edae4e72b7ebe2b53d5a2e59?target=host03
- /1.0/storage-pools/local/volumes/image/427083a0078160c5e51234b7a6a160a772eb562edae4e72b7ebe2b53d5a2e59?target=host04
...
status: Created
locations:
- host04
- host02
- host03
- host01
lxc storage volume show local dockerzvol:
config:
  size: 200GiB
description: ""
name: dockerzvol
type: custom
used_by: []
location: host01
content_type: block
project: default
created_at: 2024-09-20T08:53:13.98321981Z
So it does appear the volume has been created, just on the default remote, which is the wrong host for this container (the target for this test container was set to host04). After setting target = host04 in the device block, this successfully gets created where it should; however, it appears to have created a dataset rather than a zvol:
lxc storage volume show local dockerzvol:
config:
  size: 200GiB
  volatile.idmap.last: '[]'
  volatile.idmap.next: '[]'
description: ""
name: dockerzvol
type: custom
used_by:
- /1.0/instances/int-dev-zvol01
location: host04
content_type: filesystem # <--
project: default
created_at: 2024-09-20T10:22:37.22770819Z
as is reflected in a zfs list on host04:
rpool/lxd/custom/default_dockerzvol 376K 200G 376K legacy
When running with initial.zfs.block_mode = true in the device properties, we get:
╷
│ Error: Failed to create instance "int-dev-zvol01"
│
│ with lxd_instance.int-dev-zvol["int-dev-zvol01"],
│ on main.tf line 10, in resource "lxd_instance" "int-dev-zvol":
│ 10: resource "lxd_instance" "int-dev-zvol" {
│
│ Failed instance creation: Failed creating instance record: Failed initialising instance: Failed add validation for device
│ "dockerzvol": Non-root disk device cannot contain initial.* configuration
╵
And when specifying content_type = block in the volume block, we get:
╷
│ Error: Failed to create instance "int-dev-zvol01"
│
│ with lxd_instance.int-dev-zvol["int-dev-zvol01"],
│ on main.tf line 11, in resource "lxd_instance" "int-dev-zvol":
│ 11: resource "lxd_instance" "int-dev-zvol" {
│
│ Failed instance creation: Failed creating instance record: Failed initialising instance: Failed add validation for device
│ "dockerzvol": Custom block volumes cannot be used on containers
╵
> When running with initial.zfs.block_mode = true in the device properties

Initial settings apply only to the instance's root disk volume, not to custom volumes.
> And when specifying content_type = block in the volume block

You can set the content type of the custom volume to block, but block volumes cannot be attached to containers, as explained under the restrictions in "Attach the volume to an instance":

> Custom storage volumes of content type block or iso cannot be attached to containers, but only to virtual machines.
> So it does appear the volume has been created, just on the default remote which is the wrong host for this container (the target for this test container was set to host04).

Yes, this is an issue. We either need to figure out a way to create the custom volume on the same cluster member as the instance or, for now, at least clarify that either a target has to be specified or remote storage, such as Ceph, should be used.
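As a rough sketch of the first option (reusing your variable names, and assuming var.instances.zvol carries a target per instance), one custom volume per instance keyed off the same for_each map keeps each volume on the same cluster member as its instance:

resource "lxd_volume" "dockerzvol" {
  for_each = var.instances.zvol

  name   = "${each.key}-dockerzvol" # one volume per instance (name is illustrative)
  pool   = "local"
  target = each.value.target        # same cluster member as the instance below
  config = {
    size = "200GiB"
  }
}

resource "lxd_instance" "int-dev-zvol" {
  for_each = var.instances.zvol

  name   = each.key
  image  = var.lxd_image
  remote = var.lxd_remote
  target = each.value.target

  device {
    name = "dockerzvol"
    type = "disk"
    properties = {
      "path"   = "/var/lib/docker"
      "pool"   = "local"
      "source" = lxd_volume.dockerzvol[each.key].name
    }
  }
}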
So IIUC, currently the only way to have a separate ext4 zvol for containers running docker is to make the entire root device a zvol or manually create and mount a block device after creation.
Edit: As this seems to be an LXD limitation, will it be possible to create this volume as a 'container' type in the future?
Thanks for all your help!
Small update: although the following main.tf works perfectly fine when the target is set to host04, when it reverts to the default (host01) it fails, again unable to find the volume.
main.tf
resource "lxd_instance" "int-dev-zvol" {
for_each = var.instances.timer
name = each.key
image = var.lxd_image
remote = var.lxd_remote
target = each.value.target
profiles = ["default", lxd_profile.int-dev-zvol.name]
type = each.value.type != null ? each.value.type : null
config = {
"user.access_interface" = each.value.type == "virtual-machine" ? var.lxd_vm_interface : var.lxd_container_interface
}
device {
name = "root"
type = "disk"
properties = {
"path" = "/"
"pool" = "local"
"size" = "200GiB"
## "size" = var.lxd_docker_zvol_size
"initial.zfs.block_mode" = "true"
}
}
device {
name = var.lxd_container_interface
type = "nic"
properties = {
"nictype" = var.lxd_nic_type
"parent" = var.lxd_host_interface
}
}
connection {
type = "ssh"
user = var.lxd_user
private_key = file(var.ssh_private_key)
host = self.ipv4_address
timeout = var.ssh_timeout
}
provisioner "remote-exec" {
inline = ["echo 'SSH connection success'"]
}
}
It doesn't seem to respect the instance target, but I'm not entirely sure why it works on host04.
Edit: It doesn't appear to be making the volume at all on other targets, bizarrely. lxc storage volume show local container/mx-dev-zvol01:
Error: Storage pool volume not found
tf apply:
╷
│ Error: Failed to create instance "int-dev-zvol01"
│
│ with lxd_instance.int-dev-zvol["int-dev-zvol01"],
│ on main.tf line 2, in resource "lxd_instance" "int-dev-zvol":
│ 2: resource "lxd_instance" "int-dev-zvol" {
│
│ Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-zvol01
╵
On an entirely new platform (2 managers and 4 workers to replicate a swarm, assigned incrementally to hosts so that worker01 = host01) configured in exactly the same way, tf apply bombs out with the following:
╷
│ Error: Failed to create instance "int-dev-manager02"
│
│ with lxd_instance.int-dev-manager["int-dev-manager02"],
│ on main.tf line 2, in resource "lxd_instance" "int-dev-manager":
│ 2: resource "lxd_instance" "int-dev-manager" {
│
│ Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-manager02
╵
╷
│ Error: Failed to create instance "int-dev-manager01"
│
│ with lxd_instance.int-dev-manager["int-dev-manager01"],
│ on main.tf line 2, in resource "lxd_instance" "int-dev-manager":
│ 2: resource "lxd_instance" "int-dev-manager" {
│
│ Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-manager01
╵
╷
│ Error: Failed to create instance "int-dev-worker03"
│
│ with lxd_instance.int-dev-worker["int-dev-worker03"],
│ on main.tf line 53, in resource "lxd_instance" "int-dev-worker":
│ 53: resource "lxd_instance" "int-dev-worker" {
│
│ Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-worker03
╵
╷
│ Error: Failed to create instance "int-dev-worker01"
│
│ with lxd_instance.int-dev-worker["int-dev-worker01"],
│ on main.tf line 53, in resource "lxd_instance" "int-dev-worker":
│ 53: resource "lxd_instance" "int-dev-worker" {
│
│ Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-worker01
╵
╷
│ Error: Failed to create instance "int-dev-worker02"
│
│ with lxd_instance.int-dev-worker["int-dev-worker02"],
│ on main.tf line 53, in resource "lxd_instance" "int-dev-worker":
│ 53: resource "lxd_instance" "int-dev-worker" {
│
│ Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-worker02
Note the absence of worker04.
lxc storage volume outputs:
root@host04:~# lxc storage volume show local container/int-dev-worker01
Error: Storage pool volume not found
root@host04:~# lxc storage volume show local container/int-dev-worker02
Error: Storage pool volume not found
root@host04:~# lxc storage volume show local container/int-dev-worker03
Error: Storage pool volume not found
root@host04:~# lxc storage volume show local container/int-dev-worker04
config:
  block.filesystem: ext4
  block.mount_options: discard
  zfs.block_mode: "true"
description: ""
name: int-dev-worker04
type: container
used_by:
- /1.0/instances/int-dev-worker04
location: host04
content_type: filesystem
project: default
created_at: 2024-09-20T16:49:38.333640012Z
Relevant TF_LOG=debug output:
2024-09-20T18:02:22.044+0100 [ERROR] provider.terraform-provider-lxd_v2.3.0: Response contains error diagnostic: diagnostic_detail="Failed instance creation: Failed creating instance from image: Could not locate a zvol for rpool/lxd/containers/int-dev-manager01" diagnostic_severity=ERROR diagnostic_summary="Failed to create instance "int-dev-manager01"" tf_proto_version=6.6 tf_rpc=ApplyResourceChange tf_req_id=7ac2f8a3-3dbc-0e05-edcb-c6b639747152 tf_resource_type=lxd_instance @caller=github.com/hashicorp/terraform-plugin-go@v0.23.0/tfprotov6/internal/diag/diagnostics.go:58 @module=sdk.proto tf_provider_addr=registry.terraform.io/terraform-lxd/lxd timestamp=2024-09-20T18:02:22.044+0100
2024-09-20T18:02:22.046+0100 [DEBUG] State storage *remote.State declined to persist a state snapshot
2024-09-20T18:02:22.046+0100 [ERROR] vertex "lxd_instance.int-dev-manager[\"int-dev-manager01\"]" error: Failed to create instance "int-dev-manager01"
Could not locate a zvol for rpool/lxd/containers/int-dev-manager01
Yes, this was an issue in LXD where it could happen that udev rules were not applied in time (especially on a loaded system). This has been fixed in 5.21 and 6.1 (the zvol is now waited for up to 30 seconds to appear).
Since you are using 5.19, which is no longer supported, I would recommend upgrading to 5.21 (LTS) if possible.
Excellent, thanks for letting me know. I'll upgrade the cluster next week and we'll go from there. Do you have a link to the issue, out of interest?
Have a good weekend!
I think the issue was detected in our test suite, but here are the PRs: https://github.com/canonical/lxd/pull/13656 and https://github.com/canonical/lxd/pull/13861
Have a nice weekend as well :)
I can confirm that after updating to 5.21.2 LTS, the profile works as expected :) thank you!
Regarding 'container' type volumes (or anything other than the 'custom' type), is it on the roadmap for our use case to be achievable, i.e. attaching a zvol block volume mounted at /var/lib/docker to a standard container, configured and deployed with this provider?
Great, I'm glad the upgrade solved the issue for you :)
I'm not sure why block volumes cannot be attached to containers (there must be a good reason), but I have suggested internally that support for this be added. If it gets supported by LXD, it will be possible in the Terraform provider as well. However, I can't promise anything.
Will close this issue, as zfs.block_mode and initial.zfs.block_mode are working fine.
Hi @ExpatUK,
I've discussed this internally; block devices cannot be attached to containers directly, as mentioned previously. However, you can attach a block-backed device to a container if it has a filesystem on top of it.
So to create a zvol with ext4 and attach it to the container, you can do the following:
lxc storage create mypool zfs
lxc storage volume create mypool test zfs.block_mode=true
lxc config device add v1 myvol disk path=/var/lib/docker pool=mypool source=test
To verify:
zfs list -t volume
NAME USED AVAIL REFER MOUNTPOINT
mypool/custom/default_test 136K 28.6G 136K -
In Terraform configuration, this would look like this:
resource "lxd_volume" "dockerzvol" {
name = "dockerzvol"
pool = "local"
# Setting content_type to "filesystem" (default value), creates a filesystem.
#
# Setting content_type to "block", creates a block device without FS.
# Those volumes can be attached only to virtual machines.
content_type = "filesystem"
config = {
# Set zfs.block_mode to true to create block backed device.
"zfs.block_mode" = true
size = "5GiB"
}
}
resource "lxd_instance" "int-dev-zvol" {
name = "tf-test"
image = "images:alpine/edge"
profiles = ["default"]
type = "container"
device {
name = "dockerzvol"
type = "disk"
properties = {
"path" = "/var/lib/docker"
"pool" = "local"
"source" = lxd_volume.dockerzvol.name
}
}
}
Hi Din,
Thanks for getting back to me. We’ve been doing something similar already for our existing docker swarm clusters but it’s interesting to see that TF config.
I’m on annual leave now until next week, but I’ll give that a try on Tuesday and let you know how it looks for us.
Perfect, thanks :)
Morning Din,
I can confirm with the above code plus a minor tweak (specifying target), this now works as expected.
resource "lxd_volume" "dockerzvol" {
name = "dockerzvol"
pool = "local"
target = "host04" # <-- this must still be specified
# Setting content_type to "filesystem" (default value), creates a filesystem.
#
# Setting content_type to "block", creates a block device without FS.
# Those volumes can be attached only to virtual machines.
content_type = "filesystem"
config = {
# Set zfs.block_mode to true to create block backed device.
"zfs.block_mode" = true
size = "200GiB"
}
}
# and under the instance...
device {
name = "dockerzvol"
type = "disk"
properties = {
"path" = "/var/lib/docker"
"pool" = "local"
"source" = lxd_volume.dockerzvol.name
}
}
# which results in:
root@int-dev-zvol01:~# df -lhT
Filesystem Type Size Used Avail Use% Mounted on
rpool/lxd/containers/int-dev-zvol01 zfs 4.1T 942M 4.1T 1% /
none tmpfs 492K 4.0K 488K 1% /dev
tmpfs tmpfs 100K 0 100K 0% /dev/lxd
/dev/zvol/rpool/lxd/custom/default_dockerzvol ext4 196G 28K 186G 1% /var/lib/docker
Thanks for all the help and guidance with this - I'd have thought this would be a common enough use case/scenario to be worth covering in the docs.
Thanks for the feedback, I agree that we should include such an example in the documentation.
Hi,
I've been trying to implement this for a couple of containers that will be hosting docker swarm services, to try and eliminate the IO bottleneck on ZFS; however, using this device block:
Results in the following errors:
This does appear to create the zvol block devices as desired, however:
(The 'legacy' ones are normal datasets, and lab04 has a manually created, attached and functional ext4 block device)
Am I omitting something basic? Essentially I just need the provider to create zvol block devices for specific container instances rather than zfs datasets when specified.
If I manually set the pool to use zfs.block_mode this works without issue, without the disk device being specified.
Versions: LXD 5.19, LXD provider 2.3.0, Terraform 1.5.3