roots / trellis-cli

A CLI to manage Trellis projects
https://roots.io/trellis/
MIT License

Add LXD VM support #363

Open swalkinshaw opened 1 year ago

swalkinshaw commented 1 year ago

Follow-up to https://github.com/roots/trellis-cli/pull/346. This PR adds Linux VM support via LXD.

MikeiLL commented 1 year ago

This has been making me very happy so far, and it was worth updating my macOS to Ventura for. One thing to note: unless I missed a trellis-cli upgrade, trellis ssh development tries to connect as vagrant, while this Lima box is initialized with $(whoami) as the admin user, so the connection fails because there is no vagrant user.

swalkinshaw commented 1 year ago

@MikeiLL that's not really relevant to this PR specifically. You probably wanted https://github.com/roots/trellis-cli/pull/346

Regardless, yes, it's a known issue. trellis vm shell is the replacement command since it's VM-specific.

alex-galey commented 9 months ago

Thank you @swalkinshaw for this promising feature! Bye-bye VirtualBox :)

I gave it a try and had to make the container privileged so that it could map uid 1000 and start:

~/trellis-cli/test.dev$ ../trellis-cli vm start
Running command => lxc launch ubuntu:22.04 test-dev
Creating test-dev
Starting test-dev                           
Error: Failed to run: /usr/bin/lxd forkstart test-dev /var/lib/lxd/containers /var/log/lxd/test-dev/lxc.conf: exit status 1
Try `lxc info --show-log local:test-dev` for more info
Error creating VM.
exit status 1

~/trellis-cli/test.dev$ lxc info --show-log local:test-dev
Name: test-dev
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2023/12/10 15:56 CET
Last Used: 2023/12/10 15:56 CET

Log:

lxc test-dev 20231210145646.668 ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3704 - newuidmap failed to write mapping "newuidmap: uid range [1000-1001) -> [1000-1001) not allowed": newuidmap 14242 0 165536 1000 1000 1000 1 1001 166537 9999000
lxc test-dev 20231210145646.669 ERROR    start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc test-dev 20231210145646.672 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc test-dev 20231210145646.681 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "test-dev"
lxc test-dev 20231210145646.682 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 14242
lxc 20231210145651.872 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20231210145651.873 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"

~/trellis-cli/test.dev$ lxc config set test-dev security.privileged true
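
The newuidmap error in the log above says that mapping host uid 1000 is not allowed. One possible cause, assuming LXD runs as root and newuidmap consults /etc/subuid, is that root has no subordinate-ID entry covering uid 1000. The Go sketch below checks file contents in the name:start:count format for such an entry. The function name allowsID and the sample contents are hypothetical and not part of trellis-cli.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// allowsID reports whether subordinate-ID file contents (one
// "name:start:count" entry per line, as in /etc/subuid and /etc/subgid)
// let owner map the host ID id.
func allowsID(contents, owner string, id int) bool {
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.Split(line, ":")
		if len(parts) != 3 || parts[0] != owner {
			continue
		}
		start, errStart := strconv.Atoi(parts[1])
		count, errCount := strconv.Atoi(parts[2])
		if errStart != nil || errCount != nil {
			continue // skip malformed entries
		}
		if id >= start && id < start+count {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical contents; read the real /etc/subuid on your own host.
	before := "root:165536:10000000\n"
	after := before + "root:1000:1\n"

	fmt.Println(allowsID(before, "root", 1000)) // false
	fmt.Println(allowsID(after, "root", 1000))  // true
}
```

If the check fails on a real host, adding a line such as root:1000:1 to both /etc/subuid and /etc/subgid and then restarting LXD may be an alternative to security.privileged. This is untested against this PR.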

Now the container starts, but another issue is that it didn't get an IPv4 address. This may be due to my host config. If that is the case, I will try later on another machine.

~/trellis-cli/test.dev$ ../trellis-cli vm start
Running command => lxc start test-dev
Error starting VM.
timeout hydrating VM: Could not determine IP address for VM: inet address family not found

~/trellis-cli/test.dev$ lxc info --show-log local:test-dev
Name: test-dev
Status: RUNNING
Type: container
Architecture: x86_64
PID: 14367
Created: 2023/12/10 15:56 CET
Last Used: 2023/12/10 16:04 CET

Resources:
  Processes: 8
  CPU usage:
    CPU usage (in seconds): 4
  Memory usage:
    Memory (current): 23.07MiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: veth62325d50
      MAC address: 00:16:3e:e2:d4:f0
      MTU: 1500
      Bytes received: 460B
      Bytes sent: 2.58kB
      Packets received: 2
      Packets sent: 16
      IP addresses:
        inet6: fe80::216:3eff:fee2:d4f0/64 (link)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0
      IP addresses:
        inet:  127.0.0.1/8 (local)
        inet6: ::1/128 (local)

Log:

~/trellis-cli/test.dev$ lxc ls test-dev
+----------+---------+------+----------------------------------------------+-----------+-----------+
|   NAME   |  STATE  | IPV4 |                     IPV6                     |   TYPE    | SNAPSHOTS |
+----------+---------+------+----------------------------------------------+-----------+-----------+
| test-dev | RUNNING |      | fd42:33f:ef23:7e19:216:3eff:fee2:d4f0 (eth0) | CONTAINER | 0         |
+----------+---------+------+----------------------------------------------+-----------+-----------+

~/trellis-cli/test.dev$ lxc network list
+--------+----------+---------+-----------------+--------------------------+-------------+---------+---------+
|  NAME  |   TYPE   | MANAGED |      IPV4       |           IPV6           | DESCRIPTION | USED BY |  STATE  |
+--------+----------+---------+-----------------+--------------------------+-------------+---------+---------+
| eth0   | physical | NO      |                 |                          |             | 0       |         |
+--------+----------+---------+-----------------+--------------------------+-------------+---------+---------+
| lxdbr0 | bridge   | YES     | 10.204.254.1/24 | fd42:33f:ef23:7e19::1/64 |             | 2       | CREATED |
+--------+----------+---------+-----------------+--------------------------+-------------+---------+---------+
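
The "inet address family not found" error is consistent with the output above: eth0 carries only inet6 addresses. The Go sketch below shows what such a lookup might look like. It assumes the JSON shape printed by lxc list --format json (state.network.<iface>.addresses with family, address and scope fields). The helper globalIPv4 is hypothetical and is not trellis-cli's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// address mirrors the fields of interest in one entry of an interface's
// "addresses" array (assumed shape, see the lead-in).
type address struct {
	Family  string `json:"family"`
	Address string `json:"address"`
	Scope   string `json:"scope"`
}

type instance struct {
	Name  string `json:"name"`
	State struct {
		Network map[string]struct {
			Addresses []address `json:"addresses"`
		} `json:"network"`
	} `json:"state"`
}

// globalIPv4 returns the first global-scope IPv4 address of iface on the
// named instance, or an error if there is none.
func globalIPv4(listJSON []byte, name, iface string) (string, error) {
	var instances []instance
	if err := json.Unmarshal(listJSON, &instances); err != nil {
		return "", err
	}
	for _, inst := range instances {
		if inst.Name != name {
			continue
		}
		for _, a := range inst.State.Network[iface].Addresses {
			if a.Family == "inet" && a.Scope == "global" {
				return a.Address, nil
			}
		}
		return "", errors.New("no global inet address on " + iface)
	}
	return "", errors.New("instance not found: " + name)
}

func main() {
	// Sample data mirroring the state reported above: IPv6 only.
	sample := []byte(`[{"name":"test-dev","state":{"network":{"eth0":{"addresses":[
		{"family":"inet6","address":"fd42:33f:ef23:7e19:216:3eff:fee2:d4f0","scope":"global"},
		{"family":"inet6","address":"fe80::216:3eff:fee2:d4f0","scope":"link"}]}}}}]`)

	ip, err := globalIPv4(sample, "test-dev", "eth0")
	fmt.Println(ip, err)
}
```

With the IPv6-only sample the lookup returns an error, which matches the timeout seen while hydrating the VM. Once the container gets an IPv4 lease from lxdbr0, the same lookup would return that address.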

Also, what are the next steps for bringing this feature towards release?

I am not a Go developer, but I can help you test and debug.

swalkinshaw commented 9 months ago

Thanks for testing this out @alex-galey. My memory isn't fresh on this, but I definitely ran into the uid mapping issue and maybe the networking one too.

Basically, it started to get a bit complicated and I had enough things to improve for Lima at the time, so I just left this. Since then I haven't had as much time to devote to Trellis.

Does the IP listed by lxc network list (10.204.254.1) actually work (e.g. for SSH)?

alex-galey commented 9 months ago

My pleasure. I also fiddled around with LXD a couple of years back and was able to use lxc for local development with Trellis at that time. My reference thread was this one.

I cannot connect to the container with this IP (it is the bridge IP, by the way, not the container IP; the container only has an IPv6 address):

user@work-wp2:~$ ssh user@10.204.254.1 -vvv
OpenSSH_9.2p1 Debian-2+deb12u1, OpenSSL 3.0.11 19 Sep 2023
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname 10.204.254.1 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/home/user/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/home/user/.ssh/known_hosts2'
debug3: ssh_connect_direct: entering
debug1: Connecting to 10.204.254.1 [10.204.254.1] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: connect to address 10.204.254.1 port 22: Connection refused
ssh: connect to host 10.204.254.1 port 22: Connection refused

The user has been created and the public SSH key has been copied properly, as shown here (work-wp2 is the host, test-dev is the container):

user@work-wp2:~$ lxc shell test-dev

root@test-dev:~# su user

user@test-dev:/root$ cat ~/.ssh/authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC7zmBl1PKhXhvLdZp3X2geBAIAMP7Ti4uGl5toX6FAxB4G6AC222bMomU3EwJVZiCgJNV712lAI1ASGD6EhJ4HXWht4YLBa91wD8yLBVvQjCc/Wcz3fcaZntvOlwjUwBConIPxD4MWgEbdWdZ11zV4Z2wUkPHrFHJUgJR40EeuI0+fCRW8bWd0EtAsRRtfNi8BaG8AjR4zDlHRYO11+73Spqgoa/ZPmjSFtFLpflpP9jStAGZyEikRd3lcI2obYPLmapQ2B7DtJiBfcuADmDUXsYvSGk59DK4TqZF/VljQs6Zpjn99DOH7ZZkFtl7Kr6n9E/j2cY8bUylaINhtmUhImiBm7hk3zaKrb9pkRJ4MFuwA5hjrHTtEFz/B6hrFeZ71RaUYxfA7G5anOMkkTq1ZL4DKdtISzeRJMgbXXm8E10zFiS+aVeYWC+rBlXPVQCx3iiW9Kf9pkgvJq/Vuok0ookJBAsdLJ5g1e01wt0b7VtdDaQjoXVK2Qv3CYtg1EnU= user@work-wp2

user@work-wp2:~$ cat ~/.ssh/id_rsa.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC7zmBl1PKhXhvLdZp3X2geBAIAMP7Ti4uGl5toX6FAxB4G6AC222bMomU3EwJVZiCgJNV712lAI1ASGD6EhJ4HXWht4YLBa91wD8yLBVvQjCc/Wcz3fcaZntvOlwjUwBConIPxD4MWgEbdWdZ11zV4Z2wUkPHrFHJUgJR40EeuI0+fCRW8bWd0EtAsRRtfNi8BaG8AjR4zDlHRYO11+73Spqgoa/ZPmjSFtFLpflpP9jStAGZyEikRd3lcI2obYPLmapQ2B7DtJiBfcuADmDUXsYvSGk59DK4TqZF/VljQs6Zpjn99DOH7ZZkFtl7Kr6n9E/j2cY8bUylaINhtmUhImiBm7hk3zaKrb9pkRJ4MFuwA5hjrHTtEFz/B6hrFeZ71RaUYxfA7G5anOMkkTq1ZL4DKdtISzeRJMgbXXm8E10zFiS+aVeYWC+rBlXPVQCx3iiW9Kf9pkgvJq/Vuok0ookJBAsdLJ5g1e01wt0b7VtdDaQjoXVK2Qv3CYtg1EnU= user@work-wp2

I am running this in a Xen environment, which could be causing the network issue. I will try this elsewhere when I have some time.