lxc / incus

Powerful system container and virtual machine manager
https://linuxcontainers.org/incus
Apache License 2.0

Starting any container fails: newuidmap: write to uid_map failed: Invalid argument #458

Closed by cjwatson 7 months ago

cjwatson commented 7 months ago

Required information

config:
  core.https_address: '[::]'
  images.auto_update_interval: "24"
  images.remote_cache_expiry: "60"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: cjwatson
auth_user_method: unix
environment:
  addresses:
  - 172.20.153.147:8443
  - '[2001:8b0:664:0:3602:86ff:fea8:81d]:8443'
  - 10.10.25.1:8443
  - 10.0.3.1:8443
  - 172.17.0.1:8443
  - 10.36.63.1:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIGjDCCBHSgAwIBAgIQLf1ycWKsIfHRUbru8rWZujANBgkqhkiG9w0BAQsFADA2
    MRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRYwFAYDVQQDDA1yb290QG5p
    ZWp3ZWluMB4XDTE2MDMyMjE1MTMwOVoXDTI2MDMyMDE1MTMwOVowNjEcMBoGA1UE
    ChMTbGludXhjb250YWluZXJzLm9yZzEWMBQGA1UEAwwNcm9vdEBuaWVqd2VpbjCC
    AiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAJ6TXPWcmkUAh2lg+tHbLgqw
    J47kyIUX760E7BrpRqPyrIP9wtAqjdpazcX83GwbukgKkfFr/FRNv0/iP5rkbqq3
    ss92+Z2eOuqvQictrIaFcknwPjFC7P4RDE/UmRhMrdMd1jWNSFo1rT7HUHPMe2q9
    W3vdT8znj7U3blXuGtPgD8y8eNznJjdnjtwgx4F/70z5N2F4zD4OixrSLp7cluLx
    NdlLDdN5uMBxp9byY1QtrjkKHfdL8qBOifeQS544QGZgUGLfa5W5/DQvOQmji+NC
    f5UU2j7hbcOYA8S4CopM5jFwpX3X2oy/2tt2/JlAAKKYQtmFh3u7MAC2ndhN4TvO
    ukzYU9l+xvjSukeUc6f9m3TOpcn6zw9pR0iwFKlQsfQgUt7tcHZYfcKoq0Tczl2D
    /pa6vJNLaQ6i/8uYWcCyXuqZKvCl/WCoYVuu4xZc2VBXUPGwDDw6ukJlslVqoUTO
    gUVpNvSjAPAh8o0Vdks5UR60NMEVnMKryeBNJJ4qi8du3iCa3h2I2EWb6nCcj33t
    XW4Llrz3U5jl+UZncWPXitpAORxbP8VjnJWJPruT0P5a4jvafG0cfUkTFTolUJ6y
    31mohIYOrfLG5NDIU4YRxSxXRq4REbS3rjyhIox4ugFVcryZj8StvbU5TfhKQZWi
    aBISFxCtz21EQJI4Gy45AgMBAAGjggGUMIIBkDAOBgNVHQ8BAf8EBAMCBaAwEwYD
    VR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADCCAVkGA1UdEQSCAVAwggFM
    gghuaWVqd2VpboIQMTcyLjIwLjE1My4xOS8yNYIpMjAwMTo4YjA6YmZmMjplYjE0
    OjQxZTI6NTQwZTo4MGYwOjFiZjIvNjSCKTIwMDE6OGIwOmJmZjI6ZWIxNDoyYzky
    OmY5MTc6OTJmZTpmYjc1LzY0gikyMDAxOjhiMDpiZmYyOmViMTQ6MTk3MjpmMWY2
    OjcxNTk6MmIxZC82NIIpMjAwMTo4YjA6YmZmMjplYjE0OjZhZjc6MjhmZjpmZWNj
    OmUzNDYvNjSCHGZlODA6OjZhZjc6MjhmZjpmZWNjOmUzNDYvNjSCCzEwLjAuMy4x
    LzI0ghxmZTgwOjpkMDg1OjIxZmY6ZmU5NTozMWI3LzY0ghtmZTgwOjpmYzE1OjRm
    ZjpmZThkOmE3NjkvNjSCHGZlODA6OmZjZTI6MjFmZjpmZTFlOjgwYzUvNjQwDQYJ
    KoZIhvcNAQELBQADggIBAJPdZjgcIIBT3UhuGjLWFda5/o9MSdEB2cl2ISo185D1
    tb7DGl3M7fUsu/N9VfMkz9QtP5R/sCYly3hyZLgKj5dz9c43BXwOMUdYaB+KShVB
    k7FE8s+V1VI2WCwXTtzHs5MgREe9TGMRg7BBzkat5m6gCIXhjO0jf2hdyuR/A4Z/
    RbAqh7jcDDHUZbdS/xBgE0eUfKsyAsDbru7JIBAfbrmUounwwLHzGycWpaxVBxqP
    3e3Zw6ousN9ELqvFs8nxz5UxUpmG3ynpwaZd3HULowrb+Fujjn+O+Ozwj7Uthgo7
    Hm+G8rVFPXxgK3mDkEAGfChPSga5QCfCOiyR7p3X4kLhZ2ONXFTHAWHIwvzMvmQm
    8nS233VygRb2+RFnzoFoIX9VWzGUtVzLm3kyNAw8esgGk7SKDGbhhGi6uQ5zK5q9
    7/zECXl6TFRKvm5CnIQW3maAA72mdLgfJBYsXecBpGqNtwKBHNvZ4BxQYoMHKu/i
    9CGuRyUNrAlACbWXFCcrl2dqZ/XfOXwXK9ln8xAWjYj1eQNks93YuBa7BDm3v2XH
    bYcD3BGs/ftUw2HMkWmwJG4BY3HKmT6QcayUGEWFT8oOA+BvNKakb0UYED4CKzh2
    DRGHayVJ5fsGw8Q5zni4YJTaGnhu7Clo5g3KhiNRL+FZX/r/u49WI5Or9xOPnOxL
    -----END CERTIFICATE-----
  certificate_fingerprint: 2a2f687296fb3e74ae352eb303843725690d16fd2b57fd373f44d46fbb8721d6
  driver: lxc | qemu
  driver_version: 5.0.3 | 8.2.1
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.5.0-14-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "23.10"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: niejwein
  server_pid: 1914017
  server_version: 0.5.1
  storage: zfs
  storage_version: 2.2.0-0ubuntu1~23.10
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
    remote: false
  - name: zfs
    version: 2.2.0-0ubuntu1~23.10
    remote: false
  - name: btrfs
    version: 6.3.2
    remote: false

Issue description

I switched from LXD to Incus a while ago, and things were working fine with 0.4.0. As far as I know I haven't changed anything other than applying routine upgrades, and I am now on Incus 0.5.1. I tried to start a container this afternoon and it failed with newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument". I can't get any container to start, even a new one with no interesting configuration.

As far as I know I'm not doing anything custom with ID mapping, at least not in my default profile.

Steps to reproduce

$ incus launch images:debian/sid/amd64 test
Launching test
Error: Failed instance creation: Failed to run: /opt/incus/bin/incusd forkstart test /var/lib/incus/containers /run/incus/test/lxc.conf: exit status 1
$ incus info --show-log test
Name: test
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/01/31 13:12 GMT
Last Used: 2024/01/31 13:12 GMT

Log:

lxc test 20240131131214.183 ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3701 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 2044050 0 100000 65536 0 165536 65536 0 1000000 1000000000
lxc test 20240131131214.183 ERROR    start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc test 20240131131214.183 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc test 20240131131214.186 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "test"
lxc test 20240131131214.186 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 43 for process 2044050
lxc 20240131131214.278 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240131131214.278 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
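The newuidmap invocation in the first log line encodes the requested mapping as (namespace id, host id, range) triples, and all three triples start at namespace id 0. As a rough illustration (a hypothetical helper, not Incus code), the kernel's "Invalid argument" here corresponds to the namespace-side ranges overlapping:

```python
# Sketch, assuming newuidmap's trailing arguments are flat
# (ns_id, host_id, range) triples, as in the log above.
def ns_ranges_overlap(args):
    triples = [tuple(args[i:i + 3]) for i in range(0, len(args), 3)]
    # Collect the namespace-side intervals [ns, ns + count) and sort them.
    spans = sorted((ns, ns + cnt) for ns, _host, cnt in triples)
    # Overlap exists if any interval starts before the previous one ends.
    return any(cur < prev_end for (_, prev_end), (cur, _) in zip(spans, spans[1:]))

# The failing mapping from the log: every range maps container uid 0.
failing = [0, 100000, 65536, 0, 165536, 65536, 0, 1000000, 1000000000]
print(ns_ranges_overlap(failing))   # True

# A single large range, as in the eventual fix, does not overlap.
print(ns_ranges_overlap([0, 1000000, 1000000000]))  # False
```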
$ incus config show test --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Debian sid amd64 (20240130_05:24)
  image.os: Debian
  image.release: sid
  image.serial: "20240130_05:24"
  image.type: squashfs
  image.variant: default
  user.user-data: |
    #cloud-config
    runcmd:
      - "systemctl cat apt-daily.timer >/dev/null 2>&1 && systemctl mask apt-daily.timer"
      - "systemctl cat cron.service >/dev/null 2>&1 && systemctl mask cron.service"
      #- "echo 'Acquire::http::Proxy \"http://wwwcache.pelham.vpn.ucam.org:3142/\";' >/etc/apt/apt.conf.d/95-juju-proxy-settings"
  volatile.base_image: 27ce57296b1578c233225bd6d3862e591b6218e91a1c0d9b69bf46b5d93f5594
  volatile.cloud-init.instance-id: b2ee3103-e60b-4974-b811-2af5e52f8ea6
  volatile.eth0.host_name: veth700db734
  volatile.eth0.hwaddr: 00:16:3e:85:a2:3d
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":true,"Isgid":false,"Hostid":165536,"Nsid":0,"Maprange":65536},{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":165536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":true,"Isgid":false,"Hostid":165536,"Nsid":0,"Maprange":65536},{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":165536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 6b79b1c6-efb1-46ac-abf5-9f6ffb952fa7
  volatile.uuid.generation: 6b79b1c6-efb1-46ac-abf5-9f6ffb952fa7
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Information to attach

dmesg shows nothing at all in the relevant time period.

The main daemon log just says time="2024-01-31T13:12:14Z" level=error msg="Failed starting instance" action=start created="2024-01-31 13:12:13.26877479 +0000 UTC" ephemeral=false instance=test instanceType=container project=default stateful=false used="1970-01-01 00:00:00 +0000 UTC".

incus --debug start test and incus monitor --pretty seem to just reproduce the information above in different forms, but I can provide them if needed.

cjwatson commented 7 months ago

FWIW it's not a recent change in Debian unstable, since incus launch images:debian/bookworm/amd64 test behaves the same way.

stgraber commented 7 months ago

You most likely have conflicting/overlapping entries for the root user in /etc/subuid and/or /etc/subgid.

stgraber commented 7 months ago

We fixed our idmap parsing package in Incus 0.5, as older versions incorrectly picked only the first entry for each user. That caused issues for those who need to split the allocation because they have reserved chunks of uids and gids for things like remote authentication.

The downside of this fix is that if your root user has multiple ranges that are at least 65k ids large, they'll all be picked up; if those ranges overlap, the kernel will refuse the mapping.
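To make the failure condition concrete: with the 0.5 parser, every root entry of at least 65536 ids is used, so more than one such entry can produce the overlapping map seen in the log. A hypothetical checker for that condition (the parsing and threshold are assumptions based on the description above, not Incus's actual code):

```python
# Flag /etc/subuid-style content where root has more than one large range.
def large_root_ranges(subuid_text, minimum=65536):
    ranges = []
    for line in subuid_text.splitlines():
        line = line.split("#")[0].strip()  # drop comments and blanks
        if not line:
            continue
        user, start, count = line.split(":")
        if user == "root" and int(count) >= minimum:
            ranges.append((int(start), int(count)))
    return ranges

# Condensed version of the reporter's /etc/subuid:
sample = """\
root:1000:1
root:100000:65536
lxd:165536:65536
root:165536:65536
root:1000000:1000000000
"""
ranges = large_root_ranges(sample)
print(len(ranges))  # 3 large root ranges: all get picked up
```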

cjwatson commented 7 months ago

Here's my /etc/subuid (/etc/subgid is similar):

root:1000:1
root:100000:65536
statd:110000:10000
postgres:120000:10000
[local users redacted]
lxd:165536:65536
root:165536:65536
# empty default subuid/subgid file
[another local user redacted]
root:1000000:1000000000

I deleted all but the last of those entries for root, restarted incus.service, and it works now, but that was quite weird. According to my etckeeper logs, all but the last of those entries date back to before June 2017 (when I installed etckeeper on this machine); the last one was created automatically when I switched to Incus.

Could this be retroactively added to https://discuss.linuxcontainers.org/t/incus-0-5-has-been-released/18814, maybe? I looked there before filing a bug, and I'm guessing I'm not the only one with bits inherited from several years ago ...

stgraber commented 7 months ago

Yeah, I'll add a mention to the announcement as we've indeed had maybe 3-4 people reach out so far with similar issues

stgraber commented 7 months ago

Added now.

Jean-Daniel commented 7 months ago

Thank you, I just got the same issue here.

marneu commented 5 months ago

Same issue here (bookworm, 0.6-1~bpo12+1).