lxc / lxcfs

FUSE filesystem for LXC
https://linuxcontainers.org/lxcfs

lxcfs crash on lxd 5.9 rev 24164 #573

Closed zrav closed 1 year ago

zrav commented 1 year ago

Due to https://discuss.linuxcontainers.org/t/number-of-cpus-reported-by-proc-stat-fluctuates-causing-issues/15780 we are running LXD 5.9 revision 24164. After running for a few days, lxcfs crashed:

Dec 21 05:33:41 kernel: show_signal_msg: 14 callbacks suppressed
Dec 21 05:33:41 kernel: lxcfs[3219179]: segfault at 0 ip 00007f8084afdf81 sp 00007f8084a2e780 error 6 in libc-2.31.so[7f8084a94000+178000]
Dec 21 05:33:41 kernel: Code: 00 00 4c 89 ef 4c 89 4c 24 08 e8 3a 68 00 00 48 89 e9 4c 89 e2 48 89 ee 48 8d 05 2a d2 15 00 4c 89 ef 48 89 84 24 e8 00 00 00 <c6> 45 00 00 e8 06 7e 00 00 89 d9 4c 89 fa 4c 89 f6 4c 89 ef e8 c6
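
For readers who want to decode the kernel line above, here is a minimal sketch. It assumes the x86-64 page-fault error-code layout (bit 0 = page present, bit 1 = write, bit 2 = user mode) and that the kernel prints the error code in hex. The function name `decode_segfault` is made up for illustration. Note that the computed offset is relative to the start of the mapping shown in brackets, which is not necessarily the file offset inside libc-2.31.so; converting it would also need the file offset of that mapping (for example from /proc/<pid>/maps), which is not part of this report.

```python
import re

# Kernel log line copied from the report above.
LINE = ("lxcfs[3219179]: segfault at 0 ip 00007f8084afdf81 sp 00007f8084a2e780 "
        "error 6 in libc-2.31.so[7f8084a94000+178000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel 'segfault at ...' message into its fields (x86-64 assumed)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "object": m["obj"],
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset_in_mapping": ip - base,
        # x86 page-fault error-code bits.
        "page_present": bool(err & 0x1),
        "write": bool(err & 0x2),
        "user_mode": bool(err & 0x4),
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["object"], hex(info["offset_in_mapping"]), info)
```

Under these assumptions, "segfault at 0 ... error 6" reads as a user-mode write to address 0 on a page that is not mapped, i.e. a store through a NULL pointer inside libc.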

This is an Ubuntu 22.04.1 running kernel 5.15.0-56-generic on an AMD Epyc 7702P (128 thread) system with 512GB RAM.

lxc info
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
  images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - ...:8443
  architectures:
  - x86_64
  - i686
  certificate: ...
  certificate_fingerprint: ...
  driver: qemu | lxc
  driver_version: 7.1.0 | 5.0.1
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.15.0-56-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "22.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: server.domain.com
  server_pid: 1657650
  server_version: "5.9"
  storage: zfs
  storage_version: 2.1.4-0ubuntu0.1
  storage_supported_drivers:
  - name: zfs
    version: 2.1.4-0ubuntu0.1
    remote: false
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: ceph
    version: 15.2.17
    remote: true
  - name: cephfs
    version: 15.2.17
    remote: true
  - name: cephobject
    version: 15.2.17
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
    remote: false

As requested, further information:

service apport status
● apport.service - LSB: automatic crash report generation
     Loaded: loaded (/etc/init.d/apport; generated)
     Active: active (exited) since Sat 2022-12-17 08:45:41 CET; 4 days ago
       Docs: man:systemd-sysv-generator(8)
        CPU: 27ms

Dec 17 08:45:40 server.domain.com systemd[1]: Starting LSB: automatic crash report generation...
Dec 17 08:45:41 server.domain.com apport[3908]:  * Starting automatic crash report generation: apport
Dec 17 08:45:41 server.domain.com apport[3908]:    ...done.
Dec 17 08:45:41 server.domain.com systemd[1]: Started LSB: automatic crash report generation.
cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
ls -la /var/crash
total 8
drwxrwsrwt  2 root whoopsie 4096 Nov 15 06:25 .
drwxr-xr-x 15 root root     4096 Nov 13 20:56 ..
ls -la /var/lib/apport/coredump/
total 8
drwxr-xr-x 2 root root 4096 Oct 27  2021 .
drwxr-xr-x 3 root root 4096 Oct 27  2021 ..
cat /var/log/apport.log
ERROR: apport (pid 3551728) Wed Dec 21 05:33:41 2022: host pid 5646 crashed in a separate mount namespace, ignoring

Unfortunately, no dumps are available, and the LXD log shows nothing of interest around the time of the crash:

journalctl -u snap.lxd.daemon
Dec 17 08:49:16 server.domain.com lxd.daemon[5419]: => LXD is ready
Dec 17 09:09:32 server.domain.com lxd.daemon[5660]: time="2022-12-17T09:09:32+01:00" level=warning msg="Detected poll(POLLNVAL) event: exiting"
Dec 21 12:22:29 server.domain.com systemd[1]: Stopping Service for snap application lxd.daemon...
Dec 21 12:22:29 server.domain.com lxd.daemon[1626376]: => Stop reason is: host shutdown

Please tell me if I should modify any configuration to catch the next possible crash.

mihalicyn commented 1 year ago

Ugh, it looks like apport does not handle crashes of processes running in non-host mount namespaces.

I suggest temporarily changing core_pattern until lxcfs crashes again, like this:

echo '|/bin/sh -c $@ -- eval exec cat > /var/crash/core-%e.%p' > /proc/sys/kernel/core_pattern

(The reason for piping here is that the kernel ignores the RLIMIT_CORE value when core_pattern is a pipe.)

Then you can restore the original value of /proc/sys/kernel/core_pattern after we collect the coredump.
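
If the shell one-liner above is awkward to maintain, a small script can play the same role. The following is only a sketch of a hypothetical pipe handler, not something proposed in this thread: the script path /usr/local/bin/save_core.py, the function name `save_core`, and the choice of passing `%e %p` as arguments are all assumptions. It relies on the documented behaviour that a piped core_pattern handler receives the core dump on standard input.

```python
#!/usr/bin/env python3
import os
import shutil
import sys


def save_core(stream, directory, exe, pid):
    """Copy a core dump from `stream` into <directory>/core-<exe>.<pid>."""
    # Defensive: make sure the executable name cannot escape `directory`.
    safe_exe = exe.replace("/", "!")
    path = os.path.join(directory, f"core-{safe_exe}.{pid}")
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out)
    return path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical registration (as root), mirroring the file name used above:
    #   echo '|/usr/local/bin/save_core.py %e %p' > /proc/sys/kernel/core_pattern
    save_core(sys.stdin.buffer, "/var/crash", sys.argv[1], sys.argv[2])
```

As with the one-liner, the original core_pattern value should be restored once the dump has been collected.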

mihalicyn commented 1 year ago

@zrav do you have any updates?

zrav commented 1 year ago

lxcfs hasn't crashed again so far. As soon as it happens I'll post the info.

mihalicyn commented 1 year ago

Disassembling glibc's libc-2.31.so shows that the crash happened here:

000000000008be50 <vscanf@@GLIBC_2.2.5>:
   8be50:       f3 0f 1e fa             endbr64 
   8be54:       48 8b 05 65 01 16 00    mov    0x160165(%rip),%rax        # 1ebfc0 <stdin@@GLIBC_2.2.5-0x17d0>

...

   8bf59:       4c 89 ef                mov    %r13,%rdi
   8bf5c:       4c 89 4c 24 08          mov    %r9,0x8(%rsp)
   8bf61:       e8 3a 68 00 00          call   927a0 <_IO_enable_locks@@GLIBC_PRIVATE+0xb0>
   8bf66:       48 89 e9                mov    %rbp,%rcx
   8bf69:       4c 89 e2                mov    %r12,%rdx
   8bf6c:       48 89 ee                mov    %rbp,%rsi
   8bf6f:       48 8d 05 2a d2 15 00    lea    0x15d22a(%rip),%rax        # 1e91a0 <_IO_wfile_jumps@@GLIBC_2.2.5+0x240>
   8bf76:       4c 89 ef                mov    %r13,%rdi
   8bf79:       48 89 84 24 e8 00 00    mov    %rax,0xe8(%rsp)
   8bf80:       00 

   8bf81:       c6 45 00 00             movb   $0x0,0x0(%rbp) <=== CRASH

   8bf85:       e8 06 7e 00 00          call   93d90 <_IO_str_pbackfail@@GLIBC_2.2.5+0x60>
   8bf8a:       89 d9                   mov    %ebx,%ecx
   8bf8c:       4c 89 fa                mov    %r15,%rdx
   8bf8f:       4c 89 f6                mov    %r14,%rsi
   8bf92:       4c 89 ef                mov    %r13,%rdi
   8bf95:       e8 c6 a8 fe ff          call   76860 <psiginfo@@GLIBC_2.10+0x13400>
   8bf9a:       4c 8b 4c 24 08          mov    0x8(%rsp),%r9
   8bf9f:       4c 39 4c 24 48          cmp    %r9,0x48(%rsp)
   8bfa4:       74 08                   je     8bfae <vscanf@@GLIBC_2.2.5+0x15e>
   8bfa6:       48 8b 54 24 38          mov    0x38(%rsp),%rdx
   8bfab:       c6 02 00                movb   $0x0,(%rdx)
   8bfae:       48 8b 9c 24 48 01 00    mov    0x148(%rsp),%rbx
   8bfb5:       00 
   8bfb6:       64 48 33 1c 25 28 00    xor    %fs:0x28,%rbx
..
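
The match between this listing and the kernel's "Code:" line from the original report can be checked mechanically. The sketch below assumes that the byte wrapped in angle brackets in the "Code:" line is the one the instruction pointer referred to, and takes 0x8bf81 (the address marked CRASH in the listing) as the file offset of that byte. The helper names are made up for illustration.

```python
# "Code:" bytes copied from the kernel log in the original report.
CODE_LINE = (
    "00 00 4c 89 ef 4c 89 4c 24 08 e8 3a 68 00 00 48 89 e9 4c 89 e2 48 89 ee "
    "48 8d 05 2a d2 15 00 4c 89 ef 48 89 84 24 e8 00 00 00 <c6> 45 00 00 e8 06 "
    "7e 00 00 89 d9 4c 89 fa 4c 89 f6 4c 89 ef e8 c6"
)

# (offset, instruction bytes) pairs copied from the objdump listing above.
LISTING = [
    (0x8BF59, "4c 89 ef"), (0x8BF5C, "4c 89 4c 24 08"), (0x8BF61, "e8 3a 68 00 00"),
    (0x8BF66, "48 89 e9"), (0x8BF69, "4c 89 e2"), (0x8BF6C, "48 89 ee"),
    (0x8BF6F, "48 8d 05 2a d2 15 00"), (0x8BF76, "4c 89 ef"),
    (0x8BF79, "48 89 84 24 e8 00 00 00"), (0x8BF81, "c6 45 00 00"),
    (0x8BF85, "e8 06 7e 00 00"), (0x8BF8A, "89 d9"), (0x8BF8C, "4c 89 fa"),
    (0x8BF8F, "4c 89 f6"), (0x8BF92, "4c 89 ef"), (0x8BF95, "e8 c6 a8 fe ff"),
]


def split_code_line(code_line):
    """Return (raw bytes, index of the byte marked with <..>)."""
    tokens = code_line.split()
    fault_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    raw = bytes(int(t.strip("<>"), 16) for t in tokens)
    return raw, fault_index


def matches_listing(raw, fault_index, fault_offset, listing):
    """Compare every listing byte that falls inside the dumped window."""
    window_start = fault_offset - fault_index
    for offset, hexbytes in listing:
        for k, value in enumerate(bytes.fromhex(hexbytes)):
            pos = offset + k - window_start
            if 0 <= pos < len(raw) and raw[pos] != value:
                return False
    return True


if __name__ == "__main__":
    raw, idx = split_code_line(CODE_LINE)
    print("bytes before the faulting instruction:", idx)
    print("listing matches:", matches_listing(raw, idx, 0x8BF81, LISTING))
```

The faulting bytes c6 45 00 00 decode to the `movb $0x0,0x0(%rbp)` shown in the listing, which is consistent with a write through a NULL %rbp given the "segfault at 0" in the log.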

Another crash report: https://discuss.linuxcontainers.org/t/lxd-5-9-crashes-on-centos-7/16092

mihalicyn commented 1 year ago

@zrav do you have any updates on this, or can we close the issue until there is a new reproducer with more debug information?

We have integrated libsegfault (https://github.com/lxc/lxd-pkg-snap/pull/114) into the LXD snap package, so it should help us find the reason for the crash next time.

zrav commented 1 year ago

@mihalicyn I'll close this issue. If/when I have more info I'll post it.