lxc / incus

Powerful system container and virtual machine manager
https://linuxcontainers.org/incus
Apache License 2.0

btrfs doesn't report transfer progress on both sides of the migration #676

Closed phol closed 5 months ago

phol commented 5 months ago

Required information

Incus info of local machine

config:
  acme.agree_tos: "true"
  core.https_address: :8443
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: beheer
auth_user_method: unix
environment:
  addresses:
  - 10.0.0.136:8443
  - '[redacted]:8443'
  - 10.0.0.91:8443
  - '[redacted]:8443'
  - 10.10.88.1:8443
  - '[fdcc:b486:4ec1::1]:8443'
  - 10.10.99.1:8443
  - '[fd9a:57b8:36a5::1]:8443'
  architectures:
  - aarch64
  - armv6l
  - armv7l
  - armv8l
  certificate: |
    -----BEGIN CERTIFICATE-----
    REMOVED
    -----END CERTIFICATE-----
  certificate_fingerprint: REMOVED
  driver: lxc
  driver_version: 5.0.3
  firewall: nftables
  kernel: Linux
  kernel_architecture: aarch64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "false"
    unpriv_fscaps: "true"
  kernel_version: 6.1.0-18-arm64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "12"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: hermes
  server_pid: 3790551
  server_version: "0.7"
  storage: btrfs
  storage_version: "6.2"
  storage_supported_drivers:
  - name: btrfs
    version: "6.2"
    remote: false
  - name: dir
    version: "1"
    remote: false

Incus info of remote

config:
  core.https_address: '[::]:8443'
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: beheer
auth_user_method: unix
environment:
  addresses:
  - 10.0.0.45:8443
  - '[REDACTED]:8443'
  - '[REDACTED]:8443'
  - 10.10.66.1:8443
  - '[fddb:5833:c3ee::1]:8443'
  - 10.195.49.1:8443
  - '[fd42:a6f2:9167:eda0::1]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    REDACTED
    -----END CERTIFICATE-----
  certificate_fingerprint: REDACTED
  driver: lxc | qemu
  driver_version: 5.0.3 | 8.2.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "false"
    unpriv_fscaps: "true"
  kernel_version: 6.1.0-17-amd64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "12"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: j
  server_pid: 3966034
  server_version: "0.7"
  storage: btrfs
  storage_version: "6.2"
  storage_supported_drivers:
  - name: btrfs
    version: "6.2"
    remote: false
  - name: dir
    version: "1"
    remote: false

Issue description

(I'm using alias lxc = incus)

Adding a remote and then executing lxc list or lxc exec from local to the remote works properly.

lxc remote add j 10.0.0.45

lxc list j: 
lxc exec j:summary-mole bash

The following command also seems to run, but at the end it returns an error:

lxc copy av j:av
Error: Failed instance creation: Error transferring instance data: Failed migration on target: Error reading migration control source: local error: tls: bad record MAC

Steps to reproduce

  1. Add a remote
  2. Try to copy a container

Information to attach

I think the error might be related either to the differing processor architectures or to the fact that the machines, which run on Oracle Cloud free tier, are doing something strange with the network interface MAC.

On both machines, br0 provides networking for the containers.

Network config local machine

sudo cat  /etc/systemd/network/br0.netdev
[NetDev]
Name=br0
Kind=bridge
sudo cat  /etc/systemd/network/br0.netdev
[Match]
Name=br0

[Network]
Description=Container networking bridge
Address=10.10.88.1/24
Address=fdcc:b486:4ec1::1/64
IPMasquerade=both

ConfigureWithoutCarrier=true
ActivationPolicy=always-up

enp0s6 is the primary network interface with DHCP IP 10.0.0.136

sudo cat  /etc/systemd/network/enp0s6.netdev
[Match]
Name=enp0s6

[Network]
DHCP=yes

Network config remote

sudo cat  /etc/systemd/network/br0.netdev
[NetDev]
Name=br0
SkipForwardingDelay=true
Kind=bridge
sudo cat  /etc/systemd/network/br0.network
Name=br0

[Network]
Description=Container networking bridge
Address=10.10.66.1/24
Address=fddb:5833:c3ee::1/64
IPMasquerade=both

ConfigureWithoutCarrier=true
ActivationPolicy=always-up

ens3 is the primary network interface with DHCP IP 10.0.0.45

sudo cat  /etc/systemd/network/ens3.network
[Match]
Name=ens3

[Network]
DHCP=yes

stgraber commented 5 months ago

Can you try your copy again but with --mode=relay?

stgraber commented 5 months ago

When you run lxc copy av j:av, it will have your local Incus generate a token which is then sent by the CLI to the j host, instructing that host to directly connect to your local system to retrieve the instance.

The fact that the target connects to the source is why you often will have no problem interacting with both servers yourself but a copy may fail due to network issues going the other way.

--mode=relay has the CLI tool itself connect to both source and target and relay the data, so that should usually work fine if your CLI was already able to interact with both servers.

You also have the option of using --mode=push which instead has the source server push directly to the target server.
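As a rough illustration of the three modes described above, here is a minimal sketch that tries each one in turn and prints the first that succeeds. The helper name copy_with_fallback is made up for this example and is not part of Incus; it assumes `incus` is on the PATH and both remotes are already configured. Extra arguments are passed straight through to `incus copy`.

```shell
#!/bin/sh
# Sketch: try each transfer mode in turn and report the first one that works.
#   pull  (the default): the target server connects to the source server
#   push:                the source server connects to the target server
#   relay:               the CLI connects to both servers and relays the data
# copy_with_fallback is a hypothetical helper name, not part of Incus.
copy_with_fallback() {
    src=$1
    dst=$2
    shift 2   # any remaining arguments are passed through to "incus copy"
    for mode in pull push relay; do
        echo "Trying --mode=$mode" >&2
        if incus copy "$src" "$dst" --mode="$mode" "$@"; then
            echo "$mode"   # print the mode that succeeded
            return 0
        fi
    done
    echo "All transfer modes failed" >&2
    return 1
}
```

For example, `copy_with_fallback av j:av` would attempt pull, then push, then relay. A failure in every mode suggests the problem is not just network direction.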

phol commented 5 months ago

Hi @stgraber and thanks for replying so quickly.

With

lxc copy av j:av --mode=relay

I'm getting

Error: Error transferring instance data: Failed migration on target: Error reading migration control source: websocket: close 1006 (abnormal closure): unexpected EOF

During the copy, I do observe a btrfs receive command in htop. However, after about 1m30s, I get the error. I tried this multiple times. During the copy, I can also observe an av instance on j when running lxc ls.

With

lxc copy av j:av --mode=push 

I'm getting this message:

Transferring instance: av: 1.32GB (18.71MB/s) 

The "Transferring instance" progress line is not shown when I specify --mode=relay or no mode at all.

However, it unfortunately also errors out. With all three modes, this happens after around 1m30s.

time lxc copy av j:av --mode=push 
Error: Failed instance migration: Failed migration on source: migration dump failed
(00.011761) Error (criu/namespaces.c:460): Can't dump nested uts namespace for 4738
(00.011764) Error (criu/namespaces.c:721): Can't make utsns id
(00.018309) Error (criu/util.c:642): exited, status=1
(00.021460) Error (criu/util.c:642): exited, status=1
(00.022562) Error (criu/cr-dump.c:2098): Dumping FAILED.
incus copy av j:av --mode=push  0.09s user 0.02s system 0% cpu 1:24.65 total

stgraber commented 5 months ago

Ah, please stop the container first or pass --stateless as live migrations of containers are very unlikely to succeed.
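A minimal sketch of that advice, assuming the instance state can be read with `incus list` in CSV form: check whether the instance is running and only then add --stateless. The helper name copy_instance is hypothetical, and the anchored name filter relies on `incus list` accepting a regular expression, which is worth double-checking on your version.

```shell
#!/bin/sh
# copy_instance is a hypothetical helper name, not part of Incus.
copy_instance() {
    name=$1
    target=$2
    # Ask for only the state column (s) in CSV form, e.g. "RUNNING" or "STOPPED".
    # The filter is a regular expression anchored to match this instance only.
    state=$(incus list "^${name}\$" -c s -f csv)
    if [ "$state" = "RUNNING" ]; then
        # A running container would otherwise trigger a live (CRIU) migration.
        incus copy "$name" "$target" --stateless
    else
        incus copy "$name" "$target"
    fi
}
```

Usage would be `copy_instance av j:av`. The alternative is simply `incus stop av` before copying.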

phol commented 5 months ago

Alright, I'm running this now. Will update you when it completes.

time lxc copy av j:av --mode=push --stateless

phol commented 5 months ago

I tried the commands.

lxc copy av j:av --mode=push --stateless

Works.

lxc copy av j:av --mode=relay --stateless
lxc copy av j:av --stateless

Work as well but don't show progress. Do you have any idea why this might happen?

Also, about the --stateless flag: would it perhaps be possible to force --stateless when copying containers between hosts with differing processor architectures?

stgraber commented 5 months ago

Hmm, I've been unable to reproduce the migration error issue here:

root@v1:~# incus copy u1 v2:u1
Error: Failed instance creation: Error transferring instance data: Failed migration on target: Error from migration control source: Failed migration on source: migration dump failed
(00.003977) Error (criu/namespaces.c:460): Can't dump nested uts namespace for 3850
(00.003979) Error (criu/namespaces.c:721): Can't make utsns id
(00.007038) Error (criu/util.c:642): exited, status=1
(00.007842) Error (criu/util.c:642): exited, status=1
(00.008333) Error (criu/cr-dump.c:2098): Dumping FAILED.
root@v1:~# incus copy u1 v2:u1 --mode=relay
Error: Error transferring instance data: Failed migration on target: Error from migration control source: Failed migration on source: migration dump failed
(00.003417) Error (criu/namespaces.c:460): Can't dump nested uts namespace for 3850
(00.003418) Error (criu/namespaces.c:721): Can't make utsns id
(00.005060) Error (criu/util.c:642): exited, status=1
(00.005759) Error (criu/util.c:642): exited, status=1
(00.006247) Error (criu/cr-dump.c:2098): Dumping FAILED.
root@v1:~# incus copy u1 v2:u1 --mode=push
Error: Failed instance migration: Failed migration on source: migration dump failed
(00.003274) Error (criu/namespaces.c:460): Can't dump nested uts namespace for 3850
(00.003276) Error (criu/namespaces.c:721): Can't make utsns id
(00.005401) Error (criu/util.c:642): exited, status=1
(00.006276) Error (criu/util.c:642): exited, status=1
(00.006805) Error (criu/cr-dump.c:2098): Dumping FAILED.
root@v1:~# 

Then with stateless:

root@v1:~# incus copy u1 v2:u1 --stateless && incus delete -f v2:u1
root@v1:~# incus copy u1 v2:u1 --stateless --mode=relay && incus delete -f v2:u1
root@v1:~# incus copy u1 v2:u1 --stateless --mode=push && incus delete -f v2:u1

All of those showed transfer progress information as expected.

In my case, this was with Incus 0.7 on both the source and target servers, the CLI is Incus 0.7 too, and storage on both source and target is the basic dir backend.

stgraber commented 5 months ago

Moving the issue back to Incomplete and removing the milestone until I can reproduce an issue with either the error handling or the transfer progress.

phol commented 5 months ago

I was running both machines with incus 0.7, but with the btrfs storage backend on both.

phol commented 5 months ago

Can I be of any help by e.g. retrying with debugging flags or something like that?

stgraber commented 5 months ago

I'll retry with btrfs to see if that makes it behave differently here.

stgraber commented 5 months ago

Errors still propagate correctly but the progress information is indeed missing, so that's a btrfs driver issue then.

phol commented 5 months ago

Alright, great. I'm happy to hear that I reported a bug which helps improve the project and that I wasn't wasting your time. Also, wow, I don't think I've ever seen a bug handled as quickly as this one!

Thank you for all your efforts and good night :)

hi-ko commented 5 months ago

I don't think it's specific to the btrfs driver. It also affects the zfs driver: using --mode pull --refresh does not show any progress when running on the target host.

I also get the error mentioned in the issue description on some containers when copying over a WAN through a tunnel:

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Error reading migration control source: local error: tls: bad record MAC

Rerunning the command later may succeed.

Full command used:

incus copy $host:$container $container --mode pull --refresh --project $project --target-project=$project

Both the source and target servers run Incus 6.0.0.
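
Since rerunning the command sometimes works, a simple retry wrapper is one possible workaround. This is only a sketch: retry_copy and the RETRY_DELAY variable are names invented here, not Incus features, and it does not address the underlying "bad record MAC" error.

```shell
#!/bin/sh
# retry_copy is a hypothetical helper name, not part of Incus.
# Usage: retry_copy MAX_ATTEMPTS <arguments for "incus copy">
retry_copy() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if incus copy "$@"; then
            return 0
        fi
        echo "Attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
        # Wait before the next try; RETRY_DELAY (seconds) is an assumed knob, default 30.
        [ "$attempt" -le "$max" ] && sleep "${RETRY_DELAY:-30}"
    done
    return 1
}
```

With the command above it would be called as `retry_copy 3 "$host:$container" "$container" --mode pull --refresh --project "$project" --target-project="$project"`.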