google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

arm64: kvm_test is blocked on TestSafecopySigbus. #6629

Open avagin opened 3 years ago

avagin commented 3 years ago

Description

https://github.com/google/gvisor/pull/6573#issuecomment-924601729

Steps to reproduce

milantracy commented 3 years ago

Is it possible to run the test on arm64 if I don't have a machine with that architecture?

avagin commented 3 years ago

You can try to create a QEMU arm64 VM: https://wiki.ubuntu.com/ARM64/QEMU

But I am not sure that the KVM tests will work there.

cc: @zhlhahaha

zhlhahaha commented 3 years ago

Yes, as @avagin said, you can create an arm64 VM on x86 via QEMU, but it does not support KVM. For a real arm64 machine, you can try running gVisor on a Raspberry Pi 4 if you have one.

avagin commented 3 years ago
# bazel-out/aarch64-fastbuild-ST-4c64f0b3d5c7/bin/pkg/sentry/platform/kvm/kvm_test_/kvm_test -test.v -test.run=TestSafecopySigbus
=== RUN   TestSafecopySigbus
I1015 17:53:31.660636    5123 physical_map.go:124] region: virtual [fef367c000,ffff7367c000)
I1015 17:53:31.660820    5123 physical_map.go:176] physicalRegion: virtual [1000,10000) => physical [1000,10000)
I1015 17:53:31.660831    5123 physical_map.go:176] physicalRegion: virtual [10000,295000) => physical [10000,295000)
I1015 17:53:31.660839    5123 physical_map.go:176] physicalRegion: virtual [295000,2a0000) => physical [295000,2a0000)
I1015 17:53:31.660846    5123 physical_map.go:176] physicalRegion: virtual [2a0000,578000) => physical [2a0000,578000)
I1015 17:53:31.660853    5123 physical_map.go:176] physicalRegion: virtual [578000,fef367c000) => physical [578000,fef367c000)
I1015 17:53:31.660861    5123 physical_map.go:176] physicalRegion: virtual [ffff7367c000,ffff75a2d000) => physical [fef367c000,fef5a2d000)
I1015 17:53:31.660868    5123 physical_map.go:176] physicalRegion: virtual [ffff75a2d000,ffff75aad000) => physical [fef5a2d000,fef5aad000)
I1015 17:53:31.660876    5123 physical_map.go:176] physicalRegion: virtual [ffff75aad000,ffff75aae000) => physical [fef5aad000,fef5aae000)
I1015 17:53:31.660883    5123 physical_map.go:176] physicalRegion: virtual [ffff75aae000,ffff95a3d000) => physical [fef5aae000,ff15a3d000)
I1015 17:53:31.660890    5123 physical_map.go:176] physicalRegion: virtual [ffff95a3d000,ffff95a3e000) => physical [ff15a3d000,ff15a3e000)
I1015 17:53:31.660897    5123 physical_map.go:176] physicalRegion: virtual [ffff95a3e000,ffff99a2f000) => physical [ff15a3e000,ff19a2f000)
I1015 17:53:31.660905    5123 physical_map.go:176] physicalRegion: virtual [ffff99a2f000,ffff99a30000) => physical [ff19a2f000,ff19a30000)
I1015 17:53:31.660912    5123 physical_map.go:176] physicalRegion: virtual [ffff99a30000,ffff9a22d000) => physical [ff19a30000,ff1a22d000)
I1015 17:53:31.660919    5123 physical_map.go:176] physicalRegion: virtual [ffff9a22d000,ffff9a22e000) => physical [ff1a22d000,ff1a22e000)
I1015 17:53:31.660926    5123 physical_map.go:176] physicalRegion: virtual [ffff9a22e000,ffff9a32d000) => physical [ff1a22e000,ff1a32d000)
I1015 17:53:31.660934    5123 physical_map.go:176] physicalRegion: virtual [ffff9a32d000,ffff9a38d000) => physical [ff1a32d000,ff1a38d000)
I1015 17:53:31.660941    5123 physical_map.go:176] physicalRegion: virtual [ffff9a38d000,ffff9a38f000) => physical [ff1a38d000,ff1a38f000)
I1015 17:53:31.660948    5123 physical_map.go:176] physicalRegion: virtual [ffff9a38f000,ffff9a390000) => physical [ff1a38f000,ff1a390000)
I1015 17:53:31.660955    5123 physical_map.go:176] physicalRegion: virtual [ffff9a390000,fffffffff000) => physical [ff1a390000,ff7ffff000)
root@gviosr-ci-arm64-01:~# cat /proc/5123/maps  | grep memfd:kvm_test
fe735de000-fef35de000 rw-s 00000000 00:01 5122                           /memfd:kvm_test_5123 (deleted)
root@gviosr-ci-arm64-01:~# strace -fp 5123 2>&1 | head -n 30
strace: Process 5123 attached with 6 threads
[pid  5128] futex(0x5d50d8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5127] futex(0x40000f6950, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5126] futex(0x4000180150, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5125] futex(0x40000f6550, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5124] restart_syscall(<... resuming interrupted io_setup ...> <unfinished ...>
[pid  5123] rt_sigtimedwait([CHLD], NULL, {tv_sec=0, tv_nsec=0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
[pid  5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0
[pid  5123] ioctl(12, KVM_RUN, 0)       = -1 EFAULT (Bad address)
root@gviosr-ci-arm64-01:~# cat /sys/kernel/debug/tracing/trace_pipe  | head -n 50
        kvm_test-4936    [023] d... 34452.451795: kvm_timer_update_irq: VCPU: 32, IRQ 27, level 0
        kvm_test-4936    [023] .... 34452.451796: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0x00000000001afca0
        kvm_test-4936    [023] .... 34452.451796: kvm_guest_fault: ipa 0xfee198c000, hsr 0x92000005, hxfar 0xfee198c000, pc 0x000000001afca0
        kvm_test-4936    [023] .... 34452.451797: kvm_get_timer_map: VCPU: 32, dv: 1, dp: 0, ep: -1
        kvm_test-4936    [023] d... 34452.451797: kvm_timer_save_state:    CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 1
        kvm_test-4936    [023] d... 34452.451797: kvm_timer_save_state:    CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 0
        kvm_test-4936    [023] .... 34452.451798: kvm_userspace_exit: reason error (14)
        kvm_test-4936    [023] .... 34452.451798: kvm_get_timer_map: VCPU: 32, dv: 1, dp: 0, ep: -1
        kvm_test-4936    [023] .... 34452.451798: kvm_timer_update_irq: VCPU: 32, IRQ 27, level 0
        kvm_test-4936    [023] .... 34452.451799: kvm_timer_update_irq: VCPU: 32, IRQ 30, level 0
        kvm_test-4936    [023] d... 34452.451799: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 1
        kvm_test-4936    [023] d... 34452.451799: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 0
        kvm_test-4936    [023] d... 34452.451799: kvm_arm_setup_debug: vcpu: 00000000f33fa138, flags: 0x00000000
        kvm_test-4936    [023] d... 34452.451800: kvm_arm_set_dreg32: MDCR_EL2: 0x00084e66
        kvm_test-4936    [023] d... 34452.451800: kvm_arm_set_dreg32: MDSCR_EL1: 0x00001000
        kvm_test-4936    [023] d... 34452.451800: kvm_entry: PC: 0x00000000001afca0
        kvm_test-4936    [023] d... 34452.451800: kvm_arm_clear_debug: flags: 0x00000000
        kvm_test-4936    [023] d... 34452.451800: kvm_timer_update_irq: VCPU: 32, IRQ 27, level 0
        kvm_test-4936    [023] .... 34452.451801: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0x00000000001afca0
        kvm_test-4936    [023] .... 34452.451801: kvm_guest_fault: ipa 0xfee198c000, hsr 0x92000005, hxfar 0xfee198c000, pc 0x000000001afca0
        kvm_test-4936    [023] .... 34452.451802: kvm_get_timer_map: VCPU: 32, dv: 1, dp: 0, ep: -1
        kvm_test-4936    [023] d... 34452.451802: kvm_timer_save_state:    CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 1
        kvm_test-4936    [023] d... 34452.451802: kvm_timer_save_state:    CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 0
        kvm_test-4936    [023] .... 34452.451803: kvm_userspace_exit: reason error (14)
        kvm_test-4936    [023] .... 34452.451803: kvm_get_timer_map: VCPU: 32, dv: 1, dp: 0, ep: -1
        kvm_test-4936    [023] .... 34452.451803: kvm_timer_update_irq: VCPU: 32, IRQ 27, level 0
        kvm_test-4936    [023] .... 34452.451804: kvm_timer_update_irq: VCPU: 32, IRQ 30, level 0
        kvm_test-4936    [023] d... 34452.451804: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 1
        kvm_test-4936    [023] d... 34452.451804: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 0
        kvm_test-4936    [023] d... 34452.451804: kvm_arm_setup_debug: vcpu: 00000000f33fa138, flags: 0x00000000
        kvm_test-4936    [023] d... 34452.451804: kvm_arm_set_dreg32: MDCR_EL2: 0x00084e66
        kvm_test-4936    [023] d... 34452.451805: kvm_arm_set_dreg32: MDSCR_EL1: 0x00001000
        kvm_test-4936    [023] d... 34452.451805: kvm_entry: PC: 0x00000000001afca0
        kvm_test-4936    [023] d... 34452.451805: kvm_arm_clear_debug: flags: 0x00000000
        kvm_test-4936    [023] d... 34452.451805: kvm_timer_update_irq: VCPU: 32, IRQ 27, level 0
        kvm_test-4936    [023] .... 34452.451805: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0x00000000001afca0
        kvm_test-4936    [023] .... 34452.451806: kvm_guest_fault: ipa 0xfee198c000, hsr 0x92000005, hxfar 0xfee198c000, pc 0x000000001afca0
        kvm_test-4936    [023] .... 34452.451807: kvm_get_timer_map: VCPU: 32, dv: 1, dp: 0, ep: -1
        kvm_test-4936    [023] d... 34452.451807: kvm_timer_save_state:    CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 1
        kvm_test-4936    [023] d... 34452.451807: kvm_timer_save_state:    CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 0
        kvm_test-4936    [023] .... 34452.451807: kvm_userspace_exit: reason error (14)
        kvm_test-4936    [023] .... 34452.451808: kvm_get_timer_map: VCPU: 32, dv: 1, dp: 0, ep: -1
        kvm_test-4936    [023] .... 34452.451808: kvm_timer_update_irq: VCPU: 32, IRQ 27, level 0
        kvm_test-4936    [023] .... 34452.451808: kvm_timer_update_irq: VCPU: 32, IRQ 30, level 0
        kvm_test-4936    [023] d... 34452.451809: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 1
        kvm_test-4936    [023] d... 34452.451809: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 0
        kvm_test-4936    [023] d... 34452.451809: kvm_arm_setup_debug: vcpu: 00000000f33fa138, flags: 0x00000000
        kvm_test-4936    [023] d... 34452.451809: kvm_arm_set_dreg32: MDCR_EL2: 0x00084e66
        kvm_test-4936    [023] d... 34452.451809: kvm_arm_set_dreg32: MDSCR_EL1: 0x00001000
        kvm_test-4936    [023] d... 34452.451810: kvm_entry: PC: 0x00000000001afca0
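
For readers following the trace, the repeated `hsr 0x92000005` value can be decoded by hand. The sketch below is not gVisor code; `decodeESR` is a hypothetical helper that only applies the ARMv8 syndrome register layout (EC in bits [31:26], ISV in bit 24, DFSC in bits [5:0]) to the value printed by `kvm_guest_fault`.

```go
package main

import "fmt"

// decodeESR splits an ARMv8 exception syndrome value into the fields that
// matter for a data abort. It is an illustrative helper, not a gVisor API.
func decodeESR(esr uint32) (ec, dfsc uint32, isv bool) {
	ec = esr >> 26         // EC: exception class, bits [31:26]
	isv = esr&(1<<24) != 0 // ISV: instruction syndrome valid, bit 24
	dfsc = esr & 0x3f      // DFSC: data fault status code, bits [5:0]
	return ec, dfsc, isv
}

func main() {
	// Value copied from the kvm_guest_fault lines in the trace.
	ec, dfsc, isv := decodeESR(0x92000005)
	fmt.Printf("EC=%#x DFSC=%#x ISV=%v\n", ec, dfsc, isv)
	// prints "EC=0x24 DFSC=0x5 ISV=false"
}
```

EC 0x24 matches the `DABT_LOW` label in the `kvm_exit` lines (a data abort from a lower exception level), and DFSC 0x05 is the architecture's encoding for a level 1 translation fault, meaning no mapping was found for the faulting address.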
avagin commented 3 years ago

[pid 5123] ioctl(12, _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40), 0x4000008ca0) = 0

    // Host must support ARM64_HAS_RAS_EXTN.
    if _, _, errno := unix.RawSyscall( // escapes: no.
        unix.SYS_IOCTL,
        uintptr(c.fd),
        _KVM_SET_VCPU_EVENTS,
        uintptr(unsafe.Pointer(vcpuSErrNMI))); errno != 0 {
        if errno == unix.EINVAL {
            throw("No ARM64_HAS_RAS_EXTN feature in host.")
        }
        throw("nmi sErr injection failed")
    }
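
To check that the undecoded strace call really is this `_KVM_SET_VCPU_EVENTS` ioctl, the request number can be rebuilt from the four fields strace printed. The sketch below assumes the `asm-generic/ioctl.h` bit layout that arm64 uses; `ioc` and the `ioc*` constants are illustrative names, not gVisor identifiers.

```go
package main

import "fmt"

// Field layout of an ioctl request number in Linux's asm-generic/ioctl.h.
const (
	iocNrShift   = 0
	iocTypeShift = 8
	iocSizeShift = 16
	iocDirShift  = 30

	iocWrite = 1 // _IOC_WRITE: userspace passes data to the kernel
)

// ioc mirrors the kernel's _IOC(dir, type, nr, size) macro.
func ioc(dir, typ, nr, size uintptr) uintptr {
	return dir<<iocDirShift | size<<iocSizeShift | typ<<iocTypeShift | nr<<iocNrShift
}

func main() {
	// Fields taken from the strace line: _IOC(_IOC_WRITE, 0xae, 0xa0, 0x40).
	req := ioc(iocWrite, 0xae, 0xa0, 0x40)
	fmt.Printf("%#x\n", req) // prints "0x4040aea0"
}
```

Type 0xae is `KVMIO` and number 0xa0 is the one `linux/kvm.h` assigns to `KVM_SET_VCPU_EVENTS`, so each `KVM_RUN` that fails with EFAULT in the strace output is followed by one SError injection.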
avagin commented 2 years ago

@zhlhahaha could you look at this issue? I think it is quite critical. We can see that the NMI is queued in a loop. Is the NMI exception handler executed in this case? If the answer is yes, why doesn't it trigger an exit to the host?

avagin commented 2 years ago

I think I found the root cause of this issue. We queue an "NMI" interrupt, but it is blocked in the guest. With this patch https://github.com/avagin/gvisor/commit/ed5c7549ab74df21c837251175df3c3660b31559, the test passes...
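
As background on how a queued SError can stay blocked: on AArch64 a pending SError is held back while the PSTATE.A mask bit is set. The sketch below only shows that check. Whether the linked patch changes this particular bit is an assumption, since the thread does not show the patch contents, and `serrorMasked` is a hypothetical helper.

```go
package main

import "fmt"

// PSTATE/SPSR exception mask bits on AArch64 (the DAIF field).
const (
	psrF = 1 << 6 // FIQ masked
	psrI = 1 << 7 // IRQ masked
	psrA = 1 << 8 // SError masked
	psrD = 1 << 9 // debug exceptions masked
)

// serrorMasked reports whether a pending SError would be held back by the
// given PSTATE value. Illustrative only; not taken from gVisor.
func serrorMasked(pstate uint64) bool {
	return pstate&psrA != 0
}

func main() {
	fmt.Println(serrorMasked(psrD | psrA | psrI | psrF)) // all DAIF bits set: true
	fmt.Println(serrorMasked(psrI | psrF))               // only IRQ and FIQ masked: false
}
```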

zhlhahaha commented 2 years ago

CC: @lubinszARM I am also looking into it.

avagin commented 2 years ago

@lubinszARM @zhlhahaha have you had a chance to look at this issue?

zhlhahaha commented 2 years ago

> @lubinszARM @zhlhahaha have you had a chance to look at this issue?

Hi Avagin, I discussed this issue with Lubin several weeks ago, but we have not found a good solution for it. Sorry for the delayed reply. I will keep looking into this and put it on my task list.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 120 days.