Open kelly987654 opened 3 years ago
Tracking in: https://issuetracker.google.com/issues/199772019, so we can get some more folks to look at this.
Here's another example. Tests were running fine for a while, then it got stuck (I can tell just from watching the Linux VM CPU, because it goes from normal spikes to a flat-line at 60% with all usage from the qemu process, Docker container shows 99% all from the qemu process).
From adb log immediately after recovery (no logs during the stuck period):
`09-13 16:42:27.190 757 757 I wpa_supplicant: wlan0: CTRL-EVENT-BEACON-LOSS
09-13 16:42:27.226 548 548 W Looper : Slow delivery took 269752ms main h=android.app.PropertyInvalidatedCache$AutoCorker$1 c=null m=0
09-13 16:42:27.226 548 548 W Looper : Slow dispatch took 109ms main h=android.app.PropertyInvalidatedCache$AutoCorker$1 c=null m=0
09-13 16:42:27.249 548 596 I libprocessgroup: Successfully killed process cgroup uid 10157 pid 10604 in 271882ms
09-13 16:42:27.254 305 305 I Zygote : Process 10604 exited due to signal 9 (Killed)
09-13 16:42:27.264 513 513 E netmgr : qemu_pipe_open_ns:62: Could not connect to the 'pipe:qemud:network' service: Invalid argument
09-13 16:42:27.264 513 513 E netmgr : Failed to open QEMU pipe 'qemud:network': Invalid argument
09-13 16:42:27.281 548 633 W SensorService: Dropping 12 cached events (968/984) to save 28/28 new events. 0 events previously dropped
09-13 16:42:27.281 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.283 548 633 I chatty : uid=1000(system) SensorService identical 24 lines
09-13 16:42:27.283 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.283 548 633 W SensorService: Dropping 5 cached events (984/984) to save 5/5 new events. 28 events previously dropped
09-13 16:42:27.289 548 1895 W system_server: Long monitor contention with owner Binder:548_16 (1892) at android.os.BinderProxy android.os.BinderProxy.getInstance(long, long)(BinderProxy.java:411) waiters=0 in android.os.BinderProxy android.os.BinderProxy.getInstance(long, long) for 310ms
09-13 16:42:27.302 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.302 548 633 I chatty : uid=1000(system) SensorService identical 1 line
09-13 16:42:27.302 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.302 517 517 E wifi_forwarder: qemu_pipe_open_ns:62: Could not connect to the 'pipe:qemud:wififorward' service: Invalid argument
09-13 16:42:27.302 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.303 548 633 I chatty : uid=1000(system) SensorService identical 4 lines
09-13 16:42:27.303 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.303 517 517 E wifi_forwarder: RemoteConnection failed to initialize: RemoteConnection failed to open pipe
09-13 16:42:27.303 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.304 548 633 I chatty : uid=1000(system) SensorService identical 16 lines
09-13 16:42:27.304 548 633 W SensorService: Dropping 28 cached events (984/984) to save 28/28 new events. 28 events previously dropped
09-13 16:42:27.346 548 633 W SensorService: Dropping 1 cached events (984/984) to save 1/1 new events. 20 events previously dropped
09-13 16:37:55.665 0 0 I binder : undelivered transaction 1524010, process died.
09-13 16:42:27.355 0 0 E rcu : INFO: rcu_preempt detected stalls on CPUs/tasks:
09-13 16:42:27.361 0 0 W : (detected by 3, t=67923 jiffies, g=525421, q=405)
09-13 16:42:27.366 0 0 E rcu : All QSes seen, last rcu_preempt kthread activity 67924 (4295905529-4295837605), jiffies_till_next_fqs=1, root ->qsmask 0x0
09-13 16:42:27.377 0 0 I : R running task 12504 384 1 0x00000000
09-13 16:42:27.383 0 0 W Call Trace:
09-13 16:42:27.385 0 0 W : <IRQ>
09-13 16:42:27.387 0 0 W : sched_show_task+0x1b9/0x1d0
09-13 16:42:27.391 0 0 W : rcu_sched_clock_irq+0x9d8/0xbd0
09-13 16:42:27.395 0 0 W : ? __raise_softirq_irqoff+0xab/0xd0
09-13 16:42:27.399 0 0 W : update_process_times+0x61/0xa0
09-13 16:42:27.402 0 0 W : tick_sched_timer+0x91/0xe0
09-13 16:42:27.405 0 0 W : ? alarmtimer_fired+0x8/0x8
09-13 16:42:27.409 0 0 W : __run_hrtimer+0x1fc/0x3f0
09-13 16:42:27.412 0 0 W : ? __raise_softirq_irqoff+0xab/0xd0
09-13 16:42:27.415 0 0 W : hrtimer_interrupt+0x1a2/0x450
09-13 16:42:27.419 0 0 W : ? kvm_sched_clock_read+0x19/0x40
09-13 16:42:27.422 0 0 W : ? clockevents_register_device.cfi_jt+0x8/0x8
09-13 16:42:27.427 0 0 W : smp_apic_timer_interrupt+0x16c/0x390
09-13 16:42:27.431 0 0 W : apic_timer_interrupt+0xf/0x20
09-13 16:42:27.434 0 0 W : </IRQ>`
@pokowaka - Thanks for looking at this. I have been testing today with the 29-google-x64-no-metrics:30.3.4 container and have not been able to reproduce the issue, so I am inclined to think it is something related to the Android 11 container. Is there any suggestion for something I could try? I saw the other comments, we are just using the generic command I mentioned, and nothing USB related.
Using the below start command, the emulator in the docker container consistently crashes partway through our test runs (not on the same test). The docker container works fine for a while, then the tests lose access to the emulator, and the container status shows as 'unhealthy'. The qemu process on both the Linux VM host and within the docker container show 100% cpu usage, and the emulator is no longer accessible via adb.
System info:
Start command:
Logs immediately after crash: