Closed gamesguru closed 5 years ago
Attaching the atx log. The same thing happened today during a swipe up.
interesting point, need
What is annoying about this (and it really only happens on a few devices out of 40 by the end of a week) is that it can happen with any command: `dump_hierarchy()`, `swipedown()`, `tap()`, and I would assume also `adb_shell()`, among others.
Therefore, to avoid unhandled exceptions in the test, we MUST wrap every UI automation command in a `try`/`except` block. That is not something the average person can fix without greatly reworking their project, and in any case it makes more sense to fix it on the server side.
The device can easily be re-inited without rebooting; it is as if the issue is just a brief loss of USB. It only happens once or twice across 40 devices running round the clock for weeks at a time. But currently my project would not survive it without lots of changes, mostly `try` wrapper blocks.
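For illustration, a minimal sketch of what that wrapping could look like if factored into a decorator instead of scattered `try` blocks. Everything here is hypothetical: `DriverError` stands in for whatever connection error u2 raises mid-command, and `reinit` is whatever your project uses to bring the device back:

```python
import functools
import time


class DriverError(RuntimeError):
    """Stand-in for the connection error u2 raises when the link blips."""


def with_reinit(reinit, retries=3, delay=1.0):
    """Decorator: retry a UI-automation call, re-initing the driver on failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except DriverError:
                    if attempt == retries - 1:
                        raise          # give up after the last attempt
                    reinit()           # bring the device back, no reboot needed
                    time.sleep(delay)
        return wrapper
    return decorator
```

This keeps the recovery policy in one place, so a test script only decorates its device calls instead of wrapping each one by hand.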
I'm curious what you'll think.
Yes, I understand your feeling. I encountered this before. My strategy is to make tests stateless (no leftovers on disk), then do the big try-catch at the test level, with a test scheduler. Basically, the scheduler runs one test and harvests its result. If it detects a failure caused by the driver (u2), it simply marks the device as temporarily down, reboots it, and re-inits it to bring the device back again.

Think of a driver failure like a network connectivity interruption, which can happen randomly. And it is fair for driver failures to happen more often, because a device environment is more complicated (dead WiFi, network down, some stupid app with a service leaking memory, some automation accidentally crippling the OS, etc.). Recovery (retry here) is inevitable. At the end of the day, the scheduler reports a per-device down-time-to-service-time ratio and an overall metric, so an operator (and the boss) can keep an eye on the health and efficiency of the entire device farm.
BTW, if your tests are dependent, it is still OK to use this design. The added complexity is for the scheduler to know about the dependencies and restart the necessary preceding tests for a failed one.
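A toy sketch of that per-device scheduler loop, under loud assumptions: `run_test` and `recover_device` are hypothetical callables (a real one would launch a u2 script and reboot/re-init the device), and here every exception is treated as a driver failure for brevity:

```python
import time


def run_schedule(tests, run_test, recover_device, clock=time.monotonic):
    """Run tests on one device; recover on failure and track a
    down-time-to-service-time ratio for the health report."""
    service_time = down_time = 0.0
    results = []
    for test in tests:
        start = clock()
        try:
            results.append((test, run_test(test)))
            service_time += clock() - start
        except Exception:
            # A real scheduler would distinguish driver (u2) errors
            # from genuine test failures here.
            down_time += clock() - start
            start = clock()
            recover_device()  # mark down, reboot, re-init
            down_time += clock() - start
            results.append((test, "retry"))  # re-queue for a later pass
    ratio = down_time / service_time if service_time else float("inf")
    return results, ratio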
True, the end user should be able to handle this without too much work on their end: just re-init and pick up where they left off.
If the automator dies, I would restart the whole test suite. If that fails 5 times in a row, I would assume there is a bug in uiautomator, abort everything, and mark the tests as inconclusive.
Are you using xUnit or pytest or just your own implementation?
Note that this is a rare occurrence. I see it more in stress tests that run continuously than in quick unit tests, and even then only on maybe one or two out of 40 devices running for a week.
The scheduler is a plain Python script, ~200 LOC, using multiprocessing with one process per device and metrics updated in the console. The tests are separate u2 scripts.
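The shape of that fan-out, sketched with the stdlib `multiprocessing` module. The serials and the worker body are made up for the example; a real worker would loop over that device's u2 test scripts and push per-test results instead of a placeholder:

```python
import multiprocessing as mp


def device_worker(serial, results):
    """One process per device: run that device's tests, report a summary."""
    # A real worker would connect to the device by serial and run its
    # test scripts here; we just report a placeholder result.
    results.put((serial, "ok"))


def run_farm(serials):
    results = mp.Queue()
    procs = [mp.Process(target=device_worker, args=(s, results)) for s in serials]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Collect one summary per device, in whatever order they finished.
    return dict(results.get() for _ in serials)
```

One process per device keeps a wedged ADB connection on one phone from stalling the other 39; the console metrics loop would read from the same queue.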
You made the scheduler yourself? Are results saved to a file, or only printed to the console?
Test results are persisted to a remote DB, and metrics (the health of devices, effective test time per device, etc.) are shown in the console.
Another issue: on my rack of Allwinner d7s, sometimes one or two die over the weekend. Not a big deal, but worth looking into.
Here we died at a swipe. It should be possible to seamlessly `init --serial`, but only after you accept the pull requests to fix #206.