missionpinball / mpf

Mission Pinball Framework: Open source software to run a real pinball machine.
http://missionpinball.org
MIT License
217 stars 143 forks source link

OPP test hangs on GitHub Action when running on Windows #1716

Open toomanybrians opened 1 year ago

toomanybrians commented 1 year ago

We're getting random failures of the tests running on Windows, across all versions of Python (3.8-3.11). It's totally random, but consistent. Different windows tests fail, and if you rerun tests, ones that failed might pass and vice versa. I created the investigate-github-tests branch to investigate:

https://github.com/missionpinball/mpf/tree/investigate-github-tests

Here are the workflow actions that have run on that branch. You can download the logs and see the details. (I enabled verbose logging for the tests, and also disabled the non-Windows tests.) https://github.com/missionpinball/mpf/actions?query=branch%3Ainvestigate-github-tests

I've narrowed it down to the test_OPP.TestOPPStm32.testOpp() test. (This test has been disabled in the dev branch, and the latest on that branch as of this writing.)

Here's a log of a test that fails. (Actually, it just hangs, not a failure. The keyboard interrupt is what comes in when the job is canceled, whether canceled via the UI, or canceled automatically after 6 hours by GitHub.) The logs for all tests that fail are identical here, same lines, same hang:

2023-09-16T18:15:26.0425498Z 2023-09-16 18:15:26,031 : 1.616 : DEBUG : OPP : com1 sending: b' @0\x00\x00\x01\x0b\xb8\xc7\xea' ( 0x20 0x40 0x30 0x00 0x00 0x01 0x0b 0xb8 0xc7 0xea)
2023-09-16T18:15:26.0427435Z 2023-09-16 18:15:26,031 : 1.616 : INFO : Machine : Advancing time by 0.1
2023-09-16T18:15:26.0437950Z 2023-09-16 18:15:26,031 : 1.625 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0440750Z 2023-09-16 18:15:26,031 : 1.625 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0451172Z 2023-09-16 18:15:26,031 : 1.635 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0453833Z 2023-09-16 18:15:26,031 : 1.635 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0464269Z 2023-09-16 18:15:26,031 : 1.645 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0466921Z 2023-09-16 18:15:26,031 : 1.645 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0479412Z 2023-09-16 18:15:26,047 : 1.655 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0482022Z 2023-09-16 18:15:26,047 : 1.655 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0492687Z 2023-09-16 18:15:26,047 : 1.665 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0495248Z 2023-09-16 18:15:26,047 : 1.665 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0505724Z 2023-09-16 18:15:26,047 : 1.675 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0508413Z 2023-09-16 18:15:26,047 : 1.675 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0518876Z 2023-09-16 18:15:26,047 : 1.685 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0521547Z 2023-09-16 18:15:26,047 : 1.685 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0532018Z 2023-09-16 18:15:26,047 : 1.695 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0534682Z 2023-09-16 18:15:26,047 : 1.695 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0545168Z 2023-09-16 18:15:26,047 : 1.705 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0547847Z 2023-09-16 18:15:26,047 : 1.705 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0558295Z 2023-09-16 18:15:26,047 : 1.715 : DEBUG : OPP : com1 sending: b' \x08\x00\x00\x00\x00\x8d\xff' ( 0x20 0x08 0x00 0x00 0x00 0x00 0x8d 0xff)
2023-09-16T18:15:26.0561018Z 2023-09-16 18:15:26,047 : 1.715 : DEBUG : OPP : com1 received: b' \x08\x00\xff\x00\x0c\x82\xff' ( 0x20 0x08 0x00 0xff 0x00 0x0c 0x82 0xff)
2023-09-16T18:15:26.0563920Z 2023-09-16 18:15:26,047 : 1.716 : DEBUG : Machine : Test ended
2023-09-16T18:15:26.0565131Z 2023-09-16 18:15:26,047 : 1.716 : INFO : Machine : Shutting down...
2023-09-16T18:15:26.0566729Z 2023-09-16 18:15:26,047 : 1.716 : INFO : EventManager : Event: ======'shutdown'====== Args={}
2023-09-16T18:15:26.0568374Z 2023-09-16 18:15:26,047 : 1.716 : DEBUG : OPP : Set servo position on 19088743:  0x20 0x40 0x30 0x00 0x00 0x01 0x00 0x00 0x00 0xba
2023-09-16T18:15:26.0569260Z 2023-09-16 18:15:26,047 : 1.716 : DEBUG : OPP : com1 sending: b' @0\x00\x00\x01\x00\x00\x00\xba' ( 0x20 0x40 0x30 0x00 0x00 0x01 0x00 0x00 0x00 0xba)
2023-09-16T18:15:26.0573146Z 2023-09-16 18:15:26,047 : 1.716 : ERROR : OPP : Stop called on serial connection com1
2023-09-16T18:18:05.9912744Z Traceback (most recent call last):
2023-09-16T18:18:05.9921602Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\runpy.py", line 197, in _run_module_as_main
2023-09-16T18:18:05.9922655Z     return _run_code(code, main_globals, None,
2023-09-16T18:18:05.9924713Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\runpy.py", line 87, in _run_code
2023-09-16T18:18:05.9925646Z     exec(code, run_globals)
2023-09-16T18:18:05.9927202Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\Scripts\coverage.exe\__main__.py", line 7, in <module>
2023-09-16T18:18:05.9928329Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\coverage\cmdline.py", line 943, in main
2023-09-16T18:18:05.9929004Z     status = CoverageScript().command_line(argv)
2023-09-16T18:18:05.9929771Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\coverage\cmdline.py", line 659, in command_line
2023-09-16T18:18:05.9930334Z     return self.do_run(options, args)
2023-09-16T18:18:05.9931060Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\coverage\cmdline.py", line 830, in do_run
2023-09-16T18:18:05.9931545Z     runner.run()
2023-09-16T18:18:05.9932202Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\coverage\execfile.py", line 199, in run
2023-09-16T18:18:05.9933429Z     exec(code, main_mod.__dict__)
2023-09-16T18:18:05.9934018Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\__main__.py", line 18, in <module>
2023-09-16T18:18:05.9934507Z     main(module=None)
2023-09-16T18:18:05.9935057Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\main.py", line 101, in __init__
2023-09-16T18:18:05.9935518Z     self.runTests()
2023-09-16T18:18:05.9936040Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\main.py", line 271, in runTests
2023-09-16T18:18:05.9936613Z     self.result = testRunner.run(self.test)
2023-09-16T18:18:05.9937168Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\runner.py", line 184, in run
2023-09-16T18:18:05.9937613Z     test(result)
2023-09-16T18:18:05.9938116Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\suite.py", line 84, in __call__
2023-09-16T18:18:05.9938616Z     return self.run(*args, **kwds)
2023-09-16T18:18:05.9939145Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\suite.py", line 122, in run
2023-09-16T18:18:05.9939597Z     test(result)
2023-09-16T18:18:05.9940114Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\suite.py", line 84, in __call__
2023-09-16T18:18:05.9940668Z     return self.run(*args, **kwds)
2023-09-16T18:18:05.9941191Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\suite.py", line 122, in run
2023-09-16T18:18:05.9941639Z     test(result)
2023-09-16T18:18:05.9942138Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\suite.py", line 84, in __call__
2023-09-16T18:18:05.9942609Z     return self.run(*args, **kwds)
2023-09-16T18:18:05.9943154Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\suite.py", line 122, in run
2023-09-16T18:18:05.9943681Z     test(result)
2023-09-16T18:18:05.9944185Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py", line 651, in __call__
2023-09-16T18:18:05.9944699Z     return self.run(*args, **kwds)
2023-09-16T18:18:05.9945216Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py", line 595, in run
2023-09-16T18:18:05.9945771Z     self._callTearDown()
2023-09-16T18:18:05.9946305Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py", line 553, in _callTearDown
2023-09-16T18:18:05.9946898Z     self.tearDown()
2023-09-16T18:18:05.9947360Z   File "D:\a\mpf\mpf\mpf\tests\test_OPP.py", line 169, in tearDown
2023-09-16T18:18:05.9947809Z     super().tearDown()
2023-09-16T18:18:05.9948261Z   File "D:\a\mpf\mpf\mpf\tests\MpfTestCase.py", line 979, in tearDown
2023-09-16T18:18:05.9950354Z     self.machine._do_stop()
2023-09-16T18:18:05.9950861Z   File "D:\a\mpf\mpf\mpf\core\machine.py", line 720, in _do_stop
2023-09-16T18:18:05.9953196Z     self.shutdown()
2023-09-16T18:18:05.9953665Z   File "D:\a\mpf\mpf\mpf\core\machine.py", line 750, in shutdown
2023-09-16T18:18:05.9968210Z     self._platform_stop()
2023-09-16T18:18:05.9968955Z   File "D:\a\mpf\mpf\mpf\core\machine.py", line 800, in _platform_stop
2023-09-16T18:18:05.9969451Z     hardware_platform.stop()
2023-09-16T18:18:05.9969957Z   File "D:\a\mpf\mpf\mpf\platforms\opp\opp.py", line 191, in stop
2023-09-16T18:18:05.9970379Z     connections.stop()
2023-09-16T18:18:05.9971316Z   File "D:\a\mpf\mpf\mpf\platforms\base_serial_communicator.py", line 148, in stop
2023-09-16T18:18:05.9971902Z     self.machine.clock.loop.run_until_complete(self.writer.wait_closed())
2023-09-16T18:18:05.9972570Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\asyncio\base_events.py", line 634, in run_until_complete
2023-09-16T18:18:05.9973078Z     self.run_forever()
2023-09-16T18:18:05.9973599Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\asyncio\base_events.py", line 601, in run_forever
2023-09-16T18:18:05.9974086Z     self._run_once()
2023-09-16T18:18:05.9974538Z   File "D:\a\mpf\mpf\mpf\tests\loop.py", line 444, in _run_once
2023-09-16T18:18:05.9975020Z     super()._run_once()
2023-09-16T18:18:05.9975625Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\asyncio\base_events.py", line 1905, in _run_once
2023-09-16T18:18:05.9976114Z     handle._run()
2023-09-16T18:18:05.9976618Z   File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\asyncio\events.py", line 80, in _run
2023-09-16T18:18:05.9977176Z     self._context.run(self._callback, *self._args)
2023-09-16T18:18:05.9977565Z KeyboardInterrupt

I assume this is something with the serial port stopping hanging up? Dunno if it's an asyncio thing or what?

I'm going to assign this to @cobra18t since you maintain the OPP platform. Feel free to keep using this investigate-github-tests branch to figure this out. I don't think this is a showstopper, so take your time. I've disabled this test in dev so it doesn't hold us up.

cobra18t commented 1 year ago

Do we know when this started happening? Have these always failed?

toomanybrians commented 1 year ago

This only happens with the updated MPF 0.57 which runs on Python 3.10+. That required making some minor changes all throughout MPF for asyncio. I also got rid of the custom YAML processing since the way that worked changed in more recent version of rumael.yaml which were not compatible with the latest versions of python.

So I think the OPP test was always fine, but now with these changes something randomly breaks. Like I said, I just disabled the test since it's not a showstopper, and the tests still run fine on the stable MPF 0.56.x. This is more if just an FYI that something in those tests in funky with the latest versions of python.