Open cdeil opened 1 month ago
I see similar fails in CI: https://github.com/holoviz/panel/actions/runs/10333902177/job/28606768301?pr=7120
I've tried to mitigate some of these but it is indeed a game of whack-a-mole. I also couldn't reproduce a bunch of them so these are the ones I focused on:
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_patch_no_height_resize - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_selection_indices_on_paginated_sorted_and_filtered_data[remote] - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
Wow, thanks!
Maybe mark the remaining flaky UI tests on macOS only, like this, to remove the noise?
pytest.mark.xfail(sys.platform == 'darwin', strict=False, reason="Flaky, see GH 7118")
See https://docs.pytest.org/en/7.1.x/explanation/flaky.html
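For illustration, here is a minimal sketch of how such a marker could be defined once and reused. It uses `xfail` with `strict=False`, which is the marker the linked pytest page describes for flaky tests. The names `flaky_on_macos` and `test_example_ui_interaction` are hypothetical and not taken from the Panel test suite.

```python
import sys

import pytest

# Hypothetical shared marker: on macOS a failure is reported as "xfailed"
# instead of failing the run; on other platforms the test runs normally.
flaky_on_macos = pytest.mark.xfail(
    sys.platform == "darwin",
    strict=False,  # an unexpected pass is not treated as an error
    reason="Flaky, see GH 7118",
)


@flaky_on_macos
def test_example_ui_interaction():
    # Placeholder body; a real UI test would drive the browser here.
    assert True
```

Because `strict=False`, a marked test that happens to pass on macOS is reported as "xpassed" and does not break the build.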
Or alternatively - do you think it should be possible to get reliable tests? Or is there something fundamental in Panel / Bokeh / Python async & threading / macOS / Playwright / etc. that prevents this?
I saw yesterday that Bokeh doesn't use Playwright or do much UI testing on macOS, probably because they've run into similar issues?
Yes, it should be possible to get more reliable tests. I'm 99% certain this is just about how the tests are structured. Specifically, Playwright operates much faster than any real-world usage ever would, which causes some issues that aren't visible otherwise. By restructuring the tests and/or adding a bunch of additional timeouts we could probably make them more reliable. You could test that theory by re-running the UI tests with --slowmo 100 or so, which adds a 100 ms delay between all interactions.
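To make the timing point concrete, here is a minimal sketch of a polling wait: the test waits for a condition to become true instead of asserting right after an interaction. This is not Panel's actual `wait_until` helper. `poll_until`, its parameters, and the simulated `state` update are assumptions made for illustration, using only the standard library.

```python
import threading
import time


def poll_until(condition, timeout_ms=5000, interval_ms=50):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if the condition is still falsy after `timeout_ms`.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"poll_until timed out in {timeout_ms} milliseconds")
        time.sleep(interval_ms / 1000)


# Simulate a UI whose state only updates some time after the interaction,
# the way a browser round trip would.
state = {"clicks": 0}
threading.Timer(0.2, lambda: state.update(clicks=1)).start()

# Asserting immediately would race with the update; polling does not.
poll_until(lambda: state["clicks"] == 1, timeout_ms=2000)
```

A test written this way only fails if the expected state never arrives within the timeout, which is usually more robust than a fixed sleep.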
With the latest commit 9404b4348a80a190b3dcda7f270ba3e5b3c10210 on macOS I get a few UI test failures.
One run (see full log) where these fail:
Another run (see full log) where these fail:
I tried turning xdist off via
but still got test fails (see full log):
The textual fail is due to a recent breaking API change - see #7117
The others are flaky tests I think, although this seems to fail for me consistently now: