holoviz / panel

Panel: The powerful data exploration & web app framework for Python
https://panel.holoviz.org
BSD 3-Clause "New" or "Revised" License

Flaky UI tests for ToggleIcon and Tabulator on MacOS #7118

Open cdeil opened 1 month ago

cdeil commented 1 month ago

With the latest commit 9404b4348a80a190b3dcda7f270ba3e5b3c10210 on macOS, I get a few UI test failures.

In one run (see full log), these tests fail:

FAILED panel/tests/ui/pane/test_textual.py::test_textual_app - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_patch_no_height_resize - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_selection_indices_on_paginated_sorted_and_filtered_data[remote] - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-False] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[foo-False] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[foo-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.

In another run (see full log), these tests fail:

FAILED panel/tests/ui/io/test_reload.py::test_reload_app_on_local_module_change - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/pane/test_textual.py::test_textual_app - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_patch_no_height_resize - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_selection_indices_on_paginated_sorted_and_filtered_data[remote] - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.

I tried turning xdist off via

$ pixi run -e test-ui pytest --ui panel/tests/ui/widgets/test_icon.py -v --browser chromium -n logical --dist no -n 0

but I still got test failures (see full log):

FAILED panel/tests/ui/pane/test_textual.py::test_textual_app - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/pane/test_vizzu.py::test_vizzu_click - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/template/test_editabletemplate.py::test_editable_template_drag_item - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_icon.py::test_toggle_icon_width_height - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_icon.py::test_toggle_icon_size - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_patch_no_height_resize - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_header_filter_no_horizontal_rescroll[remote] - AssertionError: assert {'height': 20...: 714, 'y': 9} == {'height': 20...: 264, 'y': 9}
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-False] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[foo-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[foo-False] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.
FAILED panel/tests/ui/widgets/test_tabulator.py::test_selection_indices_on_paginated_sorted_and_filtered_data[remote] - TimeoutError: wait_until timed out in 5000 milliseconds
ERROR panel/tests/ui/widgets/test_tabulator.py::test_tabulator_header_filter_no_horizontal_rescroll[None] - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.

The Textual failure is due to a recent breaking API change; see #7117.

I think the others are flaky tests, although this one now seems to fail consistently for me:

panel $ pixi run -e test-ui pytest --ui panel/tests/ui/widgets/test_icon.py -k test_toggle_icon_size -v
=============================================================================================== test session starts ================================================================================================
platform darwin -- Python 3.12.5, pytest-7.4.4, pluggy-1.5.0 -- /Users/cdeil/code/oss/panel/.pixi/envs/test-ui/bin/python3.12
cachedir: .pytest_cache
rootdir: /Users/cdeil/code/oss/panel
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-5.0.0, github-actions-annotate-failures-0.2.0, playwright-0.5.0, rerunfailures-14.0, anyio-4.4.0, base-url-2.1.0, xdist-3.6.1
asyncio: mode=Mode.AUTO
collected 17 items / 16 deselected / 1 selected                                                                                                                                                                    

panel/tests/ui/widgets/test_icon.py::test_toggle_icon_size FAILED                                                                                                                                            [100%]

===================================================================================================== FAILURES =====================================================================================================
______________________________________________________________________________________________ test_toggle_icon_size _______________________________________________________________________________________________

page = <Page url='http://localhost:65348/'>

    def test_toggle_icon_size(page):
        icon = ToggleIcon(size="120px")
        serve_component(page, icon)

        # test defaults
        assert icon.icon == "heart"
        assert not icon.value
        icon_element = page.locator(".ti-heart")

>       wait_until(lambda: icon_element.bounding_box()["width"] == 120)
E       TimeoutError: wait_until timed out in 5000 milliseconds

panel/tests/ui/widgets/test_icon.py:66: TimeoutError
----------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------
Launching server at http://localhost:65348
----------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
INFO:bokeh.server.server:Starting Bokeh server version 3.5.1 (running on Tornado 6.4.1)
INFO:bokeh.server.tornado:User authentication hooks NOT provided (default user enabled)
INFO:bokeh.server.views.ws:WebSocket connection opened
INFO:bokeh.server.views.ws:ServerConnection created
------------------------------------------------------------------------------------------------ Captured log call -------------------------------------------------------------------------------------------------
INFO     tornado.access:web.py:2348 200 GET /liveness (127.0.0.1) 0.39ms
INFO     tornado.access:web.py:2348 200 GET / (::1) 17.99ms
INFO     tornado.access:web.py:2348 200 GET /static/js/bokeh.min.js?v=276377ed021e1611c60311b355033c865900f31a918aa4565aba37a78700f17b017100a8a618bded4140c6ad247a0b0237d3a02bee9fd722ce67a459479522dc (::1) 1.99ms
INFO     tornado.access:web.py:2348 200 GET /static/extensions/panel/bundled/reactiveesm/es-module-shims@%5E1.10.0/dist/es-module-shims.min.js (::1) 2.09ms
INFO     tornado.access:web.py:2348 200 GET /static/js/bokeh-gl.min.js?v=70bc1a9856b732e888ed6b2a8e9b6382bf538fee3ec9f1145b8db1778158fd51e478dbe0600650e30d5a0083b12fc43961bc7b2ef3e9f366000199b83b9a1644 (::1) 0.38ms
INFO     tornado.access:web.py:2348 200 GET /static/extensions/panel/panel.min.js?v=a91daab4668e3299f59ed231b5da2e657f5e65d10a1d501ff0a660306b1fdb79 (::1) 4.22ms
INFO     tornado.access:web.py:2348 200 GET /static/js/bokeh-widgets.min.js?v=8541420c1bb1dbde534df1d9b2be7c8248f61fca353a821ffc4d459b08b79c4b39f0ea1dd6960aa3b734bea988cf822dc6993c786de844db80e4f258dd90727f (::1) 1.91ms
INFO     tornado.access:web.py:2348 200 GET /static/js/bokeh-tables.min.js?v=26281191594de496d010d87b3a56c1679330da29fcf72d3dab91ac4a45479c16b36e82ce4325f4217df4614fad13927fd7f1e1be64cf838e4a18a60852e2be0e (::1) 2.00ms
INFO     tornado.access:web.py:2348 101 GET /ws (::1) 0.32ms
INFO     tornado.access:web.py:2348 200 GET /static/extensions/panel/css/loading.css?v=1.5.0-b.3 (::1) 1.10ms
INFO     tornado.access:web.py:2348 200 GET /static/extensions/panel/css/icon.css?v=1.5.0-b.3 (::1) 1.52ms
INFO     tornado.access:web.py:2348 200 GET /static/extensions/panel/bundled/theme/default.css?v=1.5.0-b.3 (::1) 4.11ms
INFO     tornado.access:web.py:2348 200 GET /static/extensions/panel/bundled/theme/native.css?v=1.5.0-b.3 (::1) 9.85ms
--------------------------------------------------------------------------------------------- Captured stderr teardown ---------------------------------------------------------------------------------------------
INFO:bokeh.server.views.ws:WebSocket connection closed: code=1001, reason=None
============================================================================================= short test summary info ==============================================================================================
FAILED panel/tests/ui/widgets/test_icon.py::test_toggle_icon_size - TimeoutError: wait_until timed out in 5000 milliseconds
========================================================================================= 1 failed, 16 deselected in 6.08s =========================================================================================
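
The failing assertion polls `icon_element.bounding_box()["width"] == 120` until the 5000 ms deadline. One way to tell whether the icon never rendered or rendered at a slightly different width on macOS is to poll with a tolerant predicate. The sketch below is a hypothetical stand-in, not Panel's actual `wait_until`; the helper name `width_is` and the 0.5 px tolerance are assumptions. It relies only on the documented fact that Playwright's `Locator.bounding_box()` returns `None` when the element is not visible.

```python
import time
from typing import Callable, Optional


def wait_until(predicate: Callable[[], bool], timeout_ms: int = 5000, interval_ms: int = 100) -> None:
    """Poll ``predicate`` until it returns True, or raise TimeoutError after ``timeout_ms``.

    Hypothetical stand-in for the test suite's helper, not Panel's implementation.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"wait_until timed out in {timeout_ms} milliseconds")
        time.sleep(interval_ms / 1000)


def width_is(box: Optional[dict], expected: float, tol: float = 0.5) -> bool:
    """True if a bounding box exists and its width is within ``tol`` px of ``expected``."""
    # Locator.bounding_box() returns None when the element is not visible,
    # so check for that before indexing into the box.
    return box is not None and abs(box["width"] - expected) <= tol
```

In the test, the check would read `wait_until(lambda: width_is(icon_element.bounding_box(), 120))`. If it still times out, printing `icon_element.bounding_box()` after the failure would show which width (if any) was actually rendered.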

cdeil commented 1 month ago

I see similar failures in CI: https://github.com/holoviz/panel/actions/runs/10333902177/job/28606768301?pr=7120

philippjfr commented 1 month ago

I've tried to mitigate some of these, but it is indeed a game of whack-a-mole. I also couldn't reproduce a bunch of them, so these are the ones I focused on:

FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_patch_no_height_resize - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_selection_indices_on_paginated_sorted_and_filtered_data[remote] - TimeoutError: wait_until timed out in 5000 milliseconds
FAILED panel/tests/ui/widgets/test_tabulator.py::test_tabulator_edit_event_and_header_filters_same_column[index-True] - playwright._impl._errors.TimeoutError: Locator.fill: Timeout 20000ms exceeded.

cdeil commented 1 month ago

Wow, thanks!

Maybe mark the remaining flaky UI tests like this, on macOS only, to remove the noise?

@pytest.mark.xfail(sys.platform == 'darwin', strict=False, reason="Flaky, see GH 7118")

See https://docs.pytest.org/en/7.1.x/explanation/flaky.html
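
For reference, `strict` is an argument of `pytest.mark.xfail`, not of `skipif`. A non-strict xfail still runs the test on macOS but reports a failure as XFAIL and a pass as XPASS, so neither breaks the run. Since `rerunfailures` already appears in the plugin list above, retrying is another option. A minimal sketch: the `FLAKY_ON_MACOS` name and the retry counts are assumptions, and the test bodies are placeholders.

```python
import sys

import pytest

# Non-strict xfail: on macOS a failure is reported as XFAIL and a pass as XPASS,
# and neither fails the session. On other platforms the test runs normally.
FLAKY_ON_MACOS = pytest.mark.xfail(
    sys.platform == "darwin", reason="Flaky, see GH 7118", strict=False
)


@FLAKY_ON_MACOS
def test_toggle_icon_size():
    ...  # placeholder body


# Alternative via pytest-rerunfailures: retry up to 3 times (1 s apart),
# but only when running on macOS.
@pytest.mark.flaky(reruns=3, reruns_delay=1, condition=sys.platform == "darwin")
def test_tabulator_patch_no_height_resize():
    ...  # placeholder body
```

Retrying keeps a real signal when a test fails every time, whereas a non-strict xfail hides consistent failures too, such as the `test_toggle_icon_size` one above.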

Or, alternatively: do you think it should be possible to get reliable tests? Or is there something fundamental in Panel / Bokeh / Python async & threading / macOS / Playwright / etc. that prevents this?

I saw yesterday that Bokeh doesn't use Playwright or do much UI testing on macOS, probably because they've run into similar issues?

philippjfr commented 1 month ago

Yes, it should be possible to get more reliable tests; I'm 99% certain this is just about how the tests are structured. Specifically, Playwright operates much faster than any real-world usage ever would, which causes some issues that aren't visible otherwise. By restructuring the tests and/or adding a number of explicit waits, we could probably make them more reliable. You could test that theory by re-running the UI tests with --slowmo 100 or so, which adds a 100 ms delay between all Playwright interactions.
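
If `--slowmo` helps, the same delay could be pinned in a `conftest.py` so it applies only on macOS. This sketch assumes pytest-playwright's `browser_type_launch_args` fixture (which it provides for customizing browser launch options) and Playwright's `slow_mo` launch option. It also assumes Panel's own conftest does not already override that fixture. The names `with_slow_mo` and `SLOW_MO_MS` are hypothetical.

```python
import sys

import pytest

SLOW_MO_MS = 100  # delay in milliseconds, matching the --slowmo 100 suggestion


def with_slow_mo(launch_args: dict, slow_mo_ms: int, platform: str = sys.platform) -> dict:
    """Return a copy of ``launch_args`` with Playwright's ``slow_mo`` set on macOS only."""
    if platform != "darwin":
        return dict(launch_args)
    return {**launch_args, "slow_mo": slow_mo_ms}


@pytest.fixture(scope="session")
def browser_type_launch_args(browser_type_launch_args):
    # Overrides pytest-playwright's fixture of the same name; on macOS this
    # replaces any slow_mo value that --slowmo may have set.
    return with_slow_mo(browser_type_launch_args, SLOW_MO_MS)
```

Keeping the dictionary logic in a plain function makes the platform switch easy to check without launching a browser.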