pentschev / ucx-py-ci

UCX-Py CI Issue Tracker

Nightly Tests for ucx-master-protov1 from 2024-09-17 22:00: 12 failures; 3 timeouts; 20/36 scenarios with performance regressions #3056

Open pentschev opened 4 days ago


Test results

Timeouts

- local-cudf-merge-ucx-nvlink timed out, full logs.
- local-cupy-transpose-sum-ucx-nvlink timed out, full logs.
- ucx-py-ib-rdmacm-debug-test timed out, full logs.

Failures

12 failures in `dask-cuda`, [full logs](https://raw.githack.com/pentschev/ucx-py-ci/test-results/assets/ucx-master-protov1-dask-cuda-202409172200.html). Failed tests:

- `dask_cuda.tests.test_dask_cuda_worker::test_rmm_pool`
- `dask_cuda.tests.test_dask_cuda_worker::test_rmm_managed`
- `dask_cuda.tests.test_dask_cuda_worker::test_rmm_async`
- `dask_cuda.tests.test_dask_cuda_worker::test_rmm_async_with_maximum_pool_size`
- `dask_cuda.tests.test_dask_cuda_worker::test_rmm_logging`
- `dask_cuda.tests.test_dask_cuda_worker::test_cudf_spill_disabled`
- `dask_cuda.tests.test_dask_cuda_worker::test_cudf_spill`
- `dask_cuda.tests.test_dask_cuda_worker::test_dashboard_address`
- `dask_cuda.tests.test_dask_cuda_worker::test_rmm_track_allocations`
- `dask_cuda.tests.test_proxify_host_file::test_spill_on_demand`
- `dask_cuda.tests.test_proxy::test_communicating_disk_objects[False-ucx]`
- `dask_cuda.tests.test_spill::test_cupy_cluster_device_spill[params0]`
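To rerun a failed test locally, the dotted test IDs listed above can be turned into pytest node IDs. The following is only a sketch: `to_pytest_node_id` is a hypothetical helper, not part of the CI tooling, and it assumes the source tree mirrors the module path (for example `dask_cuda/tests/test_proxy.py`).

```python
def to_pytest_node_id(test_id):
    """Convert 'pkg.module::test_name' into 'pkg/module.py::test_name'.

    Hypothetical helper; assumes the file layout mirrors the dotted
    module path, which may not hold for every project.
    """
    # Split once on '::' so parametrized suffixes such as
    # '[False-ucx]' stay attached to the test name.
    module, sep, test = test_id.partition("::")
    path = module.replace(".", "/") + ".py"
    return path + sep + test


failed = [
    "dask_cuda.tests.test_dask_cuda_worker::test_rmm_pool",
    "dask_cuda.tests.test_proxy::test_communicating_disk_objects[False-ucx]",
]
node_ids = [to_pytest_node_id(t) for t in failed]
```

Each resulting node ID can then be passed to `pytest` from a checkout of `dask-cuda`. Node IDs containing brackets should be quoted in the shell.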

Passes

- All tests passed in ucx-py-ib-test, full logs.
- All tests passed in ucx-py-libs-ib-test, full logs.
- All tests passed in ucx-py-libs-nvlink-test, full logs.
- All tests passed in ucx-py-libs-tcp-test, full logs.
- All tests passed in ucx-py-nvlink-test, full logs.
- All tests passed in ucx-py-tcp-test, full logs.

Performance results

Failures

Scenario numpy-TAG-Core-TCP SM expected_bw=4.0GB/s failed with bw 3.37 GB/s; has never passed

Scenario numpy-TAG-Core-RC expected_bw=16.0GB/s failed with bw 8.19 GB/s; has never passed

Scenario numpy-AM-Core-TCP SM expected_bw=2.8GB/s failed with bw 2.52 GB/s; has never passed

Scenario numpy-AM-Core-RC expected_bw=3.0GB/s failed with bw 2.04 GB/s; has never passed

Scenario numpy-TAG-Async-TCP SM expected_bw=4.0GB/s failed with bw 3.47 GB/s; has never passed

Scenario numpy-TAG-Async-RC expected_bw=16.0GB/s failed with bw 8.13 GB/s; has never passed

Scenario numpy-AM-Async-TCP SM expected_bw=2.8GB/s failed with bw 2.48 GB/s; has never passed

Scenario numpy-AM-Async-RC expected_bw=3.0GB/s failed with bw 2.07 GB/s; has never passed

Scenario cupy-TAG-Core-TCP SM expected_bw=3.0GB/s failed with bw 2.35 GB/s; has never passed

Scenario cupy-TAG-Core-TCP expected_bw=3.0GB/s failed with bw 2.34 GB/s; has never passed

Scenario cupy-TAG-Core-RC, expected_bw=12.0GB/s failed with bw 11.34GB/s; last pass on 2024-09-05T22:00:00 (UCX-Py version 0.40.0; UCX commit 2a78fca)

Scenario cupy-AM-Core-TCP SM expected_bw=3.0GB/s failed with bw 2.32 GB/s; has never passed

Scenario cupy-AM-Core-TCP expected_bw=3.0GB/s failed with bw 2.32 GB/s; has never passed

Scenario cupy-AM-Core-RC, expected_bw=12.0GB/s failed with bw 11.34GB/s; last pass on 2024-08-06T05:00:00 (UCX-Py version 0.39.0; UCX commit 7510c32)

Scenario cupy-TAG-Async-TCP SM expected_bw=3.0GB/s failed with bw 2.31 GB/s; has never passed

Scenario cupy-TAG-Async-TCP expected_bw=3.0GB/s failed with bw 2.31 GB/s; has never passed

Scenario cupy-TAG-Async-RC, expected_bw=12.0GB/s failed with bw 11.31GB/s; last pass on 2024-05-09T05:00:00 (UCX-Py version 0.38.0; UCX commit f02bc54)

Scenario cupy-AM-Async-TCP SM expected_bw=3.0GB/s failed with bw 2.3 GB/s; has never passed

Scenario cupy-AM-Async-TCP expected_bw=3.0GB/s failed with bw 2.31 GB/s; has never passed

Scenario cupy-AM-Async-RC, expected_bw=12.0GB/s failed with bw 11.23GB/s; last pass on 2024-05-09T05:00:00 (UCX-Py version 0.38.0; UCX commit f02bc54)

Passes

Scenario numpy-TAG-Core-TCP, expected_bw=2.8GB/s passed with bw 3.4GB/s

Scenario numpy-AM-Core-TCP, expected_bw=2.3GB/s passed with bw 2.52GB/s

Scenario numpy-TAG-Async-TCP, expected_bw=2.8GB/s passed with bw 3.46GB/s

Scenario numpy-AM-Async-TCP, expected_bw=2.3GB/s passed with bw 2.49GB/s

Scenario cupy-TAG-Core-CUDA_IPC_SELF, expected_bw=370.0GB/s passed with bw 372.64GB/s

Scenario cupy-TAG-Core-CUDA_IPC_NV2, expected_bw=48.0GB/s passed with bw 48.43GB/s

Scenario cupy-TAG-Core-CUDA_IPC_NV1, expected_bw=24.0GB/s passed with bw 24.23GB/s

Scenario cupy-AM-Core-CUDA_IPC_SELF, expected_bw=370.0GB/s passed with bw 368.24GB/s

Scenario cupy-AM-Core-CUDA_IPC_NV2, expected_bw=48.0GB/s passed with bw 48.36GB/s

Scenario cupy-AM-Core-CUDA_IPC_NV1, expected_bw=24.0GB/s passed with bw 24.21GB/s

Scenario cupy-TAG-Async-CUDA_IPC_SELF, expected_bw=325.0GB/s passed with bw 333.2GB/s

Scenario cupy-TAG-Async-CUDA_IPC_NV2, expected_bw=48.0GB/s passed with bw 47.52GB/s

Scenario cupy-TAG-Async-CUDA_IPC_NV1, expected_bw=24.0GB/s passed with bw 23.99GB/s

Scenario cupy-AM-Async-CUDA_IPC_SELF, expected_bw=325.0GB/s passed with bw 321.41GB/s

Scenario cupy-AM-Async-CUDA_IPC_NV2, expected_bw=48.0GB/s passed with bw 47.51GB/s

Scenario cupy-AM-Async-CUDA_IPC_NV1, expected_bw=24.0GB/s passed with bw 24.0GB/s
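For triage, the scenario lines above can be parsed to see how far each measurement is from its expected bandwidth. This is a minimal sketch, not the CI's own code: the regular expression is inferred only from the two line variants that appear in this report, and `parse_scenario` is a hypothetical helper name.

```python
import re

# Inferred from this report only: the scenario name may or may not be
# followed by a comma, and the measured value may or may not have a
# space before "GB/s".
LINE_RE = re.compile(
    r"Scenario (?P<name>.+?),? expected_bw=(?P<expected>[\d.]+)GB/s "
    r"(?P<status>passed|failed) with bw (?P<bw>[\d.]+) ?GB/s"
)


def parse_scenario(line):
    """Return a dict describing one scenario line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    expected = float(m["expected"])
    bw = float(m["bw"])
    return {
        "name": m["name"],
        "status": m["status"],
        "expected": expected,
        "bw": bw,
        # Fraction of the expected bandwidth that was actually measured.
        "ratio": bw / expected,
    }


lines = [
    "Scenario numpy-TAG-Core-RC expected_bw=16.0GB/s failed with bw 8.19 GB/s; has never passed",
    "Scenario cupy-TAG-Core-RC, expected_bw=12.0GB/s failed with bw 11.34GB/s; last pass on 2024-09-05T22:00:00 (UCX-Py version 0.40.0; UCX commit 2a78fca)",
]
results = [parse_scenario(line) for line in lines]
```

Applied to this report, the ratio separates the two kinds of failure: the numpy TAG RC scenarios reach roughly half of the expected 16.0GB/s, while the cupy RC scenarios that have passed before are about 5 to 6 percent below the expected 12.0GB/s.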