ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing #43849

Closed can-anyscale closed 2 months ago

can-anyscale commented 6 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:

DataCaseName-windows://python/ray/tests:test_implicit_resource-END Managed by OSS Test Policy

can-anyscale commented 6 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 6 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3472#018e365f-7be5-4162-949d-9b68da190656

can-anyscale commented 6 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 6 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3580#018e4ff8-4d8b-4a2a-8cc0-ff02516b5248

can-anyscale commented 6 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 6 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3676#018e6340-753a-4637-a037-de1132a8d75b

can-anyscale commented 6 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


can-anyscale commented 6 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3727#018e7b48-5de8-40cf-a8f8-71efe8fd88de

can-anyscale commented 6 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3788#018e9a2e-c1b8-4716-ba2d-9e506871cd38

can-anyscale commented 5 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/4026#018ee7c3-5f0c-4255-8221-ce496a087c45

can-anyscale commented 3 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


mattip commented 3 months ago

Is there a way for me to access the buildkite.com pages? I think I need permissions.

mattip commented 3 months ago

The test does not show many failures on https://flaky-tests.ray.io/

can-anyscale commented 3 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


can-anyscale commented 3 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5074#01903e9c-9e28-48c5-ae5c-5fc161608b50

can-anyscale commented 3 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


mattip commented 3 months ago

Is there a way to see the logs? I don't seem to be able to view the build-kite links.

can-anyscale commented 3 months ago

@mattip got you. Yes, that pipeline is private; here is the log. You can also run it in a PR, since the PR pipeline is public.

[2024-06-11T18:14:50Z] ================================================================================
  | [2024-06-11T18:14:50Z] ==================== Test output for //python/ray/tests:test_implicit_resource:
  | [2024-06-11T18:14:50Z] ============================= test session starts =============================
  | [2024-06-11T18:14:50Z] platform win32 -- Python 3.9.7, pytest-7.0.1, pluggy-1.3.0 -- C:\Miniconda3\python.exe
  | [2024-06-11T18:14:50Z] cachedir: .pytest_cache
  | [2024-06-11T18:14:50Z] rootdir: C:\Users\ContainerAdministrator\AppData\Local\Temp\Bazel.runfiles_t6unxcsg\runfiles\com_github_ray_project_ray
  | [2024-06-11T18:14:50Z] plugins: anyio-3.7.1, asyncio-0.16.0, docker-tools-3.1.3, forked-1.4.0, httpserver-1.0.6, lazy-fixture-0.6.3, rerunfailures-11.1.2, shutil-1.7.0, sphinx-0.5.1.dev0, sugar-0.9.5, timeout-2.1.0, virtualenv-1.7.0
  | [2024-06-11T18:14:50Z] collecting ... collected 3 items
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] python/ray/tests/test_implicit_resource.py::test_implicit_resource 2024-06-11 18:04:55,237   INFO worker.py:1761 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
  | Error creating PyTest summary | 0s
  | [2024-06-11T18:14:50Z] [Errno 2] No such file or directory: 'C:/artifact-mount/test-summaries\\python/ray/tests/test_implicit_resource.py$$test_implicit_resource.txt'
  | [2024-06-11T18:14:50Z] FAILED
  | [2024-06-11T18:14:50Z] python/ray/tests/test_implicit_resource.py::test_implicit_resource_autoscaling[v1] 2024-06-11 18:05:02,913 - INFO - NumExpr defaulting to 1 threads.
  | [2024-06-11T18:14:50Z] Did not find any active Ray processes.
  | [2024-06-11T18:14:50Z] 2024-06-11 18:05:05,205 - INFO - NumExpr defaulting to 1 threads.
  | [2024-06-11T18:14:50Z] Usage stats collection is disabled.
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] Local node IP: 172.30.241.182
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] --------------------
  | [2024-06-11T18:14:50Z] Ray runtime started.
  | [2024-06-11T18:14:50Z] --------------------
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] Next steps
  | [2024-06-11T18:14:50Z]   To add another node to this Ray cluster, run
  | [2024-06-11T18:14:50Z]     RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --address='172.30.241.182:6379'
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To connect to this Ray cluster:
  | [2024-06-11T18:14:50Z]     import ray
  | [2024-06-11T18:14:50Z]     ray.init()
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To submit a Ray job using the Ray Jobs CLI:
  | [2024-06-11T18:14:50Z]     RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  | [2024-06-11T18:14:50Z]   for more information on submitting Ray jobs to the Ray cluster.
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To terminate the Ray runtime, run
  | [2024-06-11T18:14:50Z]     ray stop
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To view the status of the cluster, use
  | [2024-06-11T18:14:50Z]     ray status
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To monitor and debug Ray, view the dashboard at
  | [2024-06-11T18:14:50Z]     127.0.0.1:8265
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   If connection to the dashboard fails, check your firewall settings and network configuration.
  | [2024-06-11T18:14:50Z] 2024-06-11 18:05:08,230  INFO worker.py:1585 -- Connecting to existing Ray cluster at address: 172.30.241.182:6379...
  | [2024-06-11T18:14:50Z] 2024-06-11 18:05:08,245  INFO worker.py:1761 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:05:12,027 C 10256 14764] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information ***
  | [2024-06-11T18:14:50Z] (raylet) unknown
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:06:13,109 C 15980 7336] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 4x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 4x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 59x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:05:12,027 C 10256 14764] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 6x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 6x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 72x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:08:16,404 C 15868 7964] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 11x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 11x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 132x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:09:17,560 C 6788 16808] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 11x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 11x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 132x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] ================================================================================
  | [2024-06-11T18:14:50Z] ==================== Test output for //python/ray/tests:test_implicit_resource:
  | [2024-06-11T18:14:50Z] ============================= test session starts =============================
  | [2024-06-11T18:14:50Z] platform win32 -- Python 3.9.7, pytest-7.0.1, pluggy-1.3.0 -- C:\Miniconda3\python.exe
  | [2024-06-11T18:14:50Z] cachedir: .pytest_cache
  | [2024-06-11T18:14:50Z] rootdir: C:\Users\ContainerAdministrator\AppData\Local\Temp\Bazel.runfiles_1yvbyj79\runfiles\com_github_ray_project_ray
  | [2024-06-11T18:14:50Z] plugins: anyio-3.7.1, asyncio-0.16.0, docker-tools-3.1.3, forked-1.4.0, httpserver-1.0.6, lazy-fixture-0.6.3, rerunfailures-11.1.2, shutil-1.7.0, sphinx-0.5.1.dev0, sugar-0.9.5, timeout-2.1.0, virtualenv-1.7.0
  | [2024-06-11T18:14:50Z] collecting ... collected 3 items
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] python/ray/tests/test_implicit_resource.py::test_implicit_resource 2024-06-11 18:09:56,404   INFO worker.py:1761 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
  | Error creating PyTest summary | 3m 12s
  | [2024-06-11T18:14:50Z] [Errno 2] No such file or directory: 'C:/artifact-mount/test-summaries\\python/ray/tests/test_implicit_resource.py$$test_implicit_resource.txt'
  | [2024-06-11T18:14:50Z] FAILED
  | [2024-06-11T18:14:50Z] python/ray/tests/test_implicit_resource.py::test_implicit_resource_autoscaling[v1] 2024-06-11 18:10:07,693 - INFO - NumExpr defaulting to 1 threads.
  | [2024-06-11T18:14:50Z] Did not find any active Ray processes.
  | [2024-06-11T18:14:50Z] 2024-06-11 18:10:10,307 - INFO - NumExpr defaulting to 1 threads.
  | [2024-06-11T18:14:50Z] Usage stats collection is disabled.
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] Local node IP: 172.30.241.182
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] --------------------
  | [2024-06-11T18:14:50Z] Ray runtime started.
  | [2024-06-11T18:14:50Z] --------------------
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] Next steps
  | [2024-06-11T18:14:50Z]   To add another node to this Ray cluster, run
  | [2024-06-11T18:14:50Z]     RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --address='172.30.241.182:6379'
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To connect to this Ray cluster:
  | [2024-06-11T18:14:50Z]     import ray
  | [2024-06-11T18:14:50Z]     ray.init()
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To submit a Ray job using the Ray Jobs CLI:
  | [2024-06-11T18:14:50Z]     RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  | [2024-06-11T18:14:50Z]   for more information on submitting Ray jobs to the Ray cluster.
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To terminate the Ray runtime, run
  | [2024-06-11T18:14:50Z]     ray stop
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To view the status of the cluster, use
  | [2024-06-11T18:14:50Z]     ray status
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To monitor and debug Ray, view the dashboard at
  | [2024-06-11T18:14:50Z]     127.0.0.1:8265
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   If connection to the dashboard fails, check your firewall settings and network configuration.
  | [2024-06-11T18:14:50Z] 2024-06-11 18:10:13,091  INFO worker.py:1585 -- Connecting to existing Ray cluster at address: 172.30.241.182:6379...
  | [2024-06-11T18:14:50Z] 2024-06-11 18:10:13,107  INFO worker.py:1761 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:10:16,019 C 8976 14668] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information ***
  | [2024-06-11T18:14:50Z] (raylet) unknown
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:10:16,019 C 8976 14668] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 4x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 4x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 59x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:10:16,019 C 8976 14668] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 8x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 8x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 96x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] 2024-06-11 18:12:21,735 - INFO - NumExpr defaulting to 1 threads.
  | Stopped all 35 Ray processes.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +22s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +24s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +25s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +25s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +25s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +25s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +25s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m25s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m25s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m25s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m25s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m26s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +1m27s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m27s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m27s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m27s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m27s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m27s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m27s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 0 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 8 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 3 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 16 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Failed to launch 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 2 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 24 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Failed to launch 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Adding 1 node(s) of type cpu_node.
  | [2024-06-11T18:14:50Z] (autoscaler +2m28s) Resized to 32 CPUs.
  | [2024-06-11T18:14:50Z] PASSED(raylet) [2024-06-11 18:11:18,181 C 2724 19236] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information ***
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 12x across cluster]
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] python/ray/tests/test_implicit_resource.py::test_implicit_resource_autoscaling[v2] 2024-06-11 18:12:25,105 - INFO - NumExpr defaulting to 1 threads.
  | [2024-06-11T18:14:50Z] Did not find any active Ray processes.
  | [2024-06-11T18:14:50Z] 2024-06-11 18:12:27,040 - INFO - NumExpr defaulting to 1 threads.
  | [2024-06-11T18:14:50Z] Usage stats collection is disabled.
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] Local node IP: 172.30.241.182
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] --------------------
  | [2024-06-11T18:14:50Z] Ray runtime started.
  | [2024-06-11T18:14:50Z] --------------------
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z] Next steps
  | [2024-06-11T18:14:50Z]   To add another node to this Ray cluster, run
  | [2024-06-11T18:14:50Z]     RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --address='172.30.241.182:6379'
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To connect to this Ray cluster:
  | [2024-06-11T18:14:50Z]     import ray
  | [2024-06-11T18:14:50Z]     ray.init()
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To submit a Ray job using the Ray Jobs CLI:
  | [2024-06-11T18:14:50Z]     RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  | [2024-06-11T18:14:50Z]   for more information on submitting Ray jobs to the Ray cluster.
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To terminate the Ray runtime, run
  | [2024-06-11T18:14:50Z]     ray stop
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To view the status of the cluster, use
  | [2024-06-11T18:14:50Z]     ray status
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   To monitor and debug Ray, view the dashboard at
  | [2024-06-11T18:14:50Z]     127.0.0.1:8265
  | [2024-06-11T18:14:50Z]
  | [2024-06-11T18:14:50Z]   If connection to the dashboard fails, check your firewall settings and network configuration.
  | [2024-06-11T18:14:50Z] 2024-06-11 18:12:29,768  INFO worker.py:1585 -- Connecting to existing Ray cluster at address: 172.30.241.182:6379...
  | [2024-06-11T18:14:50Z] 2024-06-11 18:12:29,794  INFO worker.py:1761 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:12:31,778 C 7360 11000] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information ***
  | [2024-06-11T18:14:50Z] (raylet) unknown
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:12:31,778 C 7360 11000] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 3x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 3x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 47x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] (raylet) [2024-06-11 18:12:31,778 C 7360 11000] (raylet.exe) dlmalloc.cc:129:  Check failed: *handle != nullptr CreateFileMapping() failed. GetLastError() = 1450 [repeated 7x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) *** StackTrace Information *** [repeated 7x across cluster]
  | [2024-06-11T18:14:50Z] (raylet) unknown [repeated 84x across cluster]
  | [2024-06-11T18:14:50Z] (raylet)
  | [2024-06-11T18:14:50Z] ================================================================================
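
For anyone triaging a log like the one above: Windows error 1450 is `ERROR_NO_SYSTEM_RESOURCES` ("Insufficient system resources exist to complete the requested service"), and here it is raised when `CreateFileMapping()` fails in `dlmalloc.cc`. The following is only a minimal sketch for tallying these raylet check failures from a saved copy of the log. The function name `summarize` and the regular expression are hypothetical helpers written for this thread, not part of Ray, and treating a `[repeated Nx across cluster]` suffix as N occurrences is an assumption about how the deduplicated lines should be counted.

```python
import re
from collections import Counter

# Hypothetical pattern, modeled on the raylet check-failure lines in the log above.
CHECK_FAILED = re.compile(
    r"\(raylet\) \[(?P<ts>[\d\- :,]+) C (?P<pid>\d+) (?P<tid>\d+)\] "
    r"\(raylet\.exe\) (?P<src>[\w.]+:\d+):\s+Check failed: .*?"
    r"GetLastError\(\) = (?P<code>\d+)"
    r"(?: \[repeated (?P<rep>\d+)x across cluster\])?"
)


def summarize(log_text):
    """Count raylet check failures per (source location, Windows error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CHECK_FAILED.search(line)
        if match:
            # Assumption: a "[repeated Nx across cluster]" suffix is counted
            # as N occurrences; a line without the suffix counts as one.
            repeats = int(match["rep"] or 1)
            counts[(match["src"], int(match["code"]))] += repeats
    return counts
```

Calling `summarize(open("test_output.log").read())` (the file name is a placeholder) returns a `Counter` keyed by pairs such as `("dlmalloc.cc:129", 1450)`, which makes it easier to compare how often the failure shows up across runs.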

can-anyscale commented 3 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5096#01904c03-ecf0-42fe-b43c-952bf04803e3

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5177#01905f9a-5184-43aa-b737-3a39357eba9f

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5195#01906f1a-9609-4261-a801-c83ade15ef6e

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5261#01907b12-8389-495b-9744-8c42d5c29607

can-anyscale commented 2 months ago

Blamed commit: d14c95c5442a55f82d2349a446d3738d1b54b736 found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1293

can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5268#01907e81-47ea-4ea6-b1b2-4158ac2829e7

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5272#0190842a-4550-4d8a-b6c2-821ef904fca3

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5352#01909b0a-59ce-4617-bcac-aba3566e78c3

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is flaky. Recent failures:


can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5378#01909f1f-9282-465d-8bbc-d31ab8ab93a3

can-anyscale commented 2 months ago

CI test windows://python/ray/tests:test_implicit_resource is consistently_failing. Recent failures:


can-anyscale commented 2 months ago

ehh, has been flaky since forever

can-anyscale commented 2 months ago

This test is now considered flaky because it has been failing on postmerge for too long. Flaky tests do not run on premerge.

can-anyscale commented 2 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/5393#0190a368-8228-4555-a1dc-3fde7391ed46