ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing #43777

Closed. can-anyscale closed this issue 2 months ago

can-anyscale commented 8 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is flaky. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

fishbone commented 7 months ago


```
10806:C 05 Mar 2024 15:22:05.412 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
  | 2024-03-05 07:34:07 PDT | 10806:M 05 Mar 2024 15:22:05.413 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
  | 2024-03-05 07:34:07 PDT | 10806:M 05 Mar 2024 15:22:05.487 # Warning: Could not create server TCP listening socket ::*:49160: bind: Address already in use
  | 2024-03-05 07:34:07 PDT | 10806:M 05 Mar 2024 15:22:05.487 # Failed listening on port 49160 (tcp), aborting.

....

Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
  | 2024-03-05 07:34:07 PDT | Waiting for redis to be up Error 61 connecting to localhost:49160. Connection refused.
```

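This seems to be due to the environment not being cleaned up: Redis could not bind port 49160 because the address was already in use, so every later connection attempt to that port was refused.
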
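To illustrate the kind of port conflict shown in the log, here is a minimal Python sketch of checking whether a port is still held and of asking the OS for an unused one. This is not Ray's test harness or the fix applied in this issue; `port_is_free` and `find_free_port` are hypothetical helper names, and the sketch assumes an IPv4 loopback bind, whereas the log shows Redis binding the IPv6 wildcard address.

```python
import socket
from contextlib import closing


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical helper: True if we can bind (host, port) right now."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        try:
            # bind() raises OSError (EADDRINUSE) if another socket holds the port.
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: let the OS pick an unused port by binding to port 0."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


if __name__ == "__main__":
    # 49160 is the port from the failing log above.
    print("49160 free:", port_is_free(49160))
    print("OS-assigned port:", find_free_port())
```

Both helpers are inherently racy: another process can take the port between the check and the moment a server binds it, so a check like this narrows the window but does not replace cleaning up leftover processes between test runs.
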
can-anyscale commented 7 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/135#018e3897-31ee-4c0e-bce8-1409e40fc996

can-anyscale commented 7 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/138#018e3930-43b7-4e57-8416-60cd3c9cd236

can-anyscale commented 6 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 6 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/585#018ef240-52cc-418b-b62f-5711ab295a94

can-anyscale commented 5 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 5 months ago

Blamed commit: f9ac0505b41fcaaa38f134129d5bc1e7eee0a4e0 found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1173

can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/950#018fae2e-9f58-4770-b78c-c00da4daf2ff

can-anyscale commented 5 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 5 months ago

Blamed commit: f13d144d860a9b76957d08f57468f011b39734a4 found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1184

can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/955#018fb490-af5d-43c8-b332-343ac4aa2083

can-anyscale commented 5 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 5 months ago

Blamed commit: 15c294ed6ca3bfd8dcda8f958c354abc9c28295f found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1187

can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/959#018fb9b6-d944-498d-9379-7846a16088ee

can-anyscale commented 5 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is flaky. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/964#018fbda1-16a0-47d7-99e6-807863425594

can-anyscale commented 5 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 5 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/1044#018fdf29-72fb-4814-88f1-9c9538e6f40e

can-anyscale commented 4 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is flaky. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 4 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/1187#01900e18-679b-40c9-bb30-4998c2e2d327

can-anyscale commented 3 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 3 months ago

Blamed commit: d9b91edba78020afafd8e7850aba6e4a186ea7b7 found by bisect job https://buildkite.com/ray-project/release-tests-bisect/builds/1346

can-anyscale commented 3 months ago

Test passed on latest run: https://buildkite.com/ray-project/postmerge-macos/builds/1633#0190b9ef-e016-4bba-9a80-a39113e02300

can-anyscale commented 2 months ago

CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing. Recent failures:

DataCaseName-darwin://python/ray/tests:test_gcs_fault_tolerance-END Managed by OSS Test Policy

can-anyscale commented 2 months ago

passing now