Open GeneDer opened 1 month ago
This is actually from an Anyscale customer ofOne and is brought up again at https://anyscaleteam.slack.com/archives/C04UR8J709K/p1726705443604199
We are able to download their logs and have some example requests that got into this state
for example request id 85ece69a-4d51-4ab7-ade8-209f3b702ebf
has the below logs
(ray) gene@geneanyscale2023 test_grcp_2 % grep -r "85ece69a-4d51-4ab7-ade8-209f3b702ebf" /tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5-01000000-5752.err:INFO 2024-09-18 22:21:50,573 app1_Ingress 5gsucce2 85ece69a-4d51-4ab7-ade8-209f3b702ebf / app.py:71 - Ray health check initiated.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5-01000000-5752.err:INFO 2024-09-18 22:21:50,574 app1_Ingress 5gsucce2 85ece69a-4d51-4ab7-ade8-209f3b702ebf / replica.py:408 - __CALL__ OK 2.5ms
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:18:52,989 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 50 attempts over 44.07s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:19:43,100 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 100 attempts over 94.18s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:20:33,204 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 150 attempts over 144.29s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:21:23,319 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 200 attempts over 194.40s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:22:13,431 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 250 attempts over 244.52s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:23:03,525 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 300 attempts over 294.61s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:23:53,621 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 350 attempts over 344.71s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:24:43,736 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 400 attempts over 394.82s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:25:53,989 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 50 attempts over 44.11s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:26:44,151 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 100 attempts over 94.27s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:27:34,364 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 150 attempts over 144.48s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:28:24,567 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 200 attempts over 194.68s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:18:52,989 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 50 attempts over 44.07s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:19:43,100 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 100 attempts over 94.18s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:20:33,204 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 150 attempts over 144.29s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:21:23,319 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 200 attempts over 194.40s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5-01000000-5752.err","message":"INFO 2024-09-18 22:21:50,573 app1_Ingress 5gsucce2 85ece69a-4d51-4ab7-ade8-209f3b702ebf / app.py:71 - Ray health check initiated."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5-01000000-5752.err","message":"INFO 2024-09-18 22:21:50,574 app1_Ingress 5gsucce2 85ece69a-4d51-4ab7-ade8-209f3b702ebf / replica.py:408 - __CALL__ OK 2.5ms"}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:22:13,431 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 250 attempts over 244.52s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:23:03,525 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 300 attempts over 294.61s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:23:53,621 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 350 attempts over 344.71s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:24:43,736 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 400 attempts over 394.82s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:25:53,989 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 50 attempts over 44.11s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:26:44,151 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 100 attempts over 94.27s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:27:34,364 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 150 attempts over 144.48s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/combined-worker.log:{"filename":"worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err","message":"WARNING 2024-09-18 22:28:24,567 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 200 attempts over 194.68s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1."}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/replica_app1_Ingress_5gsucce2.log:{"asctime": "2024-09-18 22:21:50,392", "levelname": "WARNING", "message": "Replica at capacity of max_ongoing_requests=5, rejecting request 85ece69a-4d51-4ab7-ade8-209f3b702ebf.", "filename": "replica.py", "lineno": 556, "log_to_stderr": false, "worker_id": "52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5", "node_id": "b780a6ab26f3c037b81dbbb86b3b957f042d083e4a8d834cb125318d", "actor_id": "392919b96895babf9e9cc8fa01000000", "deployment": "app1_Ingress", "replica": "5gsucce2", "component_name": "replica"}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/replica_app1_Ingress_5gsucce2.log:{"asctime": "2024-09-18 22:21:50,572", "levelname": "INFO", "message": "Started executing request to method '__call__'.", "filename": "replica.py", "lineno": 1152, "log_to_stderr": false, "worker_id": "52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5", "node_id": "b780a6ab26f3c037b81dbbb86b3b957f042d083e4a8d834cb125318d", "actor_id": "392919b96895babf9e9cc8fa01000000", "route": "/", "request_id": "85ece69a-4d51-4ab7-ade8-209f3b702ebf", "application": "app1", "deployment": "app1_Ingress", "replica": "5gsucce2", "component_name": "replica"}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/replica_app1_Ingress_5gsucce2.log:{"asctime": "2024-09-18 22:21:50,573", "levelname": "INFO", "message": "Ray health check initiated.", "filename": "app.py", "lineno": 71, "route": "/", "request_id": "85ece69a-4d51-4ab7-ade8-209f3b702ebf", "application": "app1", "worker_id": "52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5", "node_id": "b780a6ab26f3c037b81dbbb86b3b957f042d083e4a8d834cb125318d", "actor_id": "392919b96895babf9e9cc8fa01000000", "deployment": "app1_Ingress", "replica": "5gsucce2", "component_name": "replica"}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/replica_app1_Ingress_5gsucce2.log:{"asctime": "2024-09-18 22:21:50,574", "levelname": "INFO", "message": "__CALL__ OK 2.5ms", "filename": "replica.py", "lineno": 408, "route": "/", "request_id": "85ece69a-4d51-4ab7-ade8-209f3b702ebf", "application": "app1", "worker_id": "52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5", "node_id": "b780a6ab26f3c037b81dbbb86b3b957f042d083e4a8d834cb125318d", "actor_id": "392919b96895babf9e9cc8fa01000000", "deployment": "app1_Ingress", "replica": "5gsucce2", "component_name": "replica"}
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:18:52,989 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 50 attempts over 44.07s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:19:43,100 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 100 attempts over 94.18s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:20:33,204 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 150 attempts over 144.29s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:21:23,319 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 200 attempts over 194.40s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:22:13,431 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 250 attempts over 244.52s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:23:03,525 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 300 attempts over 294.61s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:23:53,621 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 350 attempts over 344.71s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:24:43,736 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 400 attempts over 394.82s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:25:53,989 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 50 attempts over 44.11s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:26:44,151 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 100 attempts over 94.27s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:27:34,364 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 150 attempts over 144.48s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/serve/proxy_10.0.255.192.log:WARNING 2024-09-18 22:28:24,567 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 200 attempts over 194.68s. Retrying. Request ID: 79a8944b-89fe-448b-9293-2cb06e4cc8b1.
(ray) gene@geneanyscale2023 test_grcp_2 %
Note that the OK is logged at 2024-09-18 22:21:50,574
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-52c7b1dab011acf14a5359819fe183f513ecf835e3343f99991f3ba5-01000000-5752.err:INFO 2024-09-18 22:21:50,574 app1_Ingress 5gsucce2 85ece69a-4d51-4ab7-ade8-209f3b702ebf / replica.py:408 - __CALL__ OK 2.5ms
but at 2024-09-18 22:22:13,431 proxy keeps retrying and failed to schedule the request
/tmp/logs/ses_lww6sc8ampkvp1yd664fe74ivk/session_2024-09-16_05-52-32_342420_3262/head-10.0.255.192-i-0ffe4f3c878f14184/worker-dd8650c4b2b143ce5679fd36aba84da81a726323718f3a7f0d1a5af7-01000000-5033.err:WARNING 2024-09-18 22:22:13,431 proxy 10.0.255.192 85ece69a-4d51-4ab7-ade8-209f3b702ebf / pow_2_scheduler.py:698 - Failed to schedule request after 250 attempts over 244.52s. Retrying. Request ID: 85ece69a-4d51-4ab7-ade8-209f3b702ebf.
Still don't have a good way to reproduce, but seems like a race condition between the scheduler and the replica
This seems to be reproducible by this deployment setup
import asyncio
import logging
from fastapi import FastAPI
from ray import serve
from ray.serve.handle import DeploymentHandle
webapp = FastAPI()
logger = logging.getLogger(__name__)
@serve.deployment(
autoscaling_config={
"max_ongoing_requests": 5,
"min_replicas": 3,
"max_replicas": 3,
},
)
@serve.ingress(webapp)
class Ingress:
def __init__(self, pipeline: DeploymentHandle):
self.pipeline = pipeline
@webapp.get("/")
async def health_check(self) -> str:
logger.info("Ray health check initiated.")
return "Healthy!"
@webapp.post("/conversations")
async def start_conversation(self):
logger.info("in start_conversation before sleep")
await asyncio.gather(
self.pipeline.start.remote(),
)
logger.info("in start_conversation after sleep")
@serve.deployment
class Pipeline:
async def start(self):
await asyncio.sleep(310)
app = Ingress.bind(Pipeline.bind())
Deploy to Anyscale service with anyscale service deploy --name gene-test-timeout2 deployment:app
Then use the following code to call the service
import requests
import ray
@ray.remote
def query_get(url, token):
headers = {"Authorization": f"Bearer {token}"}
response = requests.get(url, headers=headers)
print(response.text)
@ray.remote
def query_post(url, token):
headers = {"Authorization": f"Bearer {token}"}
response = requests.post(url, headers=headers)
print(response.text)
service_url = "https://gene-test-timeout2-bxauk.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com/conversations"
token = "ae_1me8Z5Mbm-hJ4eFEyzB01S33cvNQd4OmR6jtkrQE"
ray.get([query_post.remote(service_url, token) for _ in range(11)])
So there are two issues:
asyncio.CancelledError
isn't sent directly to the router (which isn't guaranteed to happen), then the request won't get cancelled at all (and a request could still get forwarded to the downstream deployment). For this we will have to implement full recursive cancellation in the Ray Serve layer.
https://ray.slack.com/archives/CNCKBBRJL/p1724281728601689?thread_ts=1723916926.382399&cid=CNCKBBRJL