rkooo567 opened 9 months ago
After digging into the cluster's logs, I found many Deadline Exceeded errors in the dashboard logs. There are no obvious errors or exceptions from the GCS server or the Raylet, so these may be caused by transient network issues.
The failure is also transient in the release tests: recent test results show no such failures.
I suggest downgrading it to P1 or P2.
What happened + What you expected to happen
https://console.anyscale-staging.com/o/anyscale-internal/jobs/prodjob_v1ydmgsb1ucvh3xgkb65c1jdyk
Versions / Dependencies
n/a
Reproduction script
master
Issue Severity
None