ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.7k stars 5.73k forks

[core] Disk full error logging is verbose #30833

Open stephanie-wang opened 1 year ago

stephanie-wang commented 1 year ago

What happened + What you expected to happen

Currently, Ray logs an error every 10s while the disk is nearly full. This is not ideal: the disk can stay full for a very long time, so the same error is printed over and over without bound. Ideally, we should log again only after a state change (the disk utilization goes down and then back up again).
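To make the proposed behavior concrete, here is a minimal Python sketch of logging only on a state change. It is not Ray's actual implementation: the class `DiskFullLogger`, the helper `current_used_fraction`, and the logger name are hypothetical, and the 0.95 threshold is an assumption taken from the ">95%" figure in the reproduction steps.

```python
import logging
import shutil

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("disk_full_sketch")


def current_used_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free)
    return usage.used / usage.total


class DiskFullLogger:
    """Logs once when utilization rises above the threshold.

    It stays silent while the disk remains full and logs again only after
    utilization has dropped back to or below the threshold and risen again.
    """

    def __init__(self, threshold=0.95):
        # Assumed threshold; the real value used by Ray may differ.
        self.threshold = threshold
        self._was_over = False

    def observe(self, used_fraction):
        """Record one periodic sample; return True if an error was logged."""
        is_over = used_fraction > self.threshold
        # Log only on the transition from "not full" to "full".
        should_log = is_over and not self._was_over
        if should_log:
            logger.error(
                "Disk is %.1f%% full (threshold %.0f%%).",
                used_fraction * 100,
                self.threshold * 100,
            )
        self._was_over = is_over
        return should_log
```

A periodic check (for example, the existing 10s timer) could call `observe(current_used_fraction(path))` on each tick. Only the first sample above the threshold produces a log line until utilization recovers.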

Versions / Dependencies

3.0dev

Reproduction script

Run a Ray script on a machine with >95% disk utilization.

Issue Severity

None

stephanie-wang commented 1 year ago

cc @scv119.

rkooo567 commented 1 year ago

Should we log this at all? Maybe the exception alone is sufficient?