ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Suppress logs when failure in object_spilling #14920

Closed: fishbone closed this issue 2 years ago

fishbone commented 3 years ago

Describe your feature request

The same log line is printed many times when something fails. The repetition is not useful, and it also makes it harder to find the root cause. Maybe we want to suppress the duplicates to reduce the number of logs when something breaks.

(raylet) [2021-03-24 23:30:19,731 E 29966 29966] local_object_manager.cc:402: Failed to send restore spilled object request: IOError: 14: failed to connect to all addresses
(raylet) [2021-03-24 23:30:19,731 E 29966 29966] local_object_manager.cc:402: Failed to send restore spilled object request: IOError: 14: failed to connect to all addresses
(raylet) [2021-03-24 23:30:19,731 E 29966 29966] local_object_manager.cc:402: Failed to send restore spilled object request: IOError: 14: failed to connect to all addresses
(raylet) [2021-03-24 23:30:19,731 E 29966 29966] local_object_manager.cc:402: Failed to send restore spilled object request: IOError: 14: failed to connect to all addresses
(raylet) [2021-03-24 23:30:19,731 E 29966 29966] local_object_manager.cc:402: Failed to send restore spilled object request: IOError: 14: failed to connect to all addresses
(raylet) [2021-03-24 23:30:19,824 E 29966 29966] local_object_manager.cc:402: Failed to send restore spilled object request: IOError: 14: failed to connect to all addresses
(raylet) [2021-03-24 23:30:19,824 E 29966 29966] local_object_manager.cc:40
The job exceeded the maximum log length, and has been terminated.

glog has LOG_EVERY_N, which emits a message only on the 1st, (N+1)th, (2N+1)th, ... time the statement runs. We might want to add an equivalent to RAY_LOG. (http://rpg.ifi.uzh.ch/docs/glog.html)
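To make the LOG_EVERY_N idea concrete, here is a minimal C++ sketch of an every-N guard placed in front of a logging statement. The names `ShouldLogEveryN`, `HYPOTHETICAL_LOG_EVERY_N` and `SimulateRepeatedFailure` are hypothetical and are not part of Ray's RAY_LOG or of glog. The sketch writes to a plain `std::ostream` instead of a real logger and only mirrors glog's documented behavior of emitting the 1st, (N+1)th, (2N+1)th, ... occurrence.

```cpp
#include <atomic>
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true on the 1st, (N+1)th, (2N+1)th, ... call
// for a given counter. fetch_add returns the value *before* the increment,
// so the very first call sees 0 and is logged.
inline bool ShouldLogEveryN(std::atomic<unsigned long>& counter, unsigned long n) {
  return counter.fetch_add(1, std::memory_order_relaxed) % n == 0;
}

// Hypothetical macro (not Ray's API): each expansion site gets its own static
// counter, so suppression is tracked per call site, as with glog's LOG_EVERY_N.
#define HYPOTHETICAL_LOG_EVERY_N(stream, n, msg)               \
  do {                                                         \
    static std::atomic<unsigned long> log_every_n_counter{0};  \
    if (ShouldLogEveryN(log_every_n_counter, (n))) {           \
      (stream) << msg << std::endl;                            \
    }                                                          \
  } while (0)

// Simulates a loop that hits the same restore failure many times.
// Only every 10th occurrence (starting with the first) reaches `out`.
inline void SimulateRepeatedFailure(std::ostream& out, int attempts) {
  for (int i = 0; i < attempts; ++i) {
    HYPOTHETICAL_LOG_EVERY_N(
        out, 10,
        "Failed to send restore spilled object request (attempt " << i + 1 << ")");
  }
}
```

With 25 failed attempts, only attempts 1, 11 and 21 are printed instead of 25 identical lines. A relaxed atomic is enough here because only the count matters, not ordering with other memory operations. Note that the static counter lives for the whole process, so a second call to `SimulateRepeatedFailure` continues counting where the first one stopped.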

rkooo567 commented 2 years ago

Fixed