ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.24k stars, 5.62k forks

[Data] Provide a timeout value for map_batch call in ray data. #45661

Open: gilvikra opened this issue 4 months ago

gilvikra commented 4 months ago

Description

I would like a way to set a timeout value in the map_batches call, to guard against the case where a batch of data takes too long to process or the map_batches call simply hangs forever. When a batch times out, I would like the program to continue with a warning. Otherwise, resuming a broken job is a big headache: filtering out the data already processed, separating out the batches that errored, and moving the unprocessed data somewhere else for a new execution.

Use case

No response

amogkam commented 3 months ago

Can you add the timeout logic in the UDF itself?
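Following the suggestion above, one way to put the timeout inside the UDF is to wrap the batch function so each call runs in a background thread and is abandoned after a deadline. The sketch below is a minimal, hypothetical helper (with_timeout is not part of Ray). It assumes dict-of-columns batches like those Ray Data passes with batch_format="numpy", and that the wrapped function returns the same column names it receives. Instead of failing the job, a timed-out batch is passed through unprocessed with a warning and tagged in an extra status column, which targets the bookkeeping problem described in the original request.

```python
import logging
import threading
import time
from typing import Any, Callable, Dict, Sequence

logger = logging.getLogger(__name__)

# A batch maps column name -> column values, like the dict-of-arrays batches
# Ray Data passes to map_batches with batch_format="numpy".
Batch = Dict[str, Sequence[Any]]


def _num_rows(batch: Batch) -> int:
    return len(next(iter(batch.values()))) if batch else 0


def with_timeout(fn: Callable[[Batch], Batch], timeout_s: float) -> Callable[[Batch], Batch]:
    """Hypothetical helper (not a Ray API): skip slow batches with a warning.

    Assumes fn returns the same columns it receives. Adds a "status" column:
    "ok" for processed rows, "timeout" for rows passed through unprocessed.
    Exceptions raised by fn are re-raised, so Ray's normal error handling applies.
    """

    def wrapped(batch: Batch) -> Batch:
        outcome: Dict[str, Any] = {}

        def run() -> None:
            try:
                outcome["result"] = fn(batch)
            except BaseException as exc:  # handed back to the calling thread
                outcome["error"] = exc

        # Daemon thread: a stuck call cannot block interpreter exit, but it is
        # not cancelled either; it keeps running in the background.
        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        worker.join(timeout_s)

        if worker.is_alive():
            n = _num_rows(batch)
            logger.warning(
                "Batch of %d rows exceeded %.1fs; passing it through unprocessed.",
                n,
                timeout_s,
            )
            passthrough = dict(batch)
            passthrough["status"] = ["timeout"] * n
            return passthrough
        if "error" in outcome:
            raise outcome["error"]
        result = dict(outcome["result"])
        result["status"] = ["ok"] * _num_rows(result)
        return result

    return wrapped


# Demo: a UDF that hangs on batches whose first value is negative.
def double(batch: Batch) -> Batch:
    if batch["x"][0] < 0:
        time.sleep(0.5)  # stands in for a stuck batch
    return {"x": [v * 2 for v in batch["x"]]}


safe_double = with_timeout(double, timeout_s=0.1)
print(safe_double({"x": [1, 2]}))   # {'x': [2, 4], 'status': ['ok', 'ok']}
print(safe_double({"x": [-1, 3]}))  # {'x': [-1, 3], 'status': ['timeout', 'timeout']}
```

In a pipeline this might look like `ds.map_batches(with_timeout(my_udf, timeout_s=600), batch_format="numpy")`, where `my_udf` is a placeholder for your own batch function; afterwards `ds.filter(lambda row: row["status"] == "timeout")` selects the rows to re-run. Two caveats: Python cannot kill a thread, so a truly stuck call keeps running (and holding memory or CPU) inside the worker until the process exits, and reclaiming it would need a process-based timeout instead. It is also worth confirming that your Ray version accepts list-valued columns in a returned batch.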

gilvikra commented 3 months ago

Sure! But it would be a lot cleaner and easier if the Ray Data infrastructure provided this functionality.

salaki commented 1 month ago

> Can you add the timeout logic in the UDF itself?

How would I do that if I rely on Ray to resume the failed node? I think this should be treated as an important feature rather than a P2.