ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[data] Bad error message when function outputs cannot be pickled #46642

Open ericl opened 1 month ago

ericl commented 1 month ago

What happened + What you expected to happen

Running the reproduction script below gives a confusing error message that does not tell the user how to fix the problem:

    block_iterator, stats, executor = ds._plan.execute_to_iterator()
  File "/Users/ekl/Library/Python/3.9/lib/python/site-packages/ray/data/exceptions.py", line 86, in handle_trace
    raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(TypeError): ray::MapBatches(f)() (pid=56603, ip=127.0.0.1)
  File "/Users/ekl/Library/Python/3.9/lib/python/site-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/Users/ekl/Library/Python/3.9/lib/python/site-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/socket.py", line 273, in __getstate__
    raise TypeError(f"cannot pickle {self.__class__.__name__!r} object")
TypeError: cannot pickle 'socket' object

The expected error message is something more like this:

Checking Serializability of <ray.data._internal.execution.operators.map_transformer.MapTransformer object at 0x16fffa040>
================================================================================
!!! FAIL serialization: cannot pickle 'socket' object

cc @c21 @raulchen

Versions / Dependencies

Ray 2.32

Reproduction script

import ray
import socket

def f(x):
    return {"x": [socket.socket()]}

ds = ray.data.from_items([1,2,3])
ds = ds.map_batches(f)
ds.show()
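
The underlying failure can be reproduced without Ray. The following is a minimal sketch using only the standard library `pickle` module; the helper name `try_pickle` is hypothetical and not part of Ray. It shows that the output of `f` is unpicklable on its own, which is the root cause hidden by the traceback above.

```python
import pickle
import socket


def f(x):
    # Same shape as the function in the reproduction script above.
    return {"x": [socket.socket()]}


def try_pickle(value):
    """Hypothetical helper: return None if `value` pickles, else the error text."""
    try:
        pickle.dumps(value)
        return None
    except (TypeError, AttributeError, pickle.PicklingError) as exc:
        return f"{type(exc).__name__}: {exc}"


batch = f([1, 2, 3])
try:
    error = try_pickle(batch)
finally:
    # Close the socket so the sketch does not leak a file descriptor.
    for sock in batch["x"]:
        sock.close()

print(error)
```

Running a check like this on a single batch locally, before calling `map_batches`, is one way to find the offending value while the Ray error message remains unclear.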

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Oblynx commented 1 month ago

Having struggled a lot with Ray serialization, I feel that the proposed error message is still not clear enough, because it doesn't point to the user code that produces the offending object.

Is it possible with some extra inspection to come up with an error message like the following?

This is the log of the execution plan, indicating the user functions by name:

Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[MapBatches(read_dataset)] -> TaskPoolMapOperator[Map(merge_components)->Map(remesh_surface)]

The error message refers to one of the user defined functions:

                                                                +                 + +
Checking Serializability of <ray.data._internal.... TaskPoolMapOperator.Map(remesh_surface) object at 0x16fffa040>
================================================================================
#  list of objects that pass serialization check ...
!!! FAIL serialization: cannot pickle 'socket' object
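
As a rough illustration of the idea, the sketch below wraps a user function so that an unpicklable return value raises an error naming that function. It is not how Ray Data works internally: `check_output_picklable` and `UnpicklableOutputError` are hypothetical names, the standard library `pickle` stands in for Ray's cloudpickle, and a `threading.Lock` stands in for the socket as the unpicklable object.

```python
import functools
import pickle
import threading


class UnpicklableOutputError(TypeError):
    """Hypothetical error type that names the user function at fault."""


def check_output_picklable(fn):
    """Wrap `fn` so an unpicklable return value raises an error naming `fn`."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        try:
            # Assumption: stdlib pickle approximates the real serializer.
            pickle.dumps(result)
        except Exception as exc:
            raise UnpicklableOutputError(
                f"Output of user function {fn.__qualname__!r} "
                f"(defined in {fn.__module__}) cannot be pickled: {exc}"
            ) from exc
        return result

    return wrapper


@check_output_picklable
def remesh_surface(row):
    # Stand-in for a user function; a lock is unpicklable, like a socket.
    return {"mesh": threading.Lock()}


try:
    remesh_surface({"id": 1})
    message = None
except UnpicklableOutputError as exc:
    message = str(exc)

print(message)
```

The extra `pickle.dumps` call serializes every output twice, so a real implementation would more likely catch the serialization failure where it already happens and attach the function name there.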