ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] Deserialization of generic pydantic models #47840

Open JHaertl opened 1 month ago

JHaertl commented 1 month ago

What happened + What you expected to happen

Bug description:
Given: a generic pydantic v2 model defined in a separate Python module
When: passing a parametrized instance of that generic model to a Ray remote function as a task
Then: Ray throws an AttributeError

Note that this does not happen in all of these cases:

Note also: Ray is fully capable of serializing and deserializing the model using cloudpickle when pickling is executed explicitly.

Sadly, none of the above workarounds is a great option in my case. I would like to better understand why this happens and how I can fix this and similar issues going forward.

Expected behavior: The task should start and process the passed object as expected. Whether the generic is parametrized, and where the class is defined, should have no bearing on Ray's ability to serialize and deserialize the pydantic model; it should work in all cases.

Logs:

2024-09-27 14:24:24,007 INFO worker.py:1786 -- Started a local Ray instance.
(process_item pid=464) Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
(process_item pid=464) Traceback (most recent call last):
(process_item pid=464)   File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 423, in deserialize_objects
(process_item pid=464)     obj = self._deserialize_object(data, metadata, object_ref)
(process_item pid=464)           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(process_item pid=464)   File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 280, in _deserialize_object
(process_item pid=464)     return self._deserialize_msgpack_data(data, metadata_fields)
(process_item pid=464)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(process_item pid=464)   File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 235, in _deserialize_msgpack_data
(process_item pid=464)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(process_item pid=464)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(process_item pid=464)   File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 225, in _deserialize_pickle5_data
Traceback (most recent call last):
(process_item pid=464)     obj = pickle.loads(in_band)
  File "", line 198, in _run_module_as_main
(process_item pid=464)           ^^^^^^^^^^^^^^^^^^^^^
  File "", line 88, in _run_code
(process_item pid=464) AttributeError: Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
  File "D:\Development\Python\workspace\toyexample\main.py", line 14, in
    ray.get(process_item.remote(container))
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\worker.py", line 2691, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\worker.py", line 871, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RaySystemError): ray::process_item() (pid=464, ip=127.0.0.1)
  File "python\ray\_raylet.pyx", line 1806, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1840, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 943, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
traceback: Traceback (most recent call last):
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 423, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 280, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 235, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 225, in _deserialize_pickle5_data
    obj = pickle.loads(in_band)
          ^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
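
The AttributeError in the log has the shape of a failed pickle-by-reference lookup: pickle stores a class as a module name plus a qualified name, and the unpickling process must find that name as an attribute of the module. The sketch below reproduces that failure mode with the standard library only. It assumes, without confirmation from this thread, that the parametrized class is present in the module namespace in the driver process but was never created in the worker process. The module `fake_container` and the names `Parametrized`, `roundtrip_name`, and `worker_error` are hypothetical stand-ins, not part of pydantic or Ray.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that defines the generic model.
mod = types.ModuleType("fake_container")
sys.modules["fake_container"] = mod


class Container:
    def __init__(self, name):
        self.name = name


# Mimic a parametrized class: created at runtime, with brackets in its name,
# and claiming to live in the stand-in module.
Parametrized = type(
    "Container[MyItem]", (Container,), {"__module__": "fake_container"}
)

# "Driver" side (assumption): the runtime-created class is reachable as a
# module attribute, so pickling by reference and a local round trip succeed.
setattr(mod, "Container[MyItem]", Parametrized)
payload = pickle.dumps(Parametrized("my_container"))
roundtrip_name = pickle.loads(payload).name

# "Worker" side (assumption): the module is importable, but the parametrized
# class was never created there, so the attribute lookup fails.
delattr(mod, "Container[MyItem]")
worker_error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    worker_error = exc

print(roundtrip_name)
print(type(worker_error).__name__, worker_error)
```

If this reading is right, it would also be consistent with the note that an explicit pickle round trip in a single process succeeds, because the lookup then happens in the process where the class already exists.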

Versions / Dependencies

Verified on

Reproduction script

main.py

import ray
from .container import Container, MyItem

container = Container[MyItem](name="my_container", items=[MyItem(value=2, description="example")])

ray.init()

@ray.remote
def process_container(container: Container):
    print(container.name)

ray.get(process_container.remote(container))

container.py

from pydantic import BaseModel
from typing import TypeVar, Generic

class Item(BaseModel):
    value: int

class MyItem(Item):
    description: str

ItemT = TypeVar("ItemT", bound=Item)

class Container(BaseModel, Generic[ItemT]):
    name: str
    items: list[ItemT]

Issue Severity

Medium: It is a significant difficulty but I can work around it.

jjyao commented 1 month ago

I feel there might be some sys.path issue. @MengjinYan can you take a look at it?
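
One way to narrow down the sys.path hypothesis is to compare what the driver and a worker actually see. The helper below is a minimal sketch using only the standard library; `probe_module` and `local_report` are hypothetical names introduced for illustration. It reports the process's `sys.path`, whether the module imports, which file it was loaded from, and whether a given attribute exists on it.

```python
import importlib
import sys


def probe_module(module_name: str, attr_name: str) -> dict:
    """Report whether a module imports in this process and exposes an attribute."""
    report = {
        "sys_path": list(sys.path),
        "importable": False,
        "file": None,
        "has_attr": False,
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module cannot be found in this process: a sys.path-style problem.
        return report
    report["importable"] = True
    report["file"] = getattr(module, "__file__", None)
    # The module imports, but the attribute may still be missing.
    report["has_attr"] = hasattr(module, attr_name)
    return report


# Local demonstration with a standard-library module.
local_report = probe_module("json", "dumps")
print(local_report["importable"], local_report["has_attr"])
```

Calling `probe_module("toyexample.container", "Container[MyItem]")` once in the driver and once inside a function decorated with `@ray.remote` would show whether the two processes differ in `importable` (pointing to a path problem) or only in `has_attr` (pointing to a class that exists in one process but not the other).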