Bug description:
Given: a generic pydantic v2 model defined in a separate Python module
When: passing a parametrized instance of that generic model as an argument to a Ray remote function
Then: Ray raises an AttributeError
Note that the error does not occur in any of the following cases:
- dataclasses are used instead of pydantic models
- the pydantic model instance is created without a concrete parametrization, i.e. Container(...) instead of Container[MyItem](...)
- the pydantic model class is defined in the same file that makes the Ray calls
- the concrete parametrization is declared once in the module that implements the pydantic model class, i.e. _ = Container[MyItem]
Note also:
Ray is fully capable of serializing and deserializing the model using cloudpickle when pickling is executed explicitly.
Sadly, none of the above workarounds is a great option in my case, and I would like to understand why this happens and how I can fix this and similar issues going forward.
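One possible explanation for the pattern above, offered as an assumption rather than a confirmed diagnosis: pickle stores a class by reference (module name plus qualified name), not by value. If the parametrized class Container[MyItem] only becomes an attribute of toyexample.container when the driver process creates it, a worker that imports the module fresh would find no such attribute. The sketch below reproduces that mechanism with the standard library only; fake_container, Base, and parametrize are hypothetical stand-ins and not pydantic or Ray APIs.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module such as toyexample.container.
mod = types.ModuleType("fake_container")
sys.modules["fake_container"] = mod


class Base:
    """Hypothetical base class; not a pydantic model."""


def parametrize(name: str) -> type:
    # Create a class at runtime and register it on the module under its own
    # name. This mimics (as an assumption) what a generic parametrization
    # performed at module level might do in the driver process.
    cls = type(name, (Base,), {"__module__": "fake_container"})
    setattr(mod, name, cls)
    return cls


cls = parametrize("Container[MyItem]")

# The class is pickled by reference: only the module name and class name
# are written into the payload, not the class body.
payload = pickle.dumps(cls())

# Same process: the attribute exists on the module, so loading works.
restored = pickle.loads(payload)

# Simulate a fresh worker process that imported the module but never
# created the parametrized class.
delattr(mod, "Container[MyItem]")
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__)
```

If this is indeed the mechanism, it would be consistent with the observation that declaring _ = Container[MyItem] inside the model's own module avoids the error, because the attribute would then exist in every process that imports the module.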
Expected behavior:
The task should start and process the passed object as expected.
Whether the generic parametrization is declared, and where the class is defined, should have no bearing on Ray's ability to serialize and deserialize the pydantic model; it should work in all cases.
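One way to make pickling independent of where the parametrization was declared would be to serialize the unparametrized class together with its type argument and rebuild the parametrized class on load. The following is a standard-library sketch of that idea using __reduce__. Container and MyItem here are plain-Python stand-ins with the same names as in the report, and _rebuild, _origin, _args, and _cache are hypothetical helpers. Whether the same hook can be applied to a real pydantic model without interfering with its own pickling support has not been verified here.

```python
import pickle


class MyItem:
    """Plain-Python stand-in for the item model (not pydantic)."""

    def __init__(self, value):
        self.value = value


def _rebuild(origin, arg, state):
    # Re-create the parametrized class in the receiving process,
    # then restore the instance state without calling __init__.
    cls = origin[arg] if arg is not None else origin
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


class Container:
    """Plain-Python stand-in for the generic model (not pydantic)."""

    _origin = None  # unparametrized class; set on parametrized subclasses
    _args = None    # type argument; set on parametrized subclasses
    _cache = {}     # (origin, argument) -> parametrized subclass

    def __init__(self, name, items):
        self.name = name
        self.items = items

    def __class_getitem__(cls, item):
        # Create (or reuse) a runtime subclass named like "Container[MyItem]".
        key = (cls, item)
        if key not in cls._cache:
            name = f"{cls.__name__}[{item.__name__}]"
            cls._cache[key] = type(name, (cls,), {"_origin": cls, "_args": item})
        return cls._cache[key]

    def __reduce__(self):
        # Pickle the origin class and the type argument, both importable by
        # name, instead of the runtime-created parametrized class.
        cls = type(self)
        origin = cls._origin if cls._origin is not None else cls
        return (_rebuild, (origin, cls._args, dict(self.__dict__)))


container = Container[MyItem]("my_container", [MyItem(2)])
payload = pickle.dumps(container)

# Simulate a process in which the parametrization was never created.
Container._cache.clear()
restored = pickle.loads(payload)
print(type(restored).__name__, restored.name)
```

The design choice is that only names resolvable in any process (Container, MyItem, _rebuild) end up in the payload, so the receiving side never has to look up a class called Container[MyItem] on the module.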
Logs:
2024-09-27 14:24:24,007 INFO worker.py:1786 -- Started a local Ray instance.
(process_item pid=464) Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
(process_item pid=464) Traceback (most recent call last):
(process_item pid=464) File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 423, in deserialize_objects
(process_item pid=464) obj = self._deserialize_object(data, metadata, object_ref)
(process_item pid=464) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(process_item pid=464) File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 280, in _deserialize_object
(process_item pid=464) return self._deserialize_msgpack_data(data, metadata_fields)
(process_item pid=464) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(process_item pid=464) File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 235, in _deserialize_msgpack_data
(process_item pid=464) python_objects = self._deserialize_pickle5_data(pickle5_data)
(process_item pid=464) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(process_item pid=464) File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 225, in _deserialize_pickle5_data
Traceback (most recent call last):
(process_item pid=464) obj = pickle.loads(in_band)
File "<frozen runpy>", line 198, in _run_module_as_main
(process_item pid=464) ^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 88, in _run_code
(process_item pid=464) AttributeError: Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
File "D:\Development\Python\workspace\toyexample\main.py", line 14, in <module>
ray.get(process_item.remote(container))
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\worker.py", line 2691, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\worker.py", line 871, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RaySystemError): ray::process_item() (pid=464, ip=127.0.0.1)
File "python\ray\_raylet.pyx", line 1806, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 1840, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 943, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
traceback: Traceback (most recent call last):
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 423, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 280, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 235, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Development\Python\workspace\venv\py12\Lib\site-packages\ray\_private\serialization.py", line 225, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'Container[MyItem]' on <module 'toyexample.container' from 'D:\Development\Python\workspace\toyexample\container.py'>
Versions / Dependencies
Verified on
Ubuntu 22.04, Python 3.10, Ray 2.36.1 and 2.37.0, Pydantic 2.9.2
Windows 10, Python 3.12, Ray 2.37.0, Pydantic 2.9.2
Reproduction script
main.py
import ray

from .container import Container, MyItem

# Per the report, parametrizing the generic model here (outside container.py)
# is what leads to the AttributeError in the worker.
container = Container[MyItem](name="my_container", items=[MyItem(value=2, description="example")])

ray.init()


@ray.remote
def process_container(container: Container):
    print(container.name)


ray.get(process_container.remote(container))
container.py
from pydantic import BaseModel
from typing import TypeVar, Generic


class Item(BaseModel):
    value: int


class MyItem(Item):
    description: str


ItemT = TypeVar("ItemT", bound=Item)


class Container(BaseModel, Generic[ItemT]):
    name: str
    items: list[ItemT]
Issue Severity
Medium: It is a significant difficulty but I can work around it.