ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Core: ray.remote raises ValueError when used on torch IterableDataset #42914

Open albertbou92 opened 9 months ago

albertbou92 commented 9 months ago

What happened + What you expected to happen

When calling ray.remote on a torch IterableDataset, I get ValueError: no signature found for builtin type <class 'types.GenericAlias'>. This did not happen in previous versions of Torch and Ray.

Versions / Dependencies

Python 3.10.13 on Linux, Ray 2.9.1, Torch 2.3.0

Reproduction script

import ray
from torch.utils.data import IterableDataset

class SyncDataCollector(IterableDataset):
    def __iter__(self):
        return

ray.init()
ray.remote(SyncDataCollector)

Issue Severity

High: It blocks me from completing my task.

jjyao commented 9 months ago

> This did not happen in previous versions of Torch and Ray.

What about a previous version of Torch with the current version of Ray?

kevin85421 commented 9 months ago

The latest PyTorch release seems to be 2.2.0. See the PyTorch release page for more details. In addition, I tried pip install torch==2.3.0, and got the following error message:

ERROR: Could not find a version that satisfies the requirement torch==2.3.0 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0)
ERROR: No matching distribution found for torch==2.3.0

albertbou92 commented 9 months ago

Hello! Thanks a lot for the support :) Here is how you can reproduce the environment I used, including torch==2.3.0.

conda create -n test_env python=3.10 -y
conda activate test_env
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
pip3 install ray

I also tried other versions, but to me it seems to be related to the Python version. Would that make sense?

albertbou92 commented 9 months ago

I tried Python 3.9 and got the same error, but with Python 3.8 I don't get it.

conda create -n test_env python=3.8 -y
conda activate test_env
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
pip3 install ray

Python 3.8.18 on Linux, Ray 2.9.2, Torch 2.3.0.dev20240207+cu121
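For anyone comparing environments in this thread, a small sketch like the following can print the relevant versions in one go. The helper name collect_versions is hypothetical (not from Ray or Torch); it only uses the standard library, and it assumes the packages are installed under the distribution names "ray" and "torch".

```python
import platform
from importlib import metadata


def collect_versions(packages=("ray", "torch")):
    """Return a mapping of component name to installed version string."""
    # collect_versions is a hypothetical helper, not part of Ray or Torch.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            versions[name] = "not installed"
    return versions


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

Pasting that output into a comment avoids ambiguity between release and nightly builds.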

sergiovalmac commented 8 months ago

I can confirm that I don't get the error with Python 3.8 on Windows 11 using this minimal example:

python3.8 -m venv 3.8_torchrl
3.8_torchrl\Scripts\Activate.ps1
pip install torchrl 
pip install ray

Python 3.8.10 on Windows 11, Ray 2.9.2, Torch 2.2.0, TorchRL 0.3.0

kevin85421 commented 8 months ago

I can reproduce this issue. I will try to fix it.

> I tried Python 3.9 and got the same error, but with Python 3.8 I don't get it.

Based on https://docs.python.org/3/library/types.html#types.GenericAlias, types.GenericAlias was first introduced in Python 3.9.
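A possible way to see where types.GenericAlias enters the picture, without Ray or Torch installed, is sketched below. It assumes Python 3.9+ and uses collections.abc.Iterable as a stand-in base class. It also assumes (suggested by the error message, but not confirmed in this thread) that the failure comes from calling inspect.signature on a class member. The names Collector and describe_class_getitem are hypothetical.

```python
import inspect
import types
from collections.abc import Iterable


class Collector(Iterable):
    # Stand-in for a user class; only the Iterable base matters here.
    def __iter__(self):
        return iter(())


def describe_class_getitem(cls):
    """Return (is_bound_method, wrapped_callable, signature_error_or_None)."""
    member = cls.__class_getitem__
    # On Python 3.9+, Iterable defines __class_getitem__ = classmethod(GenericAlias),
    # so the bound method wraps the GenericAlias type itself.
    wrapped = getattr(member, "__func__", None)
    try:
        inspect.signature(member)
        error = None
    except ValueError as exc:
        # Builtin types without signature metadata cannot be introspected.
        error = str(exc)
    return inspect.ismethod(member), wrapped, error


is_method, wrapped, error = describe_class_getitem(Collector)
print("bound method:", is_method)
print("wraps types.GenericAlias:", wrapped is types.GenericAlias)
print("signature error:", error)
```

If the last line prints an error mentioning GenericAlias, that would be consistent with the ValueError reported above. Whether inspect.signature fails here may depend on the Python version, so the sketch reports the result rather than assuming it.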

kevin85421 commented 8 months ago

This PR #43117 appears to fix the issue. If the CI passes, I'll delve deeper to get more details about the root cause.

jjyao commented 8 months ago

I have a simplified repro:

import ray
from typing import Iterable

@ray.remote
class SyncDataCollector(Iterable):
    def __iter__(self):
        pass

albertbou92 commented 8 months ago

@kevin85421 Thanks a lot for the support so far! I see https://github.com/ray-project/ray/pull/43117 is approved, but there are some concerns about merging it. If that is the case, is there any workaround I can apply on my side with IterableDataset to make it work with Python 3.9+?

albertbou92 commented 7 months ago

Just in case it is useful to anyone, I ended up solving it by simply defining a stub __class_getitem__ method. So, modifying the original code snippet like this:

import ray
from torch.utils.data import IterableDataset

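class SyncDataCollector(IterableDataset):

    # Workaround: shadow the inherited __class_getitem__ with a plain method.
    # Python treats __class_getitem__ as an implicit classmethod, hence `cls`.
    def __class_getitem__(cls, index):
        raise NotImplementedError

    def __iter__(self):
        return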

ray.init()
ray.remote(SyncDataCollector)
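To illustrate why this kind of override might help, here is a minimal sketch that needs neither Ray nor Torch. It assumes Python 3.9+ and uses collections.abc.Iterable as a stand-in base class. PatchedCollector mirrors the workaround above. SubscriptableCollector is a hypothetical variant, not proposed in this thread and not tested with Ray, that keeps subscripting working by returning a types.GenericAlias.

```python
import inspect
import types
from collections.abc import Iterable


class PatchedCollector(Iterable):
    # A plain Python function replaces the inherited classmethod(GenericAlias);
    # Python implicitly wraps __class_getitem__ as a classmethod.
    def __class_getitem__(cls, index):
        raise NotImplementedError

    def __iter__(self):
        return iter(())


class SubscriptableCollector(Iterable):
    # Hypothetical variant: same introspectable signature, but
    # SubscriptableCollector[int] still returns a generic alias.
    def __class_getitem__(cls, index):
        return types.GenericAlias(cls, index)

    def __iter__(self):
        return iter(())


# Both overrides are ordinary functions, so inspect can read their signatures.
patched_sig = inspect.signature(PatchedCollector.__class_getitem__)
subscriptable_sig = inspect.signature(SubscriptableCollector.__class_getitem__)
print(patched_sig, subscriptable_sig)

alias = SubscriptableCollector[int]
print(alias.__origin__ is SubscriptableCollector, alias.__args__)
```

The trade-off with the stub that raises NotImplementedError is that subscripting the class (for example in type annotations) stops working. The second variant avoids that, at the cost of a little more code.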