ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] ray.util.ActorPool.get_next() doesn't support multiple return values when timeout is not None #38607

Open PRESIDENT810 opened 1 year ago

PRESIDENT810 commented 1 year ago

### What happened + What you expected to happen

1. The bug: reproduction code

    import time

    import ray
    from ray.util import ActorPool

    ray.init()

    @ray.remote
    class Foo:
        @ray.method(num_returns=2)
        def some_task(self, x):
            # Simulate some work
            time.sleep(10)
            return x, 2

    actors = [Foo.remote() for _ in range(2)]
    pool = ActorPool(actors)

    pool.submit(lambda a, v: a.some_task.remote(v), 1)
    pool.get_next(timeout=1, ignore_if_timedout=True)

This will trigger:
TypeError: wait() expected a list of ray.ObjectRef or ray.StreamingObjectRefGenerator, got list containing <class 'list'>

The problem seems to be caused by this code in `get_next`:

    if timeout is not None:
        res, _ = ray.wait([future], timeout=timeout)

The future passed here can be a list of ray.ObjectRef rather than a single ray.ObjectRef when the method is declared with @ray.method(num_returns=<number greater than 1>). When timeout is not None, `[future]` is then a list containing a single list, which triggers the error.

2. Expected behavior:
It is expected to function normally.

3. Error message:

    Traceback (most recent call last):
      File "/Users/zhongkaining/PycharmProjects/pythonProject1/main.py", line 19, in <module>
        pool.get_next(timeout=1, ignore_if_timedout=True)
      File "/usr/local/lib/python3.11/site-packages/ray/util/actor_pool.py", line 283, in get_next
        res, _ = ray.wait([future], timeout=timeout)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
        return fn(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/ray/_private/worker.py", line 2701, in wait
        raise TypeError(
    TypeError: wait() expected a list of ray.ObjectRef or ray.StreamingObjectRefGenerator, got list containing <class 'list'>
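
The nesting problem described in this report can be illustrated without a Ray cluster. The following is only a minimal sketch under stated assumptions: `FakeRef` and `fake_wait` are stand-ins invented here for `ray.ObjectRef` and the argument check in `ray.wait`, and `as_ref_list` is a hypothetical helper showing one way the future could be normalized before waiting. It is not the actual fix in Ray.

```python
from typing import Any, List, Sequence, Union


class FakeRef:
    """Stand-in for ray.ObjectRef (assumption: only used to mimic the type check)."""

    def __init__(self, name: str) -> None:
        self.name = name


def fake_wait(refs: List[Any]) -> List[Any]:
    """Mimics only the argument validation that rejects nested lists."""
    for ref in refs:
        if not isinstance(ref, FakeRef):
            raise TypeError(
                f"wait() expected a list of refs, got list containing {type(ref)}"
            )
    return refs


def as_ref_list(future: Union[FakeRef, Sequence[FakeRef]]) -> List[FakeRef]:
    """Hypothetical helper: always return a flat list of refs."""
    if isinstance(future, (list, tuple)):
        # A multi-return call already yields several refs; do not wrap it again.
        return list(future)
    return [future]


single = FakeRef("a")
multi = [FakeRef("b0"), FakeRef("b1")]  # shape of a num_returns=2 result

# Wrapping a multi-return future in another list reproduces the nesting.
try:
    fake_wait([multi])
    nested_error = None
except TypeError as exc:
    nested_error = str(exc)

# Normalizing first gives a flat list in both cases.
flat_single = fake_wait(as_ref_list(single))
flat_multi = fake_wait(as_ref_list(multi))
```

With a flat list, a real change would still have to decide how many refs must be ready before the timeout counts as met, for example by passing `num_returns=len(refs)` to `ray.wait`; that choice is left open here.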


### Versions / Dependencies

I was using Ray 2.6.2. I also pulled the source code, and the problem is still present on the mainline branch.

### Reproduction script

    import time

    import ray
    from ray.util import ActorPool

    ray.init()

    @ray.remote
    class Foo:
        @ray.method(num_returns=2)
        def some_task(self, x):
            # Simulate some work
            time.sleep(10)
            return x, 2

    actors = [Foo.remote() for _ in range(2)]
    pool = ActorPool(actors)

    pool.submit(lambda a, v: a.some_task.remote(v), 1)
    pool.get_next(timeout=1, ignore_if_timedout=True)



### Issue Severity

Low: It annoys or frustrates me.

PRESIDENT810 commented 1 year ago

This should be an easy fix. Please assign this to me and I will fix it quickly.

rkooo567 commented 1 year ago

Yeah @PRESIDENT810 can you make a PR?

PRESIDENT810 commented 1 year ago

> Yeah @PRESIDENT810 can you make a PR?

Yeah sure, I'm working on it.

Btw, this method has another problem, as I mentioned in #38635. I submitted PR #38641 to fix that issue. Would you mind taking a look and merging that PR first, if that's OK?