ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[aDAG][Core] Feeding multiple inputs to aDAG and annotating with TorchTensor does not move tensors to GPU #47251

Open keshavb96 opened 2 months ago

keshavb96 commented 2 months ago

What happened + What you expected to happen

When multiple tensors are fed as input to an aDAG, the tensors are not moved to GPU memory even though the inputs are annotated with TorchTensorType(). The issue does not occur when only a single tensor is fed to the aDAG.

Versions / Dependencies

Ray==2.34

Reproduction script

import ray
import torch
import ray.dag
from ray.experimental.channel.torch_tensor_type import TorchTensorType

@ray.remote(num_gpus=1)
class Actor:
    def __init__(self) -> None:
        pass

    def test(self, tensor: torch.Tensor):
        return tensor.device

if __name__ == "__main__":
    ray.init()
    actor1, actor2 = Actor.remote(), Actor.remote()
    with ray.dag.InputNode() as dag_input:
        in1, in2 = dag_input[0], dag_input[1]

        # Dag 1
        in1 = in1.with_type_hint(TorchTensorType())
        in1 = actor1.test.bind(in1)

        # Dag 2
        in2 = in2.with_type_hint(TorchTensorType())
        in2 = actor2.test.bind(in2)

        dag = ray.dag.MultiOutputNode([in1, in2])

    adag = dag.experimental_compile()
    output = ray.get(adag.execute(torch.randn(2, 16), torch.tensor(1.0)))
    print(output)
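
For contrast, a minimal single-input variant of the same DAG is sketched below. This is an illustration based on the description above (it reuses the same Actor class and the same APIs as the reproduction script), not a separately confirmed test case; per the report, the single-tensor case does move the tensor to the GPU.

import ray
import torch
import ray.dag
from ray.experimental.channel.torch_tensor_type import TorchTensorType

@ray.remote(num_gpus=1)
class Actor:
    def test(self, tensor: torch.Tensor):
        # Return the device the tensor landed on.
        return tensor.device

if __name__ == "__main__":
    ray.init()
    actor = Actor.remote()
    with ray.dag.InputNode() as dag_input:
        # Single input annotated with TorchTensorType().
        inp = dag_input.with_type_hint(TorchTensorType())
        dag = actor.test.bind(inp)

    adag = dag.experimental_compile()
    # Per the description above, this is expected to print a CUDA device,
    # unlike the multi-input script, which reports CPU placement.
    print(ray.get(adag.execute(torch.randn(2, 16))))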

Issue Severity

Medium: It is a significant difficulty but I can work around it.

keshavb96 commented 2 months ago

@stephanie-wang

anyscalesam commented 2 months ago

cc @kevin85421

> supporting multi-channel on single input will enable ADAG to support this...
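
For illustration only (this is an assumption about what the comment refers to, not a confirmed API or behavior): "multi-channel on single input" would presumably allow a single annotated input to be fanned out over separate channels to multiple consumers, e.g. a pattern like the following, reusing actor1 and actor2 from the reproduction script.

# Hypothetical sketch, not a confirmed API behavior: one annotated input
# fanned out over separate channels to two actors.
with ray.dag.InputNode() as dag_input:
    inp = dag_input.with_type_hint(TorchTensorType())
    dag = ray.dag.MultiOutputNode([actor1.test.bind(inp), actor2.test.bind(inp)])

adag = dag.experimental_compile()
print(ray.get(adag.execute(torch.randn(2, 16))))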