towhee-io / towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
https://towhee.io
Apache License 2.0
3.2k stars, 247 forks

[Bug]: Operator "collaborative-experts" fails with error "ValueError: x.device cpu != cluster.device cuda:0" #2128

Closed: binbinlv closed this issue 2 years ago

binbinlv commented 2 years ago

Is there an existing issue for this?

Current Behavior

Running the operator "collaborative-experts" fails with the error "ValueError: x.device cpu != cluster.device cuda:0":

(binbin-test-python-3.8-new) super@super-SYS-740GP-TNRT:~/binbin$ python3
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from towhee import Entity

>>> import towhee
>>> torch.manual_seed(42)
<torch._C.Generator object at 0x7f10e0ee9fd0>
>>>
>>> batch_size = 8
>>> experts = {"audio": torch.rand(batch_size, 29, 128),
...            "face": torch.rand(batch_size, 512),
...            "i3d.i3d.0": torch.rand(batch_size, 1024),
...            "imagenet.resnext101_32x48d.0": torch.rand(batch_size, 2048),
...            "imagenet.senet154.0": torch.rand(batch_size, 2048),
...            "ocr": torch.rand(batch_size, 49, 300),
...            "r2p1d.r2p1d-ig65m.0": torch.rand(batch_size, 512),
...            "scene.densenet161.0": torch.rand(batch_size, 2208),
...            "speech": torch.rand(batch_size, 32, 300)
...            }
>>> ind = {
...     "r2p1d.r2p1d-ig65m.0": torch.ones(batch_size,),
...     "imagenet.senet154.0": torch.ones(batch_size,),
...     "imagenet.resnext101_32x48d.0": torch.ones(batch_size,),
...     "scene.densenet161.0": torch.ones(batch_size,),
...     "audio": torch.ones(batch_size,),
...     "speech": torch.ones(batch_size,),
...     "ocr": torch.randint(low=0, high=2, size=(batch_size,)),
...     "face": torch.randint(low=0, high=2, size=(batch_size,)),
...     "i3d.i3d.0": torch.ones(batch_size,),
... }
>>> text = torch.randn(batch_size, 1, 37, 768)
>>>
>>> towhee.dc([Entity(experts=experts, ind=ind, text=text)]) \
...     .video_text_embedding.collaborative_experts[('experts', 'ind', 'text'), ('text_embds', 'vid_embds')]().show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/super/.local/lib/python3.8/site-packages/towhee/hparam/hyperparameter.py", line 200, in __call__
    return self._func(*args, **kws)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/functional/data_collection.py", line 142, in wrapper
    return self.map(op)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/functional/mixins/dag.py", line 35, in wrapper
    children = f(self, *arg, **kws)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/functional/data_collection.py", line 390, in map
    return self._factory(result)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/functional/data_collection.py", line 295, in _factory
    iterable = list(iterable)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/functional/data_collection.py", line 387, in inner
    return unary_op(x)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/engine/execution/base_execution.py", line 32, in __call__
    res = self.__apply__(*arg, **kws)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/engine/execution/base_execution.py", line 27, in __apply__
    return self._op(*args, **kws)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/engine/operator_registry.py", line 201, in wrapper_call
    return old_call(self, *args, **kws)
  File "/home/super/.towhee/operators/video-text-embedding/collaborative_experts/main/collaborative_experts.py", line 41, in __call__
    out = self.ce_net_model(experts, ind, text)
  File "/home/super/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/models/collaborative_experts/collaborative_experts.py", line 296, in forward
    aggregated_experts[mod] = self.pooling[mod](experts[mod])
  File "/home/super/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/super/.local/lib/python3.8/site-packages/towhee/models/collaborative_experts/net_vlad.py", line 69, in forward
    raise ValueError(msg)
ValueError: x.device cpu != cluster.device cuda:0

Expected Behavior

No error

Steps To Reproduce

https://towhee.io/video-text-embedding/collaborative-experts

Environment

- Towhee version (e.g. v0.1.3 or 8b23a93): towhee dev latest
- OS (Ubuntu or CentOS): Ubuntu
- CPU/Memory:
- GPU:
- Others:

Anything else?

No response

binbinlv commented 2 years ago

This is fixed, so I am closing the issue. The call now succeeds when `device` is passed to the operator:

>>> towhee.dc([Entity(experts=experts, ind=ind, text=text)]) \
...     .video_text_embedding.collaborative_experts[('experts', 'ind', 'text'), ('text_embds', 'vid_embds')](device=device).show()
<IPython.core.display.HTML object>
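
The original traceback reports that the input tensors were on `cpu` while the model's cluster parameters were on `cuda:0`. Note that the snippet above does not show how `device` was defined. Independent of how the operator itself was fixed, one general way to avoid this kind of mismatch is to move every input tensor to the model's device before the call. The following is a minimal sketch under that assumption; `move_to_device` is a hypothetical helper, not part of Towhee or PyTorch, and it relies only on each leaf exposing a `.to(device)` method the way `torch.Tensor` does.

```python
def move_to_device(obj, device):
    """Recursively move tensors inside dicts, lists and tuples to `device`.

    Hypothetical helper, not part of Towhee or PyTorch. Any leaf with a
    callable `.to` method (for example a torch.Tensor) is moved; every
    other leaf is returned unchanged.
    """
    if isinstance(obj, dict):
        # Rebuild the dict with the same keys and moved values.
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Preserve the container type (plain list or plain tuple).
        return type(obj)(move_to_device(item, device) for item in obj)
    to = getattr(obj, "to", None)
    if callable(to):
        return to(device)
    return obj
```

With the tensors from the report, this could be applied as `experts = move_to_device(experts, "cuda:0")`, and likewise for `ind` and `text`, before building the `Entity`. The helper is duck-typed on purpose so that it does not import `torch` itself. It returns new containers rather than modifying the inputs in place.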