nasaharvest / presto

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
https://arxiv.org/abs/2304.14065
MIT License

`RuntimeError` "Expected all tensors to be on the same device..." when cuda is available. #29

Closed mkondratyev85 closed 6 months ago

mkondratyev85 commented 6 months ago

I get an error if I have cuda available on my computer.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

            values = torch.stack(xs, axis=0).float().to(device)
            dynamic_world = torch.stack(dynamic_worlds, axis=0).long().to(device)
            mask = torch.stack(masks, axis=0).bool().to(device)
            latlons = torch.stack(latlonss, axis=0).float().to(device)
            month = torch.stack(months, axis=0).long().to(device)

            print(f"{values.device=}")
            print(f"{dynamic_world.device=}")
            print(f"{mask.device=}")
            print(f"{latlons.device=}")
            print(f"{month.device=}")

            with torch.no_grad():
                features = (
                    pretrained_model.encoder(
                        values,
                        dynamic_world=dynamic_world,
                        mask=mask,
                        latlons=latlons,
                        month=month,
                    )
                    .cpu()
                    .numpy()
                )

All of the values I pass to the encoder are on the same device, as you can see from the code. Here's the output of the printed debug messages:

values.device=device(type='cuda', index=0)
dynamic_world.device=device(type='cuda', index=0)
mask.device=device(type='cuda', index=0)
latlons.device=device(type='cuda', index=0)
month.device=device(type='cuda', index=0)

Nothing changes if I replace device="cuda" with device="cpu"; I still get the same error.

The full stack trace of the error:

 Traceback (most recent call last):
  File "/home/mikhail/source/presto_features/main.py", line 188, in <module>
    process_tile(
  File "/home/mikhail/source/presto_features/main.py", line 156, in process_tile
    pretrained_model.encoder(
  File "/home/mikhail/.cache/pypoetry/virtualenvs/presto-features-bmBP-FwO-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mikhail/.cache/pypoetry/virtualenvs/presto-features-bmBP-FwO-py3.10/lib/python3.10/site-packages/presto/presto.py", line 415, in forward
    tokens = self.eo_patch_embed[channel_group](x[:, :, channel_idxs])
  File "/home/mikhail/.cache/pypoetry/virtualenvs/presto-features-bmBP-FwO-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mikhail/.cache/pypoetry/virtualenvs/presto-features-bmBP-FwO-py3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
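Based on the traceback, the failure happens inside `F.linear(input, self.weight, self.bias)`, which means the layer's own parameters are on a different device than the input. The usual cause is that the inputs were moved to CUDA but the model itself never was. A minimal sketch of the fix, with a plain `nn.Linear` standing in for the Presto encoder (any `nn.Module` behaves the same way):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for pretrained_model.encoder.
model = nn.Linear(16, 8)

# Moving only the input tensors is not enough: the module's parameters
# (weight, bias) stay on the CPU until the module itself is moved.
x = torch.randn(4, 16).to(device)
model.to(device)  # moves all registered parameters and buffers in place

with torch.no_grad():
    out = model(x)

# Input, parameters, and output now share one device, so F.linear succeeds.
assert out.device == x.device
```

Note that `model.to(device)` modifies the module in place (unlike `tensor.to(device)`, which returns a new tensor), so it does not need to be reassigned.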
mkondratyev85 commented 6 months ago

Fixed by #31