microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License

`torch.clamp` issue due to update of pytorch 1.12.0 #237

Open CryptoSalamander opened 1 year ago

CryptoSalamander commented 1 year ago

Since torch.clamp was updated in PyTorch 1.12.0, the latest release, its min and max tensor arguments must now be on the same device as the input tensor. https://github.com/pytorch/pytorch/pull/77035
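
For reference, a minimal sketch of the behavior change (assuming a CUDA device is available; the variable names are illustrative, not from the repo):

```python
import torch

x = torch.randn(4, device="cuda")          # input tensor on the GPU
cap = torch.log(torch.tensor(1. / 0.01))   # 0-dim bound tensor created on the CPU

# Raises the device-mismatch RuntimeError on PyTorch 1.12.0;
# reportedly runs fine on 1.11.0
y = torch.clamp(x, max=cap).exp()

# Moving the bound onto the input's device avoids the error
y = torch.clamp(x, max=cap.to(x.device)).exp()
```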

I got an error with PyTorch 1.12.0 on this line: https://github.com/microsoft/Swin-Transformer/blob/b720b4191588c19222ccf129860e905fb02373a7/models/swin_transformer_v2.py#L156

Error:

backbones/SwinV2.py:153, in WindowAttention.forward(self, x, mask)
    151 # cosine attention
    152 attn = (F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1))
--> 153 logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
    154 attn = attn * logit_scale
    156 relative_position_bias_table = self.cpb_mlp(self.relative_coords_table).view(-1, self.num_heads)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument max in method wrapper_clamp_Tensor)

In 1.11.0 this line works without problems because there was no argument-type promotion for torch.clamp before 1.12.0, but now I guess it should be fixed.
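
One way to sidestep the device handling entirely (just a sketch, not an official fix; `logit_scale_param` below is a stand-in for `self.logit_scale`) is to pass the bound as a plain Python float, since torch.clamp still accepts numbers for min/max:

```python
import math
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
logit_scale_param = torch.randn(1, device=device)  # stand-in for self.logit_scale

# A plain float for max means no extra tensor and no device mismatch,
# regardless of the PyTorch version
logit_scale = torch.clamp(logit_scale_param, max=math.log(1. / 0.01)).exp()
```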

jaehyunnn commented 1 year ago

I have simply solved it as follows:

153 logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01)).to(self.logit_scale.get_device())).exp()
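
A close variant of this (just a sketch) reads the `.device` attribute instead of `get_device()`, which returns -1 for CPU tensors, and creates the bound directly on that device:

```python
logit_scale = torch.clamp(
    self.logit_scale,
    max=torch.log(torch.tensor(1. / 0.01, device=self.logit_scale.device))
).exp()
```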
haraldger commented 1 year ago

I solved it as follows:

[screenshots showing the fix]

A fix for this problem would be very useful.