pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED when using torch.repeat #7197

Open ajayvohra2005 opened 3 months ago

ajayvohra2005 commented 3 months ago

🐛 Bug

Using torch.repeat leads to a runtime error:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1717594747.366986     228 cuda_executor.cc:1032] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1717594747.369320      16 service.cc:145] XLA service 0x55fb8bed0e60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1717594747.369368      16 service.cc:153]   StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
I0000 00:00:1717594747.369825      16 se_gpu_pjrt_client.cc:853] Using BFC allocator.
I0000 00:00:1717594747.369881      16 gpu_helpers.cc:107] XLA backend allocating 17696931840 bytes on device 0 for BFCAllocator.
I0000 00:00:1717594747.369915      16 gpu_helpers.cc:147] XLA backend will use up to 5898977280 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1717594747.370073      16 cuda_executor.cc:1032] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
Traceback (most recent call last):
  File "/app/torch_xla_issue.py", line 35, in <module>
    loss = custom_loss_module.forward(pred=custom_pred[:, [-1], :], target=custom_target, is_dummy=True)
  File "/app/torch_xla_issue.py", line 22, in forward
    cos_sim = self.custom_cos_similarity_module.forward(x1=pred, x2=target)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 89, in forward
    return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/src/pytorch/torch/csrc/autograd/functions/utils.h":75, please report a bug to PyTorch. 

To Reproduce

Steps to reproduce the behavior:

Docker Image:

us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1

Python script to reproduce the error:

import torch
import torch.nn as nn

import torch_xla.core.xla_model as xm

class CustomLoss(nn.Module):

    def __init__(self, cos_similarity_dim=2):
        super().__init__()

        self.custom_cos_similarity_module = nn.CosineSimilarity(dim=cos_similarity_dim)
        self.custom_mse_loss_module = nn.MSELoss(reduction="none")

    def forward(self, pred: torch.Tensor, target: torch.Tensor, is_dummy: bool):

        if is_dummy:
            pred = pred.repeat((1, target.shape[1], 1))  # triggers the INTERNAL ASSERT on XLA devices with the 2.3 release
            # pred = pred.expand((-1, target.shape[1], -1))  # workaround: expand broadcasts instead of copying
        else:
            assert pred.shape[1] == target.shape[1]

        cos_sim = self.custom_cos_similarity_module.forward(x1=pred, x2=target)
        custom_cos_loss_tensor = 1 - cos_sim
        custom_mse_loss_tensor = self.custom_mse_loss_module.forward(input=pred, target=target)

        return custom_cos_loss_tensor, custom_mse_loss_tensor

custom_loss_module = CustomLoss()

device = xm.xla_device()
custom_pred = torch.rand(size=(10, 150, 256), dtype=torch.float, requires_grad=True).to(device)
custom_target = torch.zeros(size=(10, 150, 256), dtype=torch.float, requires_grad=True).to(device)

loss = custom_loss_module.forward(pred=custom_pred[:, [-1], :], target=custom_target, is_dummy=True)
print(loss)

Expected behavior

The script should run without error.

Environment

Additional context

Using torch.expand instead of torch.repeat is a workaround.
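For reference, a minimal sketch of that workaround on eager CPU (names and shapes are illustrative, taken from the reproducer above; not the exact production code):

import torch

# pred keeps a singleton middle dimension, as in custom_pred[:, [-1], :]
pred = torch.rand(10, 1, 256, requires_grad=True)
target = torch.zeros(10, 150, 256)

# repeat materializes copies along dim 1; expand returns a broadcasted view
# and avoids the code path that hits the INTERNAL ASSERT on the 2.3 XLA release.
pred_expanded = pred.expand(-1, target.shape[1], -1)

cos_sim = torch.nn.functional.cosine_similarity(pred_expanded, target, dim=2)
loss = (1 - cos_sim).sum()
loss.backward()  # gradients flow back to pred as expected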

JackCaoG commented 3 months ago

@zpcore can you take a look at this one? I suspect you can repro it with the CPU as well.

zpcore commented 3 months ago

@JackCaoG, yes, the issue can also be reproduced with the XLA CPU backend. Meanwhile, I tried the same code on the master branch and the issue does not exist, so it only affects the 2.3 release.
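For anyone trying to reproduce without a GPU, a minimal sketch on the XLA CPU backend (assuming the PJRT_DEVICE environment variable is used to select the backend; set it before creating the device):

import os
os.environ["PJRT_DEVICE"] = "CPU"  # run on the XLA CPU backend via PJRT

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
pred = torch.rand(10, 1, 256, requires_grad=True).to(device)
target = torch.zeros(10, 150, 256).to(device)

pred = pred.repeat(1, target.shape[1], 1)  # the op that triggers the assert on 2.3
cos_sim = torch.nn.functional.cosine_similarity(pred, target, dim=2)
print(cos_sim)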

The simplest solution is to use the latest nightly docker build, us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_cuda_12.1_20240605. @ajayvohra2005, can you try this docker image instead?
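If it helps, a quick sanity check to confirm which build is actually running inside the container (assuming torch_xla exposes __version__, which recent wheels do):

import torch
import torch_xla

print(torch.__version__)      # e.g. 2.3.0 for the release image
print(torch_xla.__version__)  # nightly builds typically report a dev/nightly version string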