pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Integer overflow while creating nested tensors #135930

Open lazear opened 1 month ago

lazear commented 1 month ago

🐛 Describe the bug

Hi,

Not sure if I'm using nested tensors incorrectly, but I would like to pad some variable-length sequences and feed the resulting padded tensor into a DataLoader.

torch.nested.to_padded_tensor(
    torch.nested.nested_tensor(tensor_list), 0
)
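For context, a self-contained sketch of the pad-then-load pattern I'm describing (the sizes and feature dim here are illustrative, not my real data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative variable-length sequences with a small feature dim.
tensor_list = [torch.rand(n, 4) for n in (2, 3, 5)]

# Pad to the longest sequence with zeros, then batch as usual.
padded = torch.nested.to_padded_tensor(
    torch.nested.nested_tensor(tensor_list), 0.0
)  # shape: (3, 5, 4)
loader = DataLoader(TensorDataset(padded), batch_size=2)
```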

This approach was working great while developing a model, but I have been scaling up the input data and was hit with the following issue:

----> 1 torch.nested.nested_tensor(tensor_list)

File .venv\lib\site-packages\torch\nested\__init__.py:220, in nested_tensor(tensor_list, dtype, layout, device, requires_grad, pin_memory)
    219 if layout == torch.strided:
--> 220     return _nested.nested_tensor(
    221         tensor_list,
    222         dtype=dtype,
    223         device=device,
    224         requires_grad=requires_grad,
    225         pin_memory=pin_memory)
    226 elif layout == torch.jagged:
    227     # Need to wrap lists of scalars as tensors
    228     list_of_tensors = [t if isinstance(t, Tensor) else torch.as_tensor(t) for t in tensor_list]

RuntimeError: Trying to create tensor with negative dimension -1382983936: [-1382983936]

All of the input data is well-formed and correctly typed (tensors of shape (N, 1280)). The negative dimension in the error corresponds to the total number of elements across tensor_list (sum over the N items of dim 0 × dim 1), which overflows in a u32 -> i32 cast.
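The wraparound is consistent with 32-bit arithmetic: reinterpreting the element count (back-computed from the error message, so approximate) as a signed 32-bit integer reproduces the negative value in the RuntimeError:

```python
# Element count recovered from the error message: -1382983936 + 2**32.
total_elements = 2_911_983_360

# Reinterpret as a signed 32-bit integer (two's-complement wraparound).
wrapped = (total_elements + 2**31) % 2**32 - 2**31
print(wrapped)  # -1382983936, the "negative dimension" in the RuntimeError
```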

Code to reproduce

import torch

jagged_lengths = torch.randint(512, 1024, (3000,))
# Each tensor is (length, 1280), so the total element count is sum(lengths) * 1280.
# The overflow reproduces once this exceeds the int32 maximum (2**31 - 1).
if (jagged_lengths.sum().item() * 1280) > 2_147_483_647:
    tensor_list = []
    for length in jagged_lengths:
        tensor_list.append(torch.rand((length, 1280)))

    torch.nested.nested_tensor(tensor_list)
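As a stopgap on my end (just an assumption, not something confirmed to dodge the overflow at full scale), torch.nn.utils.rnn.pad_sequence can build the same padded tensor without going through nested_tensor at all:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Illustrative shapes; the real data uses (length, 1280) tensors.
tensor_list = [torch.rand(n, 8) for n in (2, 3, 5)]

# Pad with zeros to the longest sequence along a new batch dim.
padded = pad_sequence(tensor_list, batch_first=True, padding_value=0.0)
# padded has shape (3, 5, 8): batch, max length, features
```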

Versions

torch==2.4.1+cu124

cc @cpuhrsch @jbschlosser @bhosmer @drisspg @soulitzer @davidberard98 @YuqingJ

jbschlosser commented 1 month ago

I wasn't able to repro this on my local machine, but does using layout=torch.jagged work for you? I mention this because our support for jagged layout nested tensors is much better than that for "strided nested tensors", as the former work well with torch.compile.

Padded conversions recently landed for jagged layout nested tensors in #125947, so if you're using a new enough PyTorch, you can still use to_padded_tensor(). That said, if you can avoid materializing padded tensors and stay in nested tensor land, memory usage and speed will generally be better.