pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

EmbeddingBag allows out-of-range indices #70170

Open GyuminJack opened 2 years ago

GyuminJack commented 2 years ago

🐛 Describe the bug

Under torch version 1.5 and earlier, EmbeddingBag does not allow indices beyond the range set by 'n'. From version 1.6 onward, however, EmbeddingBag accepts out-of-range indices without any alert or information. The forward call succeeds, but calling backward raises a segmentation fault with no other logs.

Is this the intended result?

Here is my code.

import torch
print(torch.__version__)

n = 100
m = 4

# ----- 
eb1 = torch.nn.EmbeddingBag(n, m)
eb2 = torch.nn.EmbeddingBag(n, m)
dense_layer = torch.nn.Linear(m,1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD  # optimizer class; not actually used in this repro

# -----
input_index_1 = torch.Tensor([0,2,100,2]).long()  # index 100 is out of range (valid: 0..99)
input_index_2 = torch.Tensor([0,2,100,2]).long()  # index 100 is out of range (valid: 0..99)
input_offset = torch.Tensor([0]).long()
y = torch.Tensor([1])

# -----
try:
    x1 = eb1(input_index_1, input_offset)
    x2 = eb2(input_index_2, input_offset)
    print("Pass over-embedding indices")
except Exception as e:
    print(e)
x = dense_layer(x1+x2)
loss = loss_fn(x.view(-1), y.view(-1))

debug_backward = True

if debug_backward:
    try:
        loss.backward()
        print(loss)
    except Exception as e:
        print(e)
else:
    print("JUST FORWARD : ", loss)        
torch.__version__ >= 1.6.0
# printed output
Pass over-embedding indices
[1]    95791 segmentation fault  python3 embedding_error.py

torch.__version__ <= 1.5.0
# printed output
1.5.0
[enforce fail at embedding_lookup_idx.cc:211] 0 <= idx && idx < data_size. Index 2 is out of bounds: 100, range 0 to 100

Versions

pip install 'torch>=1.6'
pip install 'torch<=1.5'

ngimel commented 2 years ago

This is fixed in one of the previous versions and on master

GyuminJack commented 2 years ago

Thanks for the reply, ngimel.
I tested this on version 1.10.0 and the bug still occurs. Could you tell me which previous version fixed it, or point me to the patch notes?

ngimel commented 2 years ago

You are right, it might be a bug in fbgemm that's actually not fixed

GyuminJack commented 2 years ago

OK, thanks for rechecking!

GyuminJack commented 2 years ago

So, when will this bug be fixed? At inference time we only run forward propagation, so when a lookup uses an index outside a given EmbeddingBag's range, the problem occurs but no error log is produced at all.

Or could you suggest any other workaround for this problem?
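
For now, a manual bounds check before the forward call seems to avoid the silent failure; a rough sketch of what I mean (check_indices is just a local helper, not a PyTorch API):

import torch

def check_indices(indices, num_embeddings):
    # Fail loudly in Python instead of passing bad indices to the native kernel.
    if indices.numel() > 0 and (indices.min() < 0 or indices.max() >= num_embeddings):
        raise IndexError(
            f"EmbeddingBag input contains indices outside [0, {num_embeddings - 1}]"
        )

n, m = 100, 4
eb = torch.nn.EmbeddingBag(n, m)
idx = torch.tensor([0, 2, 100, 2], dtype=torch.long)  # 100 is out of range
offsets = torch.tensor([0], dtype=torch.long)

check_indices(idx, eb.num_embeddings)  # raises IndexError before the kernel runs
out = eb(idx, offsets)

The extra min/max pass is one scan over the indices per call, which is usually negligible next to the embedding lookup itself.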

ngimel commented 2 years ago

Sorry, it looks like it's not fixed. I had a build with fbgemm disabled that didn't have this bug, but a regular build uses fbgemm and will likely trigger it. cc @jianyuh, can someone from fbgemm look at it?

jianyuh commented 2 years ago

I believe this issue is fixed in the latest PyTorch version, with https://github.com/pytorch/pytorch/pull/65186 . cc @jspark1105 .

Up to PT 1.5, EmbeddingBag called the Caffe2 perfkernel implementation at https://github.com/pytorch/pytorch/blob/c4a6c7a436c6f57ea35b1f2d226c5a052930a8de/caffe2/perfkernels/embedding_lookup_idx.cc#L198 , which has an index bounds check.

From PT 1.6 onward, EmbeddingBag calls the SPMDM routine in FBGEMM: https://github.com/pytorch/pytorch/blob/91da2d5fa11c1a416420133038fcdc49a0eceb68/aten/src/ATen/native/EmbeddingBag.cpp#L203-L233 . The "fbgemm_spmdm_reporterror" path should report more information; if it still ends in a segmentation fault, we might need to hoist "fbgemm_spmdm_reporterror" so it runs before the actual FBGEMM SPMDM kernel.
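
Until a release with that fix is available, one possible safety net is a thin wrapper that restores the bounds check on the Python side before dispatching to the kernel; a rough sketch (SafeEmbeddingBag is a hypothetical name, not part of PyTorch):

import torch

class SafeEmbeddingBag(torch.nn.EmbeddingBag):
    # Validates indices before the native kernel runs, mimicking the bounds
    # check the old Caffe2 perfkernel path performed.
    def forward(self, input, offsets=None, per_sample_weights=None):
        if input.numel() > 0:
            bad = (input < 0) | (input >= self.num_embeddings)
            if bool(bad.any()):
                raise IndexError(
                    f"index {int(input[bad][0])} is out of range for "
                    f"EmbeddingBag with num_embeddings={self.num_embeddings}"
                )
        return super().forward(input, offsets, per_sample_weights)

eb = SafeEmbeddingBag(100, 4)
idx = torch.tensor([0, 2, 100, 2], dtype=torch.long)
offsets = torch.tensor([0], dtype=torch.long)
out = eb(idx, offsets)  # raises IndexError instead of silently passing / segfaulting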