pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text

Scripted Roberta model stuck in second inference call #1849

Open mreso opened 2 years ago

mreso commented 2 years ago

🐛 Bug

Describe the bug

Hi,

I've scripted a Roberta model, and when I make two inference calls on it, the second call only returns a result after several minutes (up to 16 minutes). I see this behavior on Linux on both GPU and CPU; on Mac (CPU) it works fine.

To Reproduce

Steps to reproduce the behavior: run the script below. The last call of the model takes 4-16 minutes to finish.

import torch
from torchtext.models import XLMR_BASE_ENCODER, RobertaClassificationHead

classifier_head = RobertaClassificationHead(
    num_classes=2, input_dim=768
)

model = XLMR_BASE_ENCODER.get_model(head=classifier_head)

model.eval()

model = model.to("cuda")

print("##### NOT SCRIPTED MODEL ######")

with torch.no_grad():
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)

model = torch.jit.script(model)

print("##### SCRIPTED MODEL ######")

with torch.no_grad():
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)
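    # The second scripted call below is the one that takes 4-16 minutes to return.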
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)
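
For reference, the per-call wall-clock time can be measured with a small helper like the one below (a minimal sketch; the timed_call helper is illustrative and not part of the original script, and torch.cuda.synchronize is called so that queued CUDA work is counted against the call being measured):

import time

def timed_call(model, tokens):
    # Flush any pending CUDA work so it is not attributed to this call.
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model(tokens)
    torch.cuda.synchronize()
    print(f"call took {time.perf_counter() - start:.2f}s")
    return out

tokens = torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0)
with torch.no_grad():
    timed_call(model, tokens)  # first scripted call
    timed_call(model, tokens)  # second scripted call, the slow one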

Expected behavior

The second call should return in a similar timeframe to the previous calls.

Environment

I ran this on g4 and p3 AWS instances with a T4 and a V100, respectively, as well as on a Colab notebook with a T4, with the same result. Tried these torch/torchtext combinations:

Collecting environment information...
PyTorch version: 1.13.0.dev20220720+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.22.5
Libc version: glibc-2.26

Python version: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.13.0.dev20220720+cu113
[pip3] torchaudio==0.12.0+cu113
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.14.0.dev20220720
[pip3] torchvision==0.13.0+cu113
[conda] Could not collect

parmeet commented 2 years ago

OK, this does seem strange. I could repro the issue, but without torch.no_grad() it works just fine. Also, when I run it once without torch.no_grad() and then run it with torch.no_grad(), it does manage to run:

import torch
from torchtext.models import XLMR_BASE_ENCODER, RobertaClassificationHead

classifier_head = RobertaClassificationHead(
    num_classes=2, input_dim=768
)

model = XLMR_BASE_ENCODER.get_model(head=classifier_head)

model.eval()

model = model.to("cuda")

print("##### NOT SCRIPTED MODEL ######")

with torch.no_grad():
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)

model = torch.jit.script(model)

# run once without torch.no_grad()
results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
print(results)
results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
print(results)

print("##### SCRIPTED MODEL ######")
with torch.no_grad():
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)
    results = model(torch.LongTensor([0, 12, 23, 2]).to("cuda").unsqueeze(0))
    print(results)

@erichan1 wondering if this has anything to do with BT (BetterTransformer), since torch.no_grad() would lead to the fast path?
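
For context, a minimal sketch of that fast-path distinction on a plain nn.TransformerEncoder (independent of torchtext; the layer sizes here are illustrative, and whether the BetterTransformer fastpath is actually taken also depends on other conditions such as the absence of hooks):

import torch
import torch.nn as nn

# Fastpath eligibility requires eval mode with autograd disabled,
# plus batch_first=True and a supported activation, among other checks.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True, activation="gelu")
encoder = nn.TransformerEncoder(layer, num_layers=2).eval().to("cuda")

x = torch.randn(1, 4, 768, device="cuda")

# Gradients enabled: the regular implementation runs even in eval mode.
out_regular = encoder(x)

# Gradients disabled: the same call may dispatch to the BetterTransformer fastpath.
with torch.no_grad():
    out_fastpath = encoder(x)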

erichan1 commented 2 years ago

It's definitely possible that this is a BT bug. Thanks for flagging! Let me look into it.