pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Segmentation fault on loading a tokenizer with torch.jit.load + map_location #1793

Open mreso opened 2 years ago

mreso commented 2 years ago

🐛 Bug

Describe the bug When a scripted tokenizer is saved with torch.jit.save and then loaded with torch.jit.load with a GPU device as map_location, the program crashes with a segmentation fault.

To Reproduce Steps to reproduce the behavior: Run:

import torch

import torchtext.transforms as T
from torch.hub import load_state_dict_from_url

# XLM-R pre-processing: tokenize, map tokens to vocab ids, truncate, add BOS/EOS ids
text_transform = T.Sequential(
    T.SentencePieceTokenizer("https://download.pytorch.org/models/text/xlmr.sentencepiece.bpe.model"),
    T.VocabTransform(load_state_dict_from_url("https://download.pytorch.org/models/text/xlmr.vocab.pt")),
    T.Truncate(256 - 2),
    T.AddToken(token=0, begin=True),
    T.AddToken(token=2, begin=False),
)

text_transform_jit = torch.jit.script(text_transform)

torch.jit.save(text_transform_jit, "m.pt")

print("No mapping")
model_loaded = torch.jit.load("m.pt")

print("Map on cpu")
model_loaded = torch.jit.load("m.pt", map_location="cpu")
print("Map on gpu")
model_loaded = torch.jit.load("m.pt", map_location="cuda")

Expected behavior My personal expectation was that the tokenizer would be silently loaded on CPU, since there is no GPU implementation. Another option would be an error message stating that the tokenizer cannot be mapped to a GPU.
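
A workaround in the meantime might look like this (a sketch on my side; the separately saved model file is hypothetical): load the transform with map_location="cpu" only, and reserve GPU mapping for artifacts that actually contain tensors.

import torch
import torchtext  # importing registers torchtext's custom TorchScript ops

# the tokenizer has no GPU kernels, so keep it on CPU
transform = torch.jit.load("m.pt", map_location="cpu")

# a hypothetical, separately saved model could still be mapped to the GPU:
# model = torch.jit.load("model.pt", map_location="cuda")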

Environment

PyTorch version: 1.13.0.dev20220615
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.22.4
Libc version: glibc-2.27

Python version: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1075-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB

Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.13.0.dev20220615
[pip3] torchaudio==0.13.0.dev20220615
[pip3] torchtext==0.14.0.dev20220615
[pip3] torchvision==0.14.0.dev20220615
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310h7f8727e_0
[conda] mkl_fft 1.3.1 py310hd6ae3a3_0
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.22.3 py310hfa59a62_0
[conda] numpy-base 1.22.3 py310h9585f30_0
[conda] pytorch 1.13.0.dev20220615 py3.10_cuda11.3_cudnn8.3.2_0 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 0.13.0.dev20220615 py310_cu113 pytorch-nightly
[conda] torchtext 0.14.0.dev20220615 py310 pytorch-nightly
[conda] torchvision 0.14.0.dev20220615 py310_cu113 pytorch-nightly

mreso commented 2 years ago

Maybe as additional context: I am trying to set the map_location during loading because I want to combine the tokenizer and the model in a Sequential module and save them into the same file.
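
Roughly what I have in mind, as a minimal sketch: the classifier below is a stand-in (not torchtext API), and the ToTensor/padding_value=1 step is my assumption about how the token lists get batched.

from typing import List

import torch
import torchtext.transforms as T


class DummyClassifier(torch.nn.Module):
    # stand-in for the real model: maps token ids to 2-class logits
    def __init__(self) -> None:
        super().__init__()
        self.embedding = torch.nn.Embedding(250002, 16)  # XLM-R vocab size
        self.head = torch.nn.Linear(16, 2)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embedding(ids).mean(dim=1))


class TextClassificationPipeline(torch.nn.Module):
    # bundles the text transform and a classifier so that a single
    # torch.jit.save call captures both
    def __init__(self, transform: torch.nn.Module, model: torch.nn.Module) -> None:
        super().__init__()
        self.transform = transform
        self.model = model

    def forward(self, texts: List[str]) -> torch.Tensor:
        ids = self.transform(texts)           # tokenization runs on CPU
        assert isinstance(ids, torch.Tensor)  # refine Any -> Tensor for TorchScript
        return self.model(ids)


# text_transform is the pipeline from the repro above; ToTensor pads the
# token lists into a single LongTensor batch
pipeline = TextClassificationPipeline(
    T.Sequential(text_transform, T.ToTensor(padding_value=1)),
    DummyClassifier(),
)
torch.jit.save(torch.jit.script(pipeline), "pipeline.pt")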

parmeet commented 2 years ago

Hi @mreso, I am not sure what the expected behavior should be when an implementation is not available on GPU; to me it is undefined behavior. I wonder if we could do something explicit to load the scripted module on CPU when a GPU implementation is not available?

@suo I wonder if you have a recommendation on how to combine a CPU-based text pre-processing unit (the tokenizer) with a GPU-based model into one scripted module?
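
One pattern that might answer this (a sketch, not an official recommendation): keep the tokenizer on CPU and copy only the token tensor to the model's device inside forward.

from typing import List

import torch


class CpuPreprocGpuModel(torch.nn.Module):
    # hypothetical wrapper: the tokenizer only ever sees CPU data; the token
    # tensor is moved to the model's device just before the forward pass
    def __init__(self, transform: torch.nn.Module, model: torch.nn.Module, device: torch.device) -> None:
        super().__init__()
        self.transform = transform
        self.model = model
        self.device = device

    def forward(self, texts: List[str]) -> torch.Tensor:
        ids = self.transform(texts)           # CPU-only tokenizer ops
        assert isinstance(ids, torch.Tensor)  # refine Any -> Tensor for TorchScript
        return self.model(ids.to(self.device))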

mreso commented 2 years ago

Thanks @parmeet, that makes sense. From a user perspective I would prefer a more informative error message. The same applies when loading the tokenizer while torchtext is not installed; in this case I only get "Aborted (core dumped)". In TensorFlow, for example, you get a list of the missing graph ops when you load a graph that uses an optional package that is not installed.

mreso commented 2 years ago

Just noticed that I get "Aborted (core dumped)" even when torchtext is installed but I do not call "import torchtext" before executing torch.jit.load. That's a tricky situation, as in our use case (torchserve) we might not import torchtext explicitly in something like a default request handler. Is it possible to throw an exception in this case that we can handle, instead of crashing the process?
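
Until that exists, a handler-side guard could look like this (a sketch, assuming the crash comes from torchtext's C++ ops not being registered before deserialization; load_text_artifact is a hypothetical helper name):

import torch


def load_text_artifact(path: str):
    # make sure torchtext's TorchScript ops are registered before
    # deserializing, and fail with a catchable error if it is missing
    try:
        import torchtext  # noqa: F401  importing registers the custom ops
    except ImportError as err:
        raise RuntimeError(
            f"'{path}' may contain torchtext ops but torchtext is not installed"
        ) from err
    return torch.jit.load(path, map_location="cpu")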

Yet another piece of context on why I expected a silent load to CPU: maybe it's my misunderstanding of the concept, but calling tokenizer.to('cuda') on the scripted tokenizer works fine, even though the tokenizer is not actually moved to the GPU.
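
Concretely (using the text_transform from the repro above):

tokenizer = torch.jit.script(text_transform)
tokenizer = tokenizer.to("cuda")   # returns without an error or warning...
print(tokenizer(["hello world"]))  # ...but tokenization still runs on CPU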