huggingface / transformers

πŸ€— Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Segmentation fault (core dumped) when conversion of GPT-J to onnx #14836

Closed shimoshida closed 2 years ago

shimoshida commented 2 years ago

Environment info

Information

I want to convert a GPT-J model (https://huggingface.co/NovelAI/genji-jp) to an ONNX file, but I run into trouble with the conversion when using the following script.

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import torch
torch.device("cuda", index=0)  # note: only constructs a device object
torch.set_default_tensor_type('torch.cuda.HalfTensor')  # default to fp16 CUDA tensors

from transformers.onnx import OnnxConfig, export
from typing import Any, List, Mapping, Optional
from transformers import TensorType, LayoutLMv2Processor, PreTrainedTokenizer
from collections import OrderedDict
from pathlib import Path

MAX_MODEL_INPUT = 256

config = AutoConfig.from_pretrained("NovelAI/genji-jp")

# Custom ONNX config describing the model's dynamic input/output axes
class GPTJOnnxConfig(OnnxConfig):

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
                ("token_type_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([("last_hidden_state", {0: "batch", 1: "sequence"}), ("pooler_output", {0: "batch"})])

onnx_config = GPTJOnnxConfig(config)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True
).eval().cuda()

export(
    tokenizer=tokenizer, 
    model=model, 
    config=onnx_config,
    opset=13,
    output=Path.cwd() / "outputs",
)

To reproduce

Steps to reproduce the behavior:

  1. run docker image nvcr.io/nvidia/pytorch:21.11-py3 as

    docker run --rm -it --gpus all \
        --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
        -v $(curr_dir):/mashim \
        nvcr.io/nvidia/pytorch:$(container_version)-py3 \
        bash

  2. install transformers==4.14.1
  3. run the above script

The log is as follows:

Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 836/836 [00:00<00:00, 874kB/s]
{'input_ids': {0: 'batch', 1: 'sequence'}, 'attention_mask': {0: 'batch', 1: 'sequence'}, 'token_type_ids': {0: 'batch', 1: 'sequence'}}
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 619/619 [00:00<00:00, 629kB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 779k/779k [00:00<00:00, 2.13MB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 446k/446k [00:00<00:00, 1.79MB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.31M/1.31M [00:00<00:00, 3.55MB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3.94k/3.94k [00:00<00:00, 3.90MB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 357/357 [00:00<00:00, 354kB/s]
{'input_ids': [[50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], [50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1]]}
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11.3G/11.3G [02:29<00:00, 81.1MB/s]
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:117: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.CheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:130: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/opt/conda/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
Segmentation fault (core dumped)

Expected behavior

The ONNX file is generated successfully.

shimoshida commented 2 years ago

I figured out the cause of the above: the ONNX config was wrong, and on top of that I was explicitly using the GPU, which led to the segmentation fault. The correct config is here:

class GPTJOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("last_hidden_state", {0: "batch", 1: "sequence"}), 
            ]
        )

After this modification, and switching to the CPU, I get past the error, but many files are generated... How can I fix this?

root@8909446e3559:/mashim# ls outputs/
14164  14316  14443  14595  14722  14874  15001                             transformer.h.12.ln_1.weight      transformer.h.19.ln_1.bias        transformer.h.24.mlp.fc_out.bias  transformer.h.6.mlp.fc_in.bias
14165  14317  14444  14596  14723  14875  15002                             transformer.h.12.mlp.fc_in.bias   transformer.h.19.ln_1.weight      transformer.h.25.ln_1.bias        transformer.h.6.mlp.fc_out.bias
14166  14318  14445  14597  14724  14876  15003                             transformer.h.12.mlp.fc_out.bias  transformer.h.19.mlp.fc_in.bias   transformer.h.25.ln_1.weight      transformer.h.7.ln_1.bias
14192  14319  14471  14598  14750  14877  15029                             transformer.h.13.ln_1.bias        transformer.h.19.mlp.fc_out.bias  transformer.h.25.mlp.fc_in.bias   transformer.h.7.ln_1.weight
14193  14320  14472  14599  14751  14878  15030                             transformer.h.13.ln_1.weight      transformer.h.2.ln_1.bias         transformer.h.25.mlp.fc_out.bias  transformer.h.7.mlp.fc_in.bias
14194  14321  14473  14600  14752  14879  15031                             transformer.h.13.mlp.fc_in.bias   transformer.h.2.ln_1.weight       transformer.h.26.ln_1.bias        transformer.h.7.mlp.fc_out.bias
14195  14347  14474  14626  14753  14905  15032                             transformer.h.13.mlp.fc_out.bias  transformer.h.2.mlp.fc_in.bias    transformer.h.26.ln_1.weight      transformer.h.8.ln_1.bias
14196  14348  14475  14627  14754  14906  gpt-j.onnx                        transformer.h.14.ln_1.bias        transformer.h.2.mlp.fc_out.bias   transformer.h.26.mlp.fc_in.bias   transformer.h.8.ln_1.weight
14197  14349  14476  14628  14755  14907  lm_head.bias                      transformer.h.14.ln_1.weight      transformer.h.20.ln_1.bias        transformer.h.26.mlp.fc_out.bias  transformer.h.8.mlp.fc_in.bias
14223  14350  14502  14629  14781  14908  transformer.h.0.attn.bias         transformer.h.14.mlp.fc_in.bias   transformer.h.20.ln_1.weight      transformer.h.27.ln_1.bias        transformer.h.8.mlp.fc_out.bias
14224  14351  14503  14630  14782  14909  transformer.h.0.ln_1.bias         transformer.h.14.mlp.fc_out.bias  transformer.h.20.mlp.fc_in.bias   transformer.h.27.ln_1.weight      transformer.h.9.ln_1.bias
14225  14352  14504  14631  14783  14910  transformer.h.0.ln_1.weight       transformer.h.15.ln_1.bias        transformer.h.20.mlp.fc_out.bias  transformer.h.27.mlp.fc_in.bias   transformer.h.9.ln_1.weight
14226  14378  14505  14657  14784  14936  transformer.h.0.mlp.fc_in.bias    transformer.h.15.ln_1.weight      transformer.h.21.ln_1.bias        transformer.h.27.mlp.fc_out.bias  transformer.h.9.mlp.fc_in.bias
14227  14379  14506  14658  14785  14937  transformer.h.0.mlp.fc_out.bias   transformer.h.15.mlp.fc_in.bias   transformer.h.21.ln_1.weight      transformer.h.3.ln_1.bias         transformer.h.9.mlp.fc_out.bias
14228  14380  14507  14659  14786  14938  transformer.h.1.ln_1.bias         transformer.h.15.mlp.fc_out.bias  transformer.h.21.mlp.fc_in.bias   transformer.h.3.ln_1.weight       transformer.ln_f.bias
14254  14381  14533  14660  14812  14939  transformer.h.1.ln_1.weight       transformer.h.16.ln_1.bias        transformer.h.21.mlp.fc_out.bias  transformer.h.3.mlp.fc_in.bias    transformer.ln_f.weight
14255  14382  14534  14661  14813  14940  transformer.h.1.mlp.fc_in.bias    transformer.h.16.ln_1.weight      transformer.h.22.ln_1.bias        transformer.h.3.mlp.fc_out.bias   transformer.wte.weight
14256  14383  14535  14662  14814  14941  transformer.h.1.mlp.fc_out.bias   transformer.h.16.mlp.fc_in.bias   transformer.h.22.ln_1.weight      transformer.h.4.ln_1.bias
14257  14409  14536  14688  14815  14967  transformer.h.10.ln_1.bias        transformer.h.16.mlp.fc_out.bias  transformer.h.22.mlp.fc_in.bias   transformer.h.4.ln_1.weight
14258  14410  14537  14689  14816  14968  transformer.h.10.ln_1.weight      transformer.h.17.ln_1.bias        transformer.h.22.mlp.fc_out.bias  transformer.h.4.mlp.fc_in.bias
14259  14411  14538  14690  14817  14969  transformer.h.10.mlp.fc_in.bias   transformer.h.17.ln_1.weight      transformer.h.23.ln_1.bias        transformer.h.4.mlp.fc_out.bias
14285  14412  14564  14691  14843  14970  transformer.h.10.mlp.fc_out.bias  transformer.h.17.mlp.fc_in.bias   transformer.h.23.ln_1.weight      transformer.h.5.ln_1.bias
14286  14413  14565  14692  14844  14971  transformer.h.11.ln_1.bias        transformer.h.17.mlp.fc_out.bias  transformer.h.23.mlp.fc_in.bias   transformer.h.5.ln_1.weight
14287  14414  14566  14693  14845  14972  transformer.h.11.ln_1.weight      transformer.h.18.ln_1.bias        transformer.h.23.mlp.fc_out.bias  transformer.h.5.mlp.fc_in.bias
14288  14440  14567  14719  14846  14998  transformer.h.11.mlp.fc_in.bias   transformer.h.18.ln_1.weight      transformer.h.24.ln_1.bias        transformer.h.5.mlp.fc_out.bias
14289  14441  14568  14720  14847  14999  transformer.h.11.mlp.fc_out.bias  transformer.h.18.mlp.fc_in.bias   transformer.h.24.ln_1.weight      transformer.h.6.ln_1.bias
14290  14442  14569  14721  14848  15000  transformer.h.12.ln_1.bias        transformer.h.18.mlp.fc_out.bias  transformer.h.24.mlp.fc_in.bias   transformer.h.6.ln_1.weight
LysandreJik commented 2 years ago

Thanks for raising the error! cc @michaelbenayoun and @lewtun for knowledge

lewtun commented 2 years ago

I was able to reproduce this behaviour but am not entirely sure (yet) what is causing the ONNX export to generate so many files (there should only be a single .onnx file in the output).

My current best guess is that it is something peculiar with the torch.onnx.export function that we call internally in transformers.onnx. One possibility is that the sheer size of the model is causing a problem with protocol buffers (until recently it was only possible to export 2GB-sized models). Some more investigation is needed to figure this out, and I'll report back here when I have a better insight.
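As a rough sanity check on the size hypothesis, a back-of-the-envelope estimate like the one below (reusing the checkpoint from the report, so treat it as a sketch rather than part of the original investigation) shows that GPT-J cannot fit in a single 2 GB protobuf:

```
from transformers import AutoModelForCausalLM

# Checkpoint from the report above; any GPT-J checkpoint gives the same order of magnitude.
model = AutoModelForCausalLM.from_pretrained("NovelAI/genji-jp", low_cpu_mem_usage=True)

num_params = sum(p.numel() for p in model.parameters())
approx_bytes = num_params * 4  # fp32 initializers; roughly halve this for an fp16 export
print(f"{num_params / 1e9:.1f}B parameters, ~{approx_bytes / 2**30:.0f} GiB of weights")
# GPT-J has ~6B parameters, so the weights alone are far beyond the 2 GB protobuf limit.
```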

Incidentally, @shimoshida were you able to use your gpt-j.onnx model in ONNX Runtime? I'm curious whether the extra files are harmless or signal a deeper problem with the export.

shimoshida commented 2 years ago

@lewtun Thank you for your reply. I also tried calling the torch.onnx.export function directly, but the result is the same as above. The script is here:

```
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import torch
torch.device('cpu')
torch.set_default_tensor_type('torch.FloatTensor')

import transformers
from transformers.onnx import OnnxConfig, export
from typing import Any, List, Mapping, Optional
from transformers import TensorType, LayoutLMv2Processor, PreTrainedTokenizer
from collections import OrderedDict
from pathlib import Path

dir_path = Path.cwd() / "outputs"
dir_path.mkdir(exist_ok=True)

MAX_MODEL_INPUT = 256

config = AutoConfig.from_pretrained("NovelAI/genji-jp")
model = AutoModelForCausalLM.from_pretrained(
    "NovelAI/genji-jp", low_cpu_mem_usage=True
).eval()
model.float()
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

class GPTJOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("last_hidden_state", {0: "batch", 1: "sequence"}),
            ]
        )

onnx_config = GPTJOnnxConfig(config)
dummy_inputs = onnx_config.generate_dummy_inputs(tokenizer)
dummy_inputs = torch.Tensor(dummy_inputs["input_ids"])
dummy_inputs = dummy_inputs.to(torch.int64)

input_names = list(onnx_config.inputs.keys())
output_names = list(onnx_config.outputs.keys())

with torch.no_grad():
    outputs = model(dummy_inputs)

dynamic_axes = {
    input_names[0]: {0: 'batch_size', 1: 'seq_len'},
    output_names[0]: {0: 'batch_size', 1: 'seq_len'},
}

torch.onnx.export(
    model,
    dummy_inputs,
    str(dir_path / "gpt-j.onnx"),
    input_names=input_names,
    output_names=output_names,
    example_outputs=outputs,
    dynamic_axes=dynamic_axes,
    opset_version=13,
    do_constant_folding=True,
    verbose=True
)
```

I have tested loading gpt-j.onnx with ONNX Runtime, and the following error is obtained:

Traceback (most recent call last):
  File "runtime_test.py", line 5, in <module>
    ort_sess = ort.InferenceSession('outputs/gpt-j.onnx')
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/conda/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 368, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from outputs/gpt-j.onnx failed:Type Error: Type parameter (T) of Optype (Einsum) bound to different types (tensor(int64) and tensor(float) in node (Einsum_110).
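For context, the loading script is essentially just an InferenceSession call; a minimal sketch is below (the inference part is hypothetical, since the load itself fails here):

```
import numpy as np
import onnxruntime as ort

# Minimal sketch of the failing load (line 5 of runtime_test.py in the traceback above).
# The external-data files written during export must sit next to outputs/gpt-j.onnx,
# otherwise the session cannot resolve the tensor payloads.
ort_sess = ort.InferenceSession('outputs/gpt-j.onnx')

# If the session loaded, inference would look roughly like this, using the single
# "input_ids" input and "last_hidden_state" output declared in GPTJOnnxConfig above:
dummy = np.full((1, 8), 50256, dtype=np.int64)
outputs = ort_sess.run(["last_hidden_state"], {"input_ids": dummy})
```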

One possibility is that the sheer size of the model is causing a problem with protocol buffers (until recently it was only possible to export 2GB-sized models)

Oh, I didn't know about that limitation until now... If so, I should raise this issue in the torch repository.

lewtun commented 2 years ago

Thanks for testing the model with ONNX Runtime @shimoshida!

Oh, I didn't know about that limitation until now... If so, I should raise this issue in the torch repository.

I think the limitation is actually on the onnx side (which is used by torch.onnx). For example, here's an issue where someone tries to export a >2GB sized model.

I tracked down the onnx PR where support for large models was introduced, and one can see the potentially relevant comment:

We need a method for optionally storing tensor data in separate files, which can be loaded on demand.

So my current understanding is that the multi-file export is expected for models like GPT-J, but that raises the question of how this data should be ingested in ONNX Runtime. I'll take another look at this and report back!
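For what it's worth, if the scattered per-tensor files become unwieldy, the onnx Python API can re-serialize them into a single external-data file; a rough sketch, assuming the side files still sit next to the exported graph (file names here are placeholders):

```
import onnx

# Loading from a path resolves the external tensor files relative to the .onnx file.
model = onnx.load("outputs/gpt-j.onnx")

# Re-save with every large initializer gathered into one side file instead of many.
onnx.save_model(
    model,
    "outputs/gpt-j-single.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="gpt-j-single.onnx_data",
    size_threshold=1024,
)
```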

lewtun commented 2 years ago

Hi @shimoshida, here's a summary of what I think is going on:

  1. The additional files created during the export are expected after all. In the torch.onnx.export() function (docs) you can see there's a use_external_data_format argument. This argument is True for GPT-J when using the transformers.onnx package, as you can see here.
  2. On the ONNX side, I'm able to load the model and also check that it was exported correctly via

    import onnx
    
    # Check we can load the model
    onnx_model = onnx.load('model.onnx')
    # Check the model
    onnx.checker.check_model('model.onnx', full_check=True)
  3. I was able to reproduce your error when loading the ONNX model in ONNX Runtime. A similar issue was raised in the ONNX Runtime repo, and a solution was suggested here. I haven't tried it yet, but it might work for your case. It doesn't seem to be connected to the choice of opset, since Einsum has been available since opset=12.

I suggest opening an issue on the ONNX Runtime repo to see whether they can provide some further advice.

shimoshida commented 2 years ago

@lewtun Thank you for sharing the information!

I suggest opening an issue on the ONNX Runtime repo and see whether they can provide some further advice.

Sure. I've asked a question and will wait for an answer. https://github.com/microsoft/onnxruntime/discussions/10121

lewtun commented 2 years ago

Hi @shimoshida, it seems that the root cause of the problem was a mismatch in the einsum types: https://github.com/microsoft/onnxruntime/discussions/10121#discussioncomment-1948951

Does that proposal solve the issue for you?
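(For anyone landing here later: the failure mode is an einsum whose operands have different dtypes, so ONNX Runtime cannot bind Einsum's single type parameter T. Below is a hedged illustration of the pattern and of the cast that avoids it, not necessarily the exact patch from the linked discussion.)

```
import torch

# Float frequencies and int64 positions, as in a rotary-embedding-style helper.
inv_freq = 1.0 / (10000 ** (torch.arange(0, 64, 2) / 64))  # float32
positions = torch.arange(128)                              # int64

# Traced as-is, the resulting Einsum node sees one int64 and one float input,
# which is the "T bound to different types" error ONNX Runtime reported above.
# sinusoid = torch.einsum("i , j -> i j", positions, inv_freq)

# Casting the integer operand first keeps every Einsum input the same type:
sinusoid = torch.einsum("i , j -> i j", positions.float(), inv_freq)
```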

shimoshida commented 2 years ago

@lewtun I'm sorry for the late reply. I have tested the proposal, but I encountered the following problem: https://github.com/microsoft/onnxruntime/discussions/10121#discussioncomment-1987845

However, since the problem does not seem to be related to transformers, I have closed this issue. Thank you for your help!

lewtun commented 2 years ago

Thank you for the reply, @shimoshida! It looks like a mismatch between the ops in the original and traced models at runtime, but you're right that the ONNX export itself seems to be OK.