llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

Compile error running pytorch-pretrained-bert #784

Open sjw36 opened 2 years ago

sjw36 commented 2 years ago

Using models from https://pypi.org/project/pytorch-pretrained-bert/

And running this script:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM

import torch_mlir

import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print("Tokens = ", tokens_tensor);
print("Segments = ", segments_tensors);

bert = BertModel.from_pretrained('bert-base-uncased')
bert.eval()
print("BERT", bert)

# Predict hidden states features for each layer
with torch.no_grad():
    encoded_layers, _ = bert(tokens_tensor, segments_tensors)
# We have a hidden states for each of the 12 layers in model bert-base-uncased
assert len(encoded_layers) == 12

module = torch_mlir.compile(bert, [tokens_tensor, segments_tensors], output_type=torch_mlir.OutputType.TORCH)

This gives the following error during torch_mlir.compile:

RuntimeError: cannot statically infer the expected size of a list in this context:
  File "/mnt/swaters/iwa/torch-mlir.0/mlir_venv/lib/python3.8/site-packages/pytorch_pretrained_bert/modeling.py", line 298
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        print("New Shape: ", new_x_shape, self.num_attention_heads, self.attention_head_size)
        x = x.view(*new_x_shape)

        return x.permute(0, 2, 1, 3)
'BertSelfAttention.transpose_for_scores' is being compiled since it was called from 'BertSelfAttention.forward'
  File "/mnt/swaters/iwa/torch-mlir.0/mlir_venv/lib/python3.8/site-packages/pytorch_pretrained_bert/modeling.py", line 306
        mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        key_layer = self.transpose_for_scores(mixed_key_layer)
        value_layer = self.transpose_for_scores(mixed_value_layer)
powderluv commented 2 years ago

Have you been able to try HF_Bert? We use that for our tests. We will continue to debug this.

silvasean commented 2 years ago

Thanks for this.

I put up some PRs that chip away at this model: https://github.com/llvm/torch-mlir/pull/824 and https://github.com/llvm/torch-mlir/pull/825

It looks like https://github.com/llvm/torch-mlir/pull/796 will also be needed for it. I will check back in on this once that lands.

sjw36 commented 2 years ago

Have you been able to try HF_Bert? We use that for our tests. We will continue to debug this.

I have some time now, will give that a go. Thanks.

YellowHCH commented 2 years ago
import torch
from transformers import BertTokenizer, BertModel

import torch_mlir

import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print("Tokens = ", tokens_tensor);
print("Segments = ", segments_tensors);

bert = BertModel.from_pretrained('bert-base-uncased', return_dict=False)
bert.eval()
# print("BERT", bert)

# Predict hidden states features for each layer
with torch.no_grad():
    encoded_layers, _ = bert(tokens_tensor, segments_tensors)
# We have a hidden states for each of the 12 layers in model bert-base-uncased
#print("encoded_layers len = ", len(encoded_layers))
#assert len(encoded_layers) == 12

module = torch_mlir.compile(bert, [tokens_tensor, segments_tensors], output_type=torch_mlir.OutputType.TOSA, use_tracing=True)

It seems the conversion to Torch IR succeeds, but it fails when converting to TOSA.

Lowering Torch Backend IR -> TOSA Backend IR failed with the following diagnostics:
error: unsupported by backend lowering: tensor with unknown rank or dtype
note: see current operation: %790 = "torch.prim.TupleIndex"(%789, %221) : (!torch.tuple<tensor<[1,14,768],f32>, tensor<[1,768],f32>>, !torch.int) -> !torch.vtensor
note: this is likely due to a missing shape transfer function in shape_lib_gen.py

Error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='torch-backend-to-tosa-backend-pipeline' /tmp/BertModel.mlir
Add '-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.
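
(For anyone trying to reproduce this: one way to produce the /tmp/BertModel.mlir file referenced above, assuming the script from this comment, is to dump the Torch backend IR and then feed it to torch-mlir-opt. This is a minimal sketch, not something from the original report.)

# Sketch (assumption): emit the Torch backend IR and save it so the failing
# TOSA lowering can be replayed with torch-mlir-opt.
torch_module = torch_mlir.compile(bert, [tokens_tensor, segments_tensors],
                                  output_type=torch_mlir.OutputType.TORCH,
                                  use_tracing=True)
with open("/tmp/BertModel.mlir", "w") as f:
    f.write(str(torch_module))
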
silvasean commented 2 years ago

It seems like the issue here is possibly related to multiple returns. Can you use a wrapper module that extracts the logits? See example here: https://github.com/google/iree-torch/blob/c3d7717ef4b9c83aa4870e949d9dee588e6e190d/examples/bert.py#L48

Do you need the other return values?
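
(For reference, a minimal sketch of such a logits-only wrapper; the class name and usage are illustrative assumptions, not taken from the linked example.)

import torch

class OnlyLogitsWrapper(torch.nn.Module):
    """Illustrative wrapper: forwards to the wrapped model and returns only
    its first output, so the compiled function has a single tensor result."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, token_type_ids):
        # Indexing the output tuple here, during tracing, means the compiled
        # graph returns one tensor instead of a tuple, which may avoid the
        # torch.prim.TupleIndex on an unranked result seen above.
        return self.model(input_ids, token_type_ids)[0]

# e.g. compile the wrapper instead of the bare model:
# module = torch_mlir.compile(OnlyLogitsWrapper(bert),
#                             [tokens_tensor, segments_tensors],
#                             output_type=torch_mlir.OutputType.TOSA,
#                             use_tracing=True)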

YellowHCH commented 2 years ago

It seems like the issue here is possibly related to multiple returns. Can you use a wrapper module that extracts the logits? See example here: https://github.com/google/iree-torch/blob/c3d7717ef4b9c83aa4870e949d9dee588e6e190d/examples/bert.py#L48

Do you need the other return values?

Thank you for your reply. I just need to convert PyTorch models (e.g. BERT, FastSpeech2, ...) to TOSA. The example above works well. I simply changed LINALG_ON_TENSORS to TOSA at https://github.com/google/iree-torch/blob/c3d7717ef4b9c83aa4870e949d9dee588e6e190d/examples/bert.py#L95 and used model_name="bert-base-cased". It gives the error below:


error: Integers with widths greater than 32 are not supported
note: see current operation: %435 = "torch.aten.add.Tensor"(%433, %405, %414) : (!torch.vtensor<[],si64>, !torch.vtensor<[],si64>, !torch.int) -> !torch.vtensor<[],si64>
error: failed to legalize operation 'torch.aten.add.Tensor' that was explicitly marked illegal
note: see current operation: %433 = "torch.aten.add.Tensor"(%432, %405, %414) : (!torch.vtensor<[],si64>, !torch.vtensor<[],si64>, !torch.int) -> !torch.vtensor<[],si64>

Error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='torch-backend-to-tosa-backend-pipeline' /tmp/OnlyLogitsHuggingFaceModel.mlir
Add '-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.
silvasean commented 2 years ago

@sjarus -- how have you folks been dealing with the i64's in the bert models?

sjarus commented 2 years ago

si64 should be acceptable to TOSA. It's not an I/O type but should permit accumulation. I'll check the dialect form; we may also have local fixes to get us around this. I'm in the process of getting these out this week.
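
(One frontend-side workaround worth trying, offered as an assumption here rather than something confirmed in this thread, is to hand the model int32 index tensors so less si64 arithmetic ends up in the traced graph; it will not help for int64 values the model creates internally, such as position ids.)

# Hypothetical workaround: torch.nn.Embedding also accepts IntTensor indices,
# so feeding int32 inputs keeps some integer arithmetic out of si64.
tokens_tensor_i32 = tokens_tensor.to(torch.int32)
segments_tensors_i32 = segments_tensors.to(torch.int32)

module = torch_mlir.compile(bert, [tokens_tensor_i32, segments_tensors_i32],
                            output_type=torch_mlir.OutputType.TOSA,
                            use_tracing=True)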

YellowHCH commented 2 years ago

For the BERT model to TOSA, I added a couple of passes to fix the errors. The 'aten.slice.Tensor' conversion seems to be missing.

yinrun commented 1 year ago

Have you been able to try HF_Bert? We use that for our tests. We will continue to debug this.

Are there any documents showing how your tests work with HF_Bert? I am trying to use torch-mlir to compile a BERT demo based on the TOSA backend.