(macOS) RuntimeError: Error invoking function: <vm>:0: UNKNOWN; device not supported in the compiled configuration

bsergean commented 1 year ago

What happened?

I am trying to run the Colab example, but it fails.

Steps to reproduce your issue

python3 -mvenv venv
source venv/bin/activate
pip install --pre torch-mlir -f https://llvm.github.io/torch-mlir/package-index/ --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install -f https://iree-org.github.io/iree/pip-release-links.html iree-compiler iree-runtime
pip install git+https://github.com/iree-org/iree-torch.git
pip install transformers

Then I merged the two snippets from the Colab notebook (the class definition and the few invocations) into a single file, example.py.

sandbox$ cat example.py 
import torch
import torch_mlir
import iree_torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification

def prepare_sentence_tokens(hf_model: str, sentence: str):
    tokenizer = AutoTokenizer.from_pretrained(hf_model)
    return torch.tensor([tokenizer.encode(sentence)])

class OnlyLogitsHuggingFaceModel(torch.nn.Module):
    """Wrapper that returns only the logits from a HuggingFace model."""

    def __init__(self, model_name: str):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name,  # The pretrained model name.
            # The number of output labels--2 for binary classification.
            num_labels=2,
            # Whether the model returns attentions weights.
            output_attentions=False,
            # Whether the model returns all hidden-states.
            output_hidden_states=False,
            torchscript=True,
        )
        self.model.eval()

    def forward(self, input):
        # Return only the logits.
        return self.model(input)[0]

# Suppress warnings
import warnings
warnings.simplefilter("ignore")
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"

# The HuggingFace model name to use
model_name = "philschmid/MiniLM-L6-H384-uncased-sst2"

# The sentence to run the model on
sentence = "The quick brown fox jumps over the lazy dog."

print("Parsing sentence tokens.")
example_input = prepare_sentence_tokens(model_name, sentence)

print("Instantiating model.")
model = OnlyLogitsHuggingFaceModel(model_name)

print("Compiling with Torch-MLIR")
linalg_on_tensors_mlir = torch_mlir.compile(
    model,
    example_input,
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)

print("Compiling with IREE")
# Backend options:
#
# llvm-cpu - cpu, native code
# vmvx - cpu, interpreted
# vulkan - GPU for general GPU devices
# cuda - GPU for NVIDIA devices
iree_backend = "llvm-cpu" # it works fine with vmvx
iree_vmfb = iree_torch.compile_to_vmfb(linalg_on_tensors_mlir, iree_backend)

print("Loading in IREE")
invoker = iree_torch.load_vmfb(iree_vmfb, iree_backend)

print("Running on IREE")
result = invoker.forward(example_input)
print("RESULT:", result)

$ python3 example.py 
Parsing sentence tokens.
Instantiating model.
Compiling with Torch-MLIR
Compiling with IREE
Loading in IREE
Running on IREE
Traceback (most recent call last):
  File "/Users/benjamin.sergeant/sandbox/example.py", line 74, in <module>
    result = invoker.forward(example_input)
  File "/Users/benjamin.sergeant/sandbox/venv/lib/python3.10/site-packages/iree_torch/__init__.py", line 51, in invoke
    result = self._iree_module[function_name](*iree_args)
  File "/Users/benjamin.sergeant/sandbox/venv/lib/python3.10/site-packages/iree/runtime/function.py", line 130, in __call__
    self._invoke(arg_list, ret_list)
  File "/Users/benjamin.sergeant/sandbox/venv/lib/python3.10/site-packages/iree/runtime/function.py", line 154, in _invoke
    self._vm_context.invoke(self._vm_function, arg_list, ret_list)
RuntimeError: Error invoking function: <vm>:0: UNKNOWN; device not supported in the compiled configuration; 
[ 0] bytecode module.forward:1306 /Users/benjamin.sergeant/sandbox/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:236:0

What component(s) does this issue relate to?

Python

Version information

Python 3.10.9

The installed packages are:

$ pip freeze
certifi==2022.12.7
charset-normalizer==3.0.1
filelock==3.9.0
huggingface-hub==0.12.1
idna==3.4
iree-compiler==20230218.434
iree-runtime==20230218.434
iree-torch==0.0.1
mpmath==1.2.1
networkx==3.0
numpy==1.24.2
packaging==23.0
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230209
torch-mlir==20230210.745
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.5.0
urllib3==1.26.14

Additional context

No response

bsergean commented 1 year ago

(venv) sandbox$ python3 example.py 
Parsing sentence tokens.
Downloading (…)okenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:00<00:00, 485kB/s]
Downloading (…)lve/main/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 710/710 [00:00<00:00, 737kB/s]
Downloading (…)solve/main/vocab.txt: 100%|████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 838kB/s]
Downloading (…)/main/tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 1.32MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<00:00, 96.8kB/s]
Instantiating model.
Downloading (…)"pytorch_model.bin";: 100%|██████████████████████████████████████████████████████████████████████████████████| 90.9M/90.9M [00:00<00:00, 131MB/s]
Compiling with Torch-MLIR
Compiling with IREE
Loading in IREE
Running on IREE
RESULT: tensor([[ 1.8574, -1.8036]])

I tried again today on Linux (Ubuntu 18.04) with Python 3.11, and it worked fine.

It looks like the 'Downloading' steps (before the 'Compiling with Torch-MLIR' step) never happen on macOS.

bsergean commented 1 year ago

On Linux, where it worked, the environment is:

(venv) sandbox$ pip freeze
certifi==2022.12.7
charset-normalizer==3.0.1
filelock==3.9.0
huggingface-hub==0.12.1
idna==3.4
iree-compiler==20230219.435
iree-runtime==20230219.435
iree-torch==0.0.1
mpmath==1.2.1
networkx==3.0
numpy==1.24.2
packaging==23.0
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230212+cpu
torch-mlir==20230219.754
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.5.0
urllib3==1.26.14

bsergean commented 1 year ago

Note that on macOS this is running on an M1 processor (ARM), while on Linux I'm running on an Intel processor.

antiagainst commented 1 year ago

This is likely because the model was compiled for x86_64, while the M1 expects arm64. You need to pass the proper --iree-llvm-target-cpu-features= flag.
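A minimal sketch of how one might build host-specific compiler flags, assuming the IREE compiler of this era accepts `--iree-llvm-target-cpu-features=` (the flag named above) and also `--iree-llvm-target-triple=`. The triple flag, the triple strings, and the helper name `llvm_cpu_extra_args` are assumptions for illustration, and these flags may have been renamed in later IREE releases. Note that a Python interpreter running under Rosetta 2 on an M1 reports `x86_64` from `platform.machine()`, so the check reflects the interpreter rather than the chip.

```python
import platform
from typing import List, Optional

# Hypothetical mapping from (platform.system(), platform.machine()) to an
# LLVM target triple. These triples are assumptions; extend as needed.
_TRIPLES = {
    ("Darwin", "arm64"): "arm64-apple-darwin",
    ("Darwin", "x86_64"): "x86_64-apple-darwin",
    ("Linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("Linux", "aarch64"): "aarch64-unknown-linux-gnu",
}


def llvm_cpu_extra_args(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    cpu_features: Optional[str] = None,
) -> List[str]:
    """Return IREE llvm-cpu flags for the given host (default: this one).

    Flag names are assumed from the IREE compiler of early 2023.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    triple = _TRIPLES.get((system, machine))
    if triple is None:
        raise ValueError(f"no known LLVM triple for {system}/{machine}")
    args = [f"--iree-llvm-target-triple={triple}"]
    if cpu_features:
        # Flag suggested in the thread; the value is an LLVM feature
        # string such as "+avx2" (assumption).
        args.append(f"--iree-llvm-target-cpu-features={cpu_features}")
    return args
```

On a native arm64 Python on an M1, `llvm_cpu_extra_args()` would return `["--iree-llvm-target-triple=arm64-apple-darwin"]`; the resulting list is meant to be passed to the compiler as extra arguments.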

bsergean commented 1 year ago

The docstring of compile_to_vmfb, the function that looks applicable, lists no LLVM target CPU options, only a CUDA one.

There's a TODO in the docstring that hints at this:

Help on function compile_to_vmfb in module iree_torch:

compile_to_vmfb(mlir_module, target_backend='llvm-cpu', cuda_llvm_target_arch: Optional[str] = None)
    Compile an MLIR module to an IREE Flatbuffer.

    The module is expected to be in the format produced by `torch_mlir.compile`
    with `OutputType.LINALG_ON_TENSORS`.

    TODO: Expose more compiler options.
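Until compile_to_vmfb exposes more options, one workaround is to call the IREE compiler's Python API directly and pass flags through `extra_args`. The sketch below is only loosely modeled on what iree_torch.compile_to_vmfb appears to do; the use of `iree.compiler.compile_str`, its `target_backends`/`input_type`/`extra_args` keywords, and the `"tm_tensor"` input type are assumptions about the iree-compiler release used here. The name `compile_to_vmfb_with_flags` and the `compile_fn` parameter are hypothetical; `compile_fn` exists only so the function can be exercised without IREE installed.

```python
from typing import Any, Callable, Optional, Sequence


def compile_to_vmfb_with_flags(
    mlir_module: Any,
    target_backend: str = "llvm-cpu",
    extra_args: Sequence[str] = (),
    compile_fn: Optional[Callable[..., bytes]] = None,
) -> bytes:
    """Compile Torch-MLIR LINALG_ON_TENSORS output to an IREE flatbuffer.

    Unlike iree_torch.compile_to_vmfb, arbitrary compiler flags can be
    forwarded through `extra_args` (e.g. CPU target options for an M1).
    """
    if compile_fn is None:
        # Imported lazily so this module loads even without iree-compiler.
        import iree.compiler as ireec
        compile_fn = ireec.compile_str
    return compile_fn(
        str(mlir_module),  # textual MLIR produced by torch_mlir.compile
        target_backends=[target_backend],
        input_type="tm_tensor",  # assumption: mirrors iree_torch's choice
        extra_args=list(extra_args),
    )
```

In example.py this would replace the iree_torch.compile_to_vmfb call, e.g. `compile_to_vmfb_with_flags(linalg_on_tensors_mlir, "llvm-cpu", ["--iree-llvm-target-cpu-features=<features>"])`, with the result still loaded via `iree_torch.load_vmfb(iree_vmfb, iree_backend)`.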

bsergean commented 1 year ago

Also, everything works fine if I use vmvx as the iree_backend.
