llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

Can't validate `torch.aten::view` which seems legit #1698

Open mfuntowicz opened 1 year ago

mfuntowicz commented 1 year ago

Hi 🙂,

I'm working on exporting transformers PyTorch based models to MLIR with dynamic shapes.

Unfortunately, while the static shape compilation from torch_mlir seems to work fine, when enabling dynamic shapes it fails, saying it can't legalize the view operator (see below).

torch_mlir.compiler_utils.TorchMlirCompilerError: Lowering Torch Backend IR -> Linalg-on-Tensors Backend IR failed with the following diagnostics:
error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
note: see current operation: %614 = "torch.aten.view"(%612, %613) : (!torch.vtensor<[1,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[1,?,12,64],f32>

With my current understanding, the view is only applied to the axis whose shape is known (the dynamic axis is left untouched), and 12 x 64 = 768, so it should be ok?
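
For context, this view presumably comes from the attention-head split inside DistilBERT (768 hidden size, 12 heads of 64); a stripped-down module producing the same pattern would look something like this (hypothetical illustration, not part of the repro below):

import torch

class SplitHeads(torch.nn.Module):
    # (batch, seq, 768) -> (batch, seq, 12, 64): only the static last
    # axis is split into 12 x 64; the dynamic seq axis passes through via -1.
    def forward(self, x):
        return x.view(x.shape[0], -1, 12, 64)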

I'm including a simple repro script below:

import tempfile
import time
from typing import Optional, List

from torch import tensor
from torch.nn import Module
import torch_mlir
import iree_torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification

def prepare_sentence_tokens(hf_model: str, sentence: str, dynamic_axes: Optional[List[int]] = None):
    tokenizer = AutoTokenizer.from_pretrained(hf_model)
    args = tensor([tokenizer.encode(sentence)])

    if dynamic_axes is not None:
        placeholder = torch_mlir.TensorPlaceholder.like(args, dynamic_axes=dynamic_axes)

        example_args = torch_mlir.ExampleArgs()
        example_args.add_method("forward", placeholder)
        return example_args
    else:
        return args

class OnlyLogitsHuggingFaceModel(Module):
    """Wrapper that returns only the logits from a HuggingFace model."""

    def __init__(self, model_name: str):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name,  # The pretrained model name.
            # The number of output labels--2 for binary classification.
            num_labels=2,
            # Whether the model returns attentions weights.
            output_attentions=False,
            # Whether the model returns all hidden-states.
            output_hidden_states=False,
            torchscript=True,
        )
        self.model.eval()

    def forward(self, input):
        # Return only the logits.
        return self.model(input)[0]

# Suppress warnings
import warnings
warnings.simplefilter("ignore")
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"

if __name__ == '__main__':
    # The HuggingFace model name to use
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"

    # The sentence to run the model on
    sentence = "The quick brown fox jumps over the lazy dog."

    print("Parsing sentence tokens.")
    example_input = prepare_sentence_tokens(model_name, sentence, dynamic_axes=[1])

    print("Instantiating model.")
    model = OnlyLogitsHuggingFaceModel(model_name)

    print("Compiling with Torch-MLIR")
    linalg_on_tensors_mlir = torch_mlir.compile(
        model,
        example_input,
        output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
        use_tracing=True,
        ignore_traced_shapes=True,
        verbose=False
    )

    # print(linalg_on_tensors_mlir)
    with open(os.path.join(tempfile.gettempdir(), "minilm.mlir"), mode="w") as tmp:
        tmp.write(linalg_on_tensors_mlir.operation.get_asm(large_elements_limit=10, enable_debug_info=True))

    print("Compiling with IREE")
    # Backend options:
    #
    # llvm-cpu - cpu, native code
    # vmvx - cpu, interpreted
    # vulkan - GPU for general GPU devices
    # cuda - GPU for NVIDIA devices
    iree_backend = "llvm-cpu"
    iree_vmfb = iree_torch.compile_to_vmfb(linalg_on_tensors_mlir, iree_backend)

    print("Loading in IREE")
    invoker = iree_torch.load_vmfb(iree_vmfb, iree_backend)

    print("Running on IREE")

    example_input = prepare_sentence_tokens(model_name, sentence)
    for _ in range(100):
        start = time.time_ns()
        result = invoker.forward(example_input)
        end = time.time_ns()

        print(f"Forward took: {(end - start) / 1000 / 1000} ms")
    print("Done")
silvasean commented 1 year ago

@JakopinA can you take a look?

mfuntowicz commented 1 year ago

Adding to the above: I included the "simpler" repro with a dynamic shape only on the sequence axis ([1]), but I also tried with both batch + sequence ([0, 1]).

The output is the same error (just with the leading batch axis also marked dynamic in the error message).

torch_mlir.compiler_utils.TorchMlirCompilerError: Lowering Torch Backend IR -> Linalg-on-Tensors Backend IR failed with the following diagnostics:
error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
note: see current operation: %703 = "torch.aten.view"(%701, %702) : (!torch.vtensor<[?,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[?,?,12,64],f32>
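
For completeness, the two configurations map to these placeholders (reusing the prepare_sentence_tokens helper from the repro above):

# sequence axis only:
example_args = prepare_sentence_tokens(model_name, sentence, dynamic_axes=[1])
# batch and sequence axes:
example_args = prepare_sentence_tokens(model_name, sentence, dynamic_axes=[0, 1])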
powderluv commented 1 year ago

@gpetters94 is taking a look at it.

gpetters94 commented 1 year ago

It looks like right now we don't support a view with one unknown dimension in the input mapping to one unknown dimension in the output. I think it should be a quick fix: if the known sizes multiply to the same product on both sides, then the single unknown dimension must be the same on both sides, so it should just be a matter of adding that check in.
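
In other words, the invariant would be something like this (sketched in plain Python purely to illustrate; the actual fix lives in the C++ lowering):

from math import prod

def view_unknowns_match(in_shape, out_shape, unknown=-1):
    # With exactly one unknown dim per side, equal products of the known
    # dims force the two unknown dims to be equal as well.
    in_known = [d for d in in_shape if d != unknown]
    out_known = [d for d in out_shape if d != unknown]
    if len(in_shape) - len(in_known) != 1 or len(out_shape) - len(out_known) != 1:
        return False  # this quick fix only covers the single-unknown case
    return prod(in_known) == prod(out_known)

# The op from the first error: [1, ?, 768] -> [1, ?, 12, 64]
assert view_unknowns_match([1, -1, 768], [1, -1, 12, 64])  # 768 == 12 * 64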

mfuntowicz commented 1 year ago

Thanks @gpetters94, is this something I could contribute?

If so, could you point me to the right place in the codebase to look? I'm not so familiar with the repo structure yet.

gpetters94 commented 1 year ago

> Thanks @gpetters94, is this something I could contribute?
>
> If so, could you point me to the right place in the codebase to look? I'm not so familiar with the repo structure yet.

Sure thing, this is the function where the code would go, probably somewhere around here. Let me know if you need any help with the implementation - I'm always glad to help people get into the project!