sonos / tract

Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference

Multiple variable dimensions #533

Closed guillaume-be closed 2 years ago

guillaume-be commented 3 years ago

Hello,

I am looking into Tract as an ONNX runtime for language models, with the goal of eventually integrating it into https://github.com/guillaume-be/rust-bert. I have exported a BERT-like model using the new utilities offered by transformers.onnx, and I am able to run the model without issue using onnxruntime in Python.

I am able to load the model using Tract, but unfortunately it fails when running on a selected input:

Error: Evaluating #1 "attention_mask" Source: output 0, expected N,S,I64, got 2,12,I64 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0...
error: process didn't exit successfully: `target\debug\examples\onnx.exe` (exit code: 1)

where attention_mask is the second input. This is my first try at using the Tract API - am I missing something?

use std::path::Path;
use tract_onnx::prelude::*;

fn main() -> anyhow::Result<()> {
    let model_path =
        Path::new("E:/Coding/distilbert-base-uncased-finetuned-sst-2-english/model.onnx");

    // declare symbolic dimensions for the batch (N) and sequence length (S)
    let batch_symbol = Symbol::new('N');
    let sequence_symbol = Symbol::new('S');

    let model = tract_onnx::onnx()
        .model_for_path(model_path)?
        // input_ids: i64 tensor of symbolic shape [N, S]
        .with_input_fact(
            0,
            InferenceFact::dt_shape(
                i64::datum_type(),
                tvec!(batch_symbol.to_dim(), sequence_symbol.to_dim()),
            ),
        )?
        // attention_mask: i64 tensor of symbolic shape [N, S]
        .with_input_fact(
            1,
            InferenceFact::dt_shape(
                i64::datum_type(),
                tvec!(batch_symbol.to_dim(), sequence_symbol.to_dim()),
            ),
        )?
        .into_runnable()?;

    // two tokenized example sequences, padded to the same length
    let input_ids: Tensor = tract_ndarray::array![
        [101i64, 2023, 2001, 1037, 2307, 3185, 102, 0, 0, 0, 0, 0],
        [101, 1045, 2001, 2061, 11471, 1045, 3062, 6680, 8576, 1011, 2083, 102]
    ]
    .into();

    // 1 marks a real token, 0 marks padding
    let attention_mask: Tensor = tract_ndarray::array![
        [1i64, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    ]
    .into();

    let result = model.run(tvec!(input_ids, attention_mask))?;
    println!("{:?}", result);

    Ok(())
}
kali commented 3 years ago

Haha, it looks like a bug in the multiple inputs and multiple symbols corner :)

Could you give me your .onnx model so I can try to find out what's wrong?

guillaume-be commented 3 years ago

Hello,

Thank you for the prompt response. I have uploaded the model at https://drive.google.com/file/d/1-eolYFHieS3v7JAC_dy_n-z1dPH2m0qc/view?usp=sharing (this was converted to ONNX using the weights shared under the Apache 2.0 license at https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

kali commented 2 years ago

Hello, a few words to tell you I have not forgotten this issue. I had good hope I was close to making it work with some fixes done for another model, but I'm afraid it's not so. This BERT does relatively complicated things around shape computation and pushes tract's shape predictions to their limit... I need to think about how to go this extra mile, whether I can push the current code just enough to handle your model (and the ones doing the same kind of things) or if I need a bigger refactoring of the shape prediction... It's been a while, so if you've moved on and don't care anymore, I will not be offended :)

guillaume-be commented 2 years ago

Hello @kali and thank you very much for the update. I understand the difficulty of running transformer models with ONNX. There has been one successful attempt at running these models in Rust, although it relies on onnxruntime: https://github.com/haixuanTao/onnxruntime-rs. Maybe it provides some hints as to what may help here.

I am still very much interested in seeing such capabilities in Tract. This library seems to be one of the best maintained for ONNX inference in the Rust ecosystem, and I would like to make it the platform of choice for the ONNX capabilities of the library I am working on. I have seen very promising speed-ups from implementing the post-processing pipeline of NLP models in Rust using Pytorch bindings (see here if interested). I expect the performance benefits offered by ONNX would synergize well with these improvements for high-performance text generation.

kali commented 2 years ago

Thanks! Happy to see you're not giving up on us. And I'm not giving up, we'll get there... eventually.

kali commented 2 years ago

I think #689 may be it :)
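For anyone wanting to try an unreleased fix like this one before it reaches crates.io, one hypothetical approach (the git reference below is an assumption, not something prescribed in this thread) is to point Cargo at tract's development branch:

# Cargo.toml -- hypothetical: pull tract-onnx from the main branch to pick up unreleased fixes
[dependencies]
tract-onnx = { git = "https://github.com/sonos/tract" }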

guillaume-be commented 2 years ago

Hello @kali ,

Thank you for looking into this and proposing this fix, and apologies for not getting back to you earlier. I have tested the code given in this issue again and I am still facing the same error - is this an issue on my end?

kali commented 2 years ago

Damn. Just had a quick look at your code, can you try calling into_optimized() before into_runnable()? Because I'm pretty sure I checked it was working. If that's not it, I'll re-setup the test case.
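As a minimal sketch of that suggestion (assuming the same model_path, symbols and input facts as in the snippet above), the builder chain would become:

let model = tract_onnx::onnx()
    .model_for_path(model_path)?
    .with_input_fact(
        0,
        InferenceFact::dt_shape(
            i64::datum_type(),
            tvec!(batch_symbol.to_dim(), sequence_symbol.to_dim()),
        ),
    )?
    .with_input_fact(
        1,
        InferenceFact::dt_shape(
            i64::datum_type(),
            tvec!(batch_symbol.to_dim(), sequence_symbol.to_dim()),
        ),
    )?
    // declutter and optimize the typed graph before making it runnable
    .into_optimized()?
    .into_runnable()?;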

guillaume-be commented 2 years ago

This may be because I generated the model again using updated utilities from Huggingface:

python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification distilbert-sst-onnx

I then created and tested an optimized version of this model with

from onnxruntime.transformers import optimizer

optimized_model = optimizer.optimize_model(path_to_model, model_type='bert', num_heads=12, hidden_size=768)
optimized_model.save_model_to_file(path_to_optimized_model)

I have uploaded both files for your reference: non-optimized and optimized

  1. For the non-optimized model: I just tried adding into_optimized(), and I now run into the following issue: Error: Translating node #43 "Slice_6" StridedSlice ToTypedTranslator. Note that the optimization seems to be much slower than with the Python onnxruntime library, so I am not sure the operations are equivalent.

  2. For the optimized version:

    • skipping into_optimized leads to the same error described in this issue -- does this mean that when using tract I would need to re-optimize at each model load, and won't be able to load a model optimized with onnxruntime?
    • adding into_optimized fails with:
      
      Error: Failed analyse for node #95 "EmbedLayerNormalization_0" Unimplemented(EmbedLayerNormalization)

Caused by: Wrong nnumber of outputs. Op says 1, node says 2.

kali commented 2 years ago

Hey, we have an Albert example here that worked a few months ago: https://github.com/sonos/tract/tree/main/examples/pytorch-albert-v2

EmbedLayerNormalization? I can't find it in the ONNX operators list (see https://github.com/onnx/onnx/blob/main/docs/Operators.md), so I'm not sure what is happening there.

guillaume-be commented 2 years ago

Hello @kali ,

I believe this may be because the optimization is done using onnxruntime (see https://github.com/microsoft/onnxruntime/blob/master/docs/ContribOperators.md#com.microsoft.EmbedLayerNormalization). After changing the optimization level before serializing the model, I now see both the optimized and non-optimized models exhibiting the same behaviour:

Error: Translating node #43 "Slice_6" StridedSlice ToTypedTranslator

I cannot find this StridedSlice operator in either the ONNX operators list or the onnxruntime custom operators.

kali commented 2 years ago

Ok, so it looks like Microsoft is doing the old IE CSS trick again, adding stuff to a standard as a way to lock people in. That's just great. And there are quite a bunch of them too. Obviously I cannot support all these com.microsoft operators right away, so please no optimizing with onnxruntime for now.

The StridedSlice is basically the Slice operator from ONNX (the name comes from TensorFlow). I just reproduced the problem, I'm having a look.