vespa-engine / vespa

ColBERT ONNX dimension error #21062

Closed lukasschmit closed 2 years ago

lukasschmit commented 2 years ago

Describe the bug
Hi, I am trying to get a custom ColBERT model exported to ONNX for use as the query encoder inside Vespa, but I am running into what appear to be dimension errors when deploying:

WARNING: invalid rank feature 'onnxModel(colbert_encoder).contextual': onnx model dry-run failed: Non-zero status code returned while running Div node. Name:'Div_1186' Status Message: /root/rpmbuild/BUILD/vespa-onnxruntime-1.7.1/onnxruntime/core/framework/execution_frame.cc:67 onnxruntime::common::Status onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, const onnxruntime::TensorShape*, OrtValue*&, size_t) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,1,1} Requested shape:{1,32,128}\n
ERROR: rank feature verification failed: onnxModel(colbert_encoder).contextual (summary features)
ERROR: rank profile 'colbert_query_encoder': FAIL

I am following the notebook provided here, the only difference being that I am using bert-base as my backbone model and loading the model from a .dnn file (I have an older fork of ColBERT). When running the ONNX file locally, it appears to work perfectly, with the correct 1x32x128 dimensions for the contextual output. I have tried with and without the quantization step and get the same result.
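For reference, the local check looks roughly like this (a minimal sketch; the file name and input names are assumptions, since the actual export script is in the attached zip):

    import numpy as np
    import onnxruntime as ort

    # Run the exported encoder locally with a dummy fixed-length query
    sess = ort.InferenceSession("colbert_encoder.onnx",
                                providers=["CPUExecutionProvider"])
    out = sess.run(["contextual"], {
        "input_ids": np.ones((1, 32), dtype=np.int64),
        "attention_mask": np.ones((1, 32), dtype=np.int64),
    })[0]
    print(out.shape)  # prints (1, 32, 128) locally, as expected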

To Reproduce

I have attached a minimal example to run. It relies on a .dnn file that I know works. The ONNX file also runs locally.

Expected behavior
I expect Vespa to be able to deploy this model, given that the dimensions are correct when the model is run locally.

Environment (please complete the following information):

Vespa version 7.538.1

Additional context Our team ran this same script about 6-7 months ago and has had a working ONNX ColBERT model inside Vespa since. When redeploying this older file, it works perfectly, but Vespa does not like the new one. However, the export code has not changed, and when running both ONNX files locally they both produce the same tensor dimensions.

This leads me to believe that there is some very subtle difference in dependencies that created the discrepancy.

minimal.py.zip

jobergum commented 2 years ago

Yes, this is odd, but as you have already identified, it's related to the torch version.

It seems the torch tracing of the forward pass has changed, causing the output dimensions to become dynamic even when the input has a fixed length.

Pinning torch to 1.6 works, but from 1.7.0 and up it breaks.

pip3 install torch==1.6.0 numpy transformers onnx onnxruntime

This produces the correct output shape of batch,32,32 (in your case it should be batch,32,128):

import onnx
g = onnx.load("vespa-colMiniLM-L-6.onnx")
g.graph.output

[name: "contextual"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_value: 32
      }
      dim {
        dim_value: 32
      }
    }
  }
}
]

While torch==1.10 (latest stable) produces the following output:

[name: "contextual"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_param: "Divcontextual_dim_1"
      }
      dim {
        dim_param: "Divcontextual_dim_2"
      }
    }
  }
}
]

As a workaround you can try 1.6.0, and I'll look into how we can change the forward pass to avoid this; it's likely related to the tracing of torch.nn.functional.normalize.
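To illustrate, here is a minimal toy sketch (a hypothetical module, not the actual ColBERT code) of the pattern that seems to trigger this: F.normalize divides by the computed norm, and newer torch versions may trace that Div with symbolic output dimensions even for fixed-length input:

    import torch
    import torch.nn.functional as F

    class ToyEncoder(torch.nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.proj = torch.nn.Linear(dim, dim)

        def forward(self, x):
            # L2-normalize each token embedding, as ColBERT does
            return F.normalize(self.proj(x), p=2, dim=2)

    # Export with a fixed-shape dummy input and inspect graph.output as above;
    # with torch >= 1.7 the non-batch output dims may come out as Div* dim_params
    torch.onnx.export(ToyEncoder(), torch.randn(1, 32, 128), "toy_encoder.onnx",
                      input_names=["input"], output_names=["contextual"],
                      dynamic_axes={"input": {0: "batch"}, "contextual": {0: "batch"}},
                      opset_version=11)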

jobergum commented 2 years ago

The ONNX model exported with a recent torch version works fine with stateless evaluation, which is what is used for both query encoders. I recommend upgrading and moving query encoding inference to the stateless container layer.

See https://blog.vespa.ai/stateless-model-evaluation/ and https://blog.vespa.ai/ml-model-serving-at-scale/
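As a rough sketch of what that looks like from the client side (assuming the ONNX file is bundled under models/ in the application package and model-evaluation is enabled in the container cluster; the parameter names and tensor literal encoding here are assumptions, see the posts above for the actual API):

    import requests

    # List the models available for stateless evaluation
    print(requests.get("http://localhost:8080/model-evaluation/v1/").json())

    # Evaluate the encoder; inputs are passed as URL parameters using Vespa's
    # tensor literal form, with names matching the ONNX model's input names
    ids = [101] + [103] * 30 + [102]   # dummy 32-token query
    mask = [1] * 32

    def tensor_literal(vals):
        return "tensor(d0[1],d1[32]):[[" + ",".join(map(str, vals)) + "]]"

    resp = requests.get(
        "http://localhost:8080/model-evaluation/v1/colbert_encoder/eval",
        params={"input_ids": tensor_literal(ids),
                "attention_mask": tensor_literal(mask)},
    )
    print(resp.json())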

The problem seems to be that the ranking setup validation for evaluation on the content nodes does not handle the ONNX model exported by torch 1.10:

WARNING: invalid rank feature 'onnxModel(colbert_encoder).contextual': onnx model dry-run failed: Non-zero status code returned while running Div node. Name:'Div_1186' Status Message: /root/rpmbuild/BUILD/vespa-onnxruntime-1.7.1/onnxruntime/core/framework/execution_frame.cc:67 onnxruntime::common::Status onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, const onnxruntime::TensorShape*, OrtValue*&, size_t) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,1,1} Requested shape:{1,32,128}\n
ERROR: rank feature verification failed: onnxModel(colbert_encoder).contextual (summary features)
ERROR: rank profile 'colbert_query_encoder': FAIL

This is with a previous version of the passage ranking app, where the query was encoded on the content nodes:

    rank-profile colbert_query_encoder {
        num-threads-per-search: 1
        first-phase {
            expression: random
        }
        summary-features {
            onnxModel(encoder).contextual
        }
    }

Do you have an idea about that @lesters?

lesters commented 2 years ago

The reason this happens is that the backend running on the content nodes does not support unbound dimension sizes. The output here from PyTorch 1.10 is:

    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_param: "Divcontextual_dim_1"
      }
      dim {
        dim_param: "Divcontextual_dim_2"
      }
    }

Here, Divcontextual_dim_1 and Divcontextual_dim_2 are initially unbound sizes. As a workaround, the backend code has some logic that tries to deduce the actual sizes given the input, but for this particular case it fails and falls back to logic that assumes a size of 1. Hence the {1,1,1} shape in the error above. It works with previous versions of PyTorch because they produce fixed dimension sizes.
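A quick way to check an exported model for this before deploying (a small sketch; the helper and file names are illustrative):

    import onnx

    def symbolic_output_dims(path):
        # Collect output dimensions that are symbolic (dim_param) rather than
        # fixed (dim_value); a symbolic batch dim is normal, but the symbolic
        # Div* dims are what trips up the content-node backend here
        model = onnx.load(path)
        return [
            (out.name, i, dim.dim_param)
            for out in model.graph.output
            for i, dim in enumerate(out.type.tensor_type.shape.dim)
            if not dim.HasField("dim_value")
        ]

    print(symbolic_output_dims("vespa-colMiniLM-L-6.onnx"))
    # torch 1.10 export: [('contextual', 0, 'batch'),
    #                     ('contextual', 1, 'Divcontextual_dim_1'),
    #                     ('contextual', 2, 'Divcontextual_dim_2')]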

So, this is something we need to fix on our side. As @jobergum mentions, one workaround is to use an older PyTorch version; 1.9.1 works well in my testing. Another is to help the export by specifying the dynamic axes:

    torch.onnx.export(
        cmodel,                     # the ColBERT query encoder module
        args=args,                  # example (input_ids, attention_mask) tensors
        f=out_file,
        input_names=input_names,    # ["input_ids", "attention_mask"]
        output_names=output_names,  # ["contextual"]
        dynamic_axes={
            "input_ids": {0: "batch", 1: "dim1"},
            "attention_mask": {0: "batch", 1: "dim2"},
            "contextual": {0: "batch", 1: "dim1", 2: "dim2"},
        },
        opset_version=11,
    )

Notice the specific binding here between the dimensions of the output and those of the inputs.
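After exporting this way, the output dimensions should carry the same symbolic names as the inputs, which you can verify with the same inspection as earlier in this thread:

    import onnx
    g = onnx.load(out_file)   # out_file from the export call above
    print(g.graph.output)     # dim_params should read "batch", "dim1", "dim2" instead of Div*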

We will try to address this soon by using ONNX Runtime to deduce the size, perhaps during this dry-run.

jobergum commented 2 years ago

@lukasschmit were you able to progress on this?

jobergum commented 2 years ago

I'm resolving this.