lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License
4.63k stars 395 forks

Replace .triu calls to allow ONNX export for CPU runtime #161

Closed · jorgetavares closed this issue 1 year ago

jorgetavares commented 1 year ago

Hi,

When exporting a model to ONNX and then using the ONNX RT CPU provider for optimizations, ORT complains about not finding an implementation for the triu op:

NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Trilu(14) node with name '/attn_layers/layers.0.1/attend/Trilu'

To reproduce the error, you just need to create a model from the examples (with the corresponding input tensor), export it to ONNX using torch.onnx.export, and then pass the exported model to ORT:

import onnxruntime as rt

sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = (
    rt.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
)

# write the optimized graph to disk so it can be inspected
sess_options.optimized_model_filepath = "model_opt.onnx"

# the CPU execution provider is the one missing the Trilu kernel
providers = ["CPUExecutionProvider"]
rt.InferenceSession("model.onnx", sess_options, providers=providers)

To fix it, I removed the triu calls and replaced them with similar code that you had already used before (see commit https://github.com/lucidrains/x-transformers/commit/6119be8d815b9ee2eb01c08fdaa49190b83ab03d). This way we are able to export to ONNX and run the model on CPU.
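The triu-free construction can be sketched as follows; this is an illustration of the idea, not necessarily the exact code from that commit. Comparing row and column indices directly produces the same boolean mask without ever emitting a Trilu node:

```python
import torch

def causal_mask(n, device=None):
    # mark future positions (column index > row index) as True,
    # avoiding .triu and therefore the ONNX Trilu op
    r = torch.arange(n, device=device)
    return r[:, None] < r[None, :]

# equivalent to: torch.ones(n, n, dtype=torch.bool).triu(1)
```

Since arange and comparison ops have CPU implementations in every ONNX Runtime execution provider, the exported graph runs everywhere.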

Thanks,

lucidrains commented 1 year ago

@jorgetavares hey Jorge! so i'm googling around and it seems like ONNX has resolved the triu operator in opset 14? can you confirm (or refute) that?

jorgetavares commented 1 year ago

Hi, thanks for replying so fast! I'm using the latest versions of all the ONNX packages: onnx 1.14.0 and onnxruntime 1.15.1. I re-ran it and still see the same error. However, this only happens with the CPU execution provider; with CUDA it's all fine. I also tried using the latest opset in torch.onnx.export, but no luck either.

Perhaps that's what you saw mentioned, i.e., the triu operator being resolved, but only for CUDA? Otherwise, can you point me to what you read? I might be missing something simple.

I understand people are mostly interested in GPU runtime but CPU is still useful for some things :)

To be fair, this is really an ONNX runtime issue...

lucidrains commented 1 year ago

oh yes, you are right

let me think about this

jorgetavares commented 1 year ago

Ok, thanks for looking into this!

lucidrains commented 1 year ago

@jorgetavares want to try 1.16.17 with Decoder(..., attn_onnxable = True)?

jorgetavares commented 1 year ago

Hi,

I just tested 1.16.17 and it works now! Thanks a lot for providing a fix! :-)

jorgetavares commented 1 year ago

I think we can close this.