nod-ai / SHARK-TestSuite


RAFT_vaiq_int8 model support #182

Open AmosLewis opened 5 months ago

AmosLewis commented 5 months ago

Failed op: onnx.Expand

Until the expand op is fixed, we cannot get failure signatures for any other ops. The following command is blocked by onnx.Expand.

torch-mlir-opt -convert-torch-onnx-to-torch ./RAFT_vaiq_int8.default.torch-onnx.mlir 
./RAFT_vaiq_int8.default.torch-onnx.mlir:2045:13: error: failed to legalize operation 'torch.operator' that was explicitly marked illegal
    %2041 = torch.operator "onnx.Expand"(%2002, %2040) : (!torch.vtensor<[1,2,?,?],f32>, !torch.vtensor<[?],si64>) -> !torch.vtensor<[],f32> 
            ^
./RAFT_vaiq_int8.default.torch-onnx.mlir:2045:13: note: see current operation: %7506 = "torch.operator"(%7142, %7505) <{name = "onnx.Expand"}> : (!torch.vtensor<[1,2,?,?],f32>, !torch.vtensor<[?],si64>) -> !torch.vtensor<[],f32>
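For reference, onnx.Expand broadcasts its input to a target shape under numpy-style broadcasting rules. A minimal sketch of those semantics, with concrete shapes standing in for the dynamic dims above (illustrative only):

import numpy as np

# onnx.Expand: each output dim is the broadcast of the corresponding input
# and shape dims, following numpy broadcasting rules.
data = np.zeros((1, 2, 4, 4), dtype=np.float32)   # stand-in for the [1,2,?,?] operand
shape = np.array([1, 1, 4, 4], dtype=np.int64)    # stand-in for the [?] shape operand
out = np.broadcast_to(data, tuple(np.maximum(data.shape, shape)))
print(out.shape)  # (1, 2, 4, 4)
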
zjgarvey commented 5 months ago

Next failures are

  1. onnx.Resize (which should be fixed by torch-mlir PR #3013)
  2. A matrix multiplication that is getting half-quantized (see the sketch below). Looking into this now, but it's going to be difficult to fix, since the second operand is dequantized before several shape manipulations occur.
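A minimal sketch of the half-quantized pattern in item 2, using torch ops with made-up shapes and scales (not the exact IR from this model): the second operand is dequantized early and then reshaped as plain f32, so only one side of the matmul still carries its quantize/dequantize pair.

import torch

a = torch.randn(4, 8)
b = torch.randn(8, 16)

# Operand A keeps its quantize/dequantize pair right next to the matmul.
a_q = torch.quantize_per_tensor(a, scale=0.1, zero_point=0, dtype=torch.qint8)

# Operand B is dequantized first, then goes through shape manipulations as
# ordinary f32, so a quantized-matmul rewrite can no longer see both sides.
b_q = torch.quantize_per_tensor(b, scale=0.1, zero_point=0, dtype=torch.qint8)
b_f32 = b_q.dequantize().reshape(16, 8).transpose(0, 1)

out = torch.matmul(a_q.dequantize(), b_f32)
print(out.shape)  # torch.Size([4, 16])
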
AmosLewis commented 4 months ago

After the integrate of torch-mlir@ec6d7aa (onnx.Resize fix) in https://github.com/iree-org/iree/pull/17358: @zjgarvey, this looks like a quantization-related error, so it would be better if you take a look.

failed to translate executables
failed to translate executables
failed to translate executables
RAFT_vaiq_int8.default.onnx.torch.mlir:1644:13: error: 'func.func' op exceeded stack allocation limit of 32768 bytes for function. Got 401408 bytes
    %1601 = torch.aten.quantize_per_tensor %1596, %float5.000000e-01, %int0, %int12 : !torch.vtensor<[1024,7,7,1],f32>, !torch.float, !torch.int, !torch.int -> !torch.vtensor<[1024,7,7,1],!torch.qint8>
            ^
RAFT_vaiq_int8.default.onnx.torch.mlir:1644:13: note: called from
    %1601 = torch.aten.quantize_per_tensor %1596, %float5.000000e-01, %int0, %int12 : !torch.vtensor<[1024,7,7,1],f32>, !torch.float, !torch.int, !torch.int -> !torch.vtensor<[1024,7,7,1],!torch.qint8>
            ^
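For a sense of scale (back-of-the-envelope arithmetic, not compiler output): the [1024,7,7,1] tensor being quantized has 50176 elements, so even a single f32 copy is about 200 KB, far above the 32768-byte stack limit, and the reported 401408 bytes is exactly 50176 x 8.

elems = 1024 * 7 * 7 * 1      # 50176 elements in the quantize_per_tensor operand
f32_bytes = elems * 4         # 200704 bytes for a single f32 buffer
print(f32_bytes > 32768)      # True: one f32 buffer already exceeds the 32 KiB limit
print(elems * 8)              # 401408, matching the reported allocation size
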
zjgarvey commented 4 months ago

Maybe we can use --iree-llvmcpu-fail-on-out-of-bounds-stack-allocation=false, but I'm not sure that is the best option. I've had iree-compile running on this for a while now.
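If we try that route, the flag would just be appended to the iree-compile invocation. A hypothetical command (the llvm-cpu target backend and output name are assumptions, not from this thread):

iree-compile \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-fail-on-out-of-bounds-stack-allocation=false \
  RAFT_vaiq_int8.default.onnx.torch.mlir -o RAFT_vaiq_int8.vmfb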

zjgarvey commented 4 months ago

Related issue in iree: https://github.com/iree-org/iree/issues/17455.

AmosLewis commented 3 months ago

@IanWood1's PR to fix this: https://github.com/iree-org/iree/pull/17574