sonos / tract

Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference
Other
2.2k stars 212 forks source link

ONNX Optimization failing due to inconsistent matmul #927

Closed VariantXYZ closed 1 year ago

VariantXYZ commented 1 year ago

I've been looking into https://github.com/breizhn/DTLN-aec, converted to ONNX using the very convenient tflite2onnx python package, and have found that it seems to spit out an error when trying to optimize (the model analyzes and runs fine otherwise).

tract -O -f onnx 2.onnx dump
[2023-01-07T03:55:43.313206000Z ERROR tract] Error at stage optimize

    Caused by:
        0: running pass codegen
        1: codegen node #2 "conv1d;functional_5/conv1d_2/StatefulPartitionedCall/conv1d" ConvUnary
        2: wiring conv1d;functional_5/conv1d_2/StatefulPartitionedCall/conv1d.matmul (MatMulUnary { a: 1,512,512,F32 0.45948926, 0.5314799, 0.4752084, 0.34898752, 0.5644988, 0.31744096, 0.23310894, -0.022827938, -0.074196175, 0.29069299, 0.07359066, -0.14539745..., axes: MatMulAxes { a_m: 1, a_k: 2, b_k: 2, b_n: 1, c_m: 2, c_n: 1 } }), determining output_facts
        3: in output_facts invocation
        4: Inconsistent matmul: a: 1,512,512 b: 1,512,1, axes: am:1 ak:2 bk:2 bn:1 cm:2 cn:1

You can convert it yourself, but I've attached the model in a zip file here.

2.zip

The node in question does not seem particularly egregious:

┏ 0 Source input_6
┃   ━━━ 1,1,512,F32
┣ 1 RmAxis conv1d;functional_5/conv1d_2/StatefulPartitionedCall/conv1d__45.0
┃   ━━━ 1,512,F32
┣ 2 ConvUnary conv1d;functional_5/conv1d_2/StatefulPartitionedCall/conv1d
┣ 3 AddAxis conv1d/Squeeze;functional_5/conv1d_2/StatefulPartitionedCall/conv1d/Squeeze.0
┃   ━━━ 1,1,512,F32
┣┓
┃┣ 4 Reduce<Sum> Mean;functional_5/instant_layer_normalization_6/StatefulPartitionedCall/Mean.sum
┃┃   ━━━ 1,1,1,F32

(512 512 512x1, it seems like the outer '1' is incorrectly being tacked on) but I may be missing something obvious.

Also, I wasn't sure if I had made a mistake, but when I tried to dump these as NNEF and optimize that way, I instead ran into a strange parsing error. Let me know if this is a separate bug and I'll go ahead and file it as well.

tract -f onnx --nnef-tract-core 2.onnx dump --nnef-tar 2.nnef.tar
tract -f nnef --nnef-tract-core 2.nnef.tar dump
[2023-01-07T04:18:03.017771200Z ERROR tract] Error while loading resource by "GraphNnefLoader" at path "graph.nnef"

    Caused by:
        Fail to parse NNEF document: Error(Error
(SNIP)
kali commented 1 year ago

Main fix is merged. Convolution is confused when all its geometric axes are gone, so we'll refrain from removing the last one (it will be re-expressed as a matmul anyway).

The nnef syntax one is on its way.

VariantXYZ commented 1 year ago

@kali , do you want me to make a separate issue for NNEF loading?

kali commented 1 year ago

I think I fixed it, that was the second PR. Except it's not a load problem, you have to dump the network again, I have extended the syntax to support arbitrary strings as ids.

VariantXYZ commented 1 year ago

Ah, you're right. It works after updating.

Thanks!