jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning
Apache License 2.0

Issues in examples/bert.py #72

Open chhwang opened 3 years ago

chhwang commented 3 years ago

I'm trying to run examples/bert.py using the provided Docker container image, and I'm running into the following issues.

1. ONNX check_model fails:

I added this code at the end of examples/bert.py to export the ONNX model:

onnx_model = ts.export_onnx(new_graph)
onnx.checker.check_model(onnx_model)
onnx.save(onnx_model, "bert.onnx")

and ran python bert.py, which produced this error message:

Traceback (most recent call last):
  File "bert.py", line 47, in <module>
    onnx.checker.check_model(onnx_model)
  File "/opt/conda/lib/python3.7/site-packages/onnx/checker.py", line 91, in check_model
    C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: No Op registered for Matmul with domain_version of 11

==> Context: Bad node spec: input: "Relu101_fwd0" input: "Matmul119_weight" output: "Matmul119_fwd0" name: "Matmul119" op_type: "Matmul"

Is there any problem here?
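One thing I noticed: the node in the error context has op_type "Matmul", while ONNX's operator set spells it "MatMul", so if TASO's exporter emits the lowercase-m spelling the checker would reject it. As a temporary workaround (just a guess on my side; the rename assumes nothing else is wrong with the node), I patched the op names before validation:

# ONNX registers the operator as "MatMul"; rename the exporter's
# "Matmul" nodes so the checker can resolve them.
for node in onnx_model.graph.node:
    if node.op_type == "Matmul":
        node.op_type = "MatMul"
onnx.checker.check_model(onnx_model)

If that is the whole problem, I suppose the real fix belongs in TASO's ONNX exporter.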

2. Changing sequence length:

I want to use sequence length 128 instead of 64. Setting seq_length = 128 alone does not actually change the sequence length (it fails with a related error message), so I modified the code as follows:

import taso as ts
import onnx

seq_length = 128
hidden_dims = 1024

def attention(graph, input, heads):
    d_model = input.dim(1)
    assert d_model % heads == 0
    d_k = d_model // heads
    weights = list()
    for i in range(3):
        weights.append(graph.new_weight(dims=(d_model, d_model)))
    # compute query, key, value tensors
    q = graph.matmul(input, weights[0])
    k = graph.matmul(input, weights[1])
    v = graph.matmul(input, weights[2])
    # reshape query, key, value to multiple heads
    q = graph.reshape(q, shape=(seq_length, heads, d_k))
    k = graph.reshape(k, shape=(seq_length, heads, d_k))
    v = graph.reshape(v, shape=(seq_length, heads, d_k))
    # transpose query, key, value for batched matmul
    q = graph.transpose(q, perm=(1,0,2), shuffle=True)
    k = graph.transpose(k, perm=(1,0,2), shuffle=True)
    v = graph.transpose(v, perm=(1,0,2), shuffle=True)
    # perform matrix multiplications
    logits = graph.matmul(q, k)
    output = graph.matmul(logits, v)
    # transpose the output back
    output = graph.transpose(output, perm=(1,0,2), shuffle=True)
    output = graph.reshape(output, shape=(seq_length, d_model))

    # a final linear layer
    linear = graph.new_weight(dims=(d_model, d_model))
    output = graph.matmul(input, linear)
    return output

However, it still fails with an error:

python: /usr/TASO/src/core/matmul.cc:40: taso::Tensor* taso::Graph::matmul(taso::TensorHandle, taso::TensorHandle, taso::ActiMode): Assertion `op != Op::INVALID_OP' failed.
Aborted (core dumped)

So how should I change the sequence length?
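For what it's worth, my guess is that the assertion comes from logits = graph.matmul(q, k): after the transposes, both q and k have shape (16, seq_length, 64), so the batched matmul only lines up when seq_length happens to equal d_k = 64, as in the original example. If that is right, k needs one more transpose so the multiplication contracts over d_k. A sketch of what I mean (assuming transpose accepts this perm):

# Guess: transpose k to (heads, d_k, seq_length) so that matmul(q, k)
# contracts over d_k instead of seq_length.
k = graph.transpose(k, perm=(0,2,1), shuffle=True)
logits = graph.matmul(q, k)       # (heads, seq_length, seq_length)
output = graph.matmul(logits, v)  # (heads, seq_length, d_k)

That at least makes the shapes consistent on paper, but I have not verified it against the optimizer.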

3. Wrong model structure:

I found that the transformer block described in examples/bert.py (the attention() function in the code) has only a single matmul operation after the attention, but I believe there should actually be three. We can check this from this example code. Does this example code need to be fixed?
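In case it is useful, this is roughly the structure I would expect after the attention output, following the standard Transformer block (the weight names and the 4x feed-forward width here are my assumptions, not from the example):

# Sketch of the three matmuls I would expect after attention:
# an output projection followed by a two-layer feed-forward network.
proj = graph.new_weight(dims=(d_model, d_model))
output = graph.matmul(output, proj)   # output projection
ff1 = graph.new_weight(dims=(d_model, 4 * d_model))
ff2 = graph.new_weight(dims=(4 * d_model, d_model))
output = graph.matmul(output, ff1)    # feed-forward up-projection
output = graph.relu(output)
output = graph.matmul(output, ff2)    # feed-forward down-projection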

Thanks for reading this; I'm looking forward to any answers!