tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Demo for functional bert #4505

Closed Sudharsan-V closed 8 months ago

Sudharsan-V commented 9 months ago

GS e2e Demo

test uses batched real inputs and produces batched outputs

this test is looped (takes several batches of inputs, produces several batches of outputs)

test evaluates results of TT model against validation data in selected data set

test is documented

how to run test (commands to execute)

what does the model do (high-level explanation in a couple of sentences)

description of what is input and what is output

this demo should load inputs and expected outputs from a data set on Weka (TBD which one)
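The requirements above (batched inputs, a loop over several batches, and results kept alongside the expected outputs for scoring) can be sketched roughly as follows. This is only an illustration in plain Python: `iter_batches`, `run_demo`, `echo_model`, and the sample data are hypothetical names invented here, not part of tt-metal or the actual demo.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_batches(samples: Sequence, batch_size: int, num_loops: int) -> Iterator[list]:
    """Yield up to `num_loops` full batches of `batch_size` samples each."""
    for loop in range(num_loops):
        start = loop * batch_size
        batch = list(samples[start:start + batch_size])
        if len(batch) < batch_size:
            break  # not enough data left for a full batch
        yield batch


def run_demo(
    samples: Sequence[Tuple[str, str]],
    run_model: Callable[[List[str]], List[str]],
    batch_size: int,
    num_loops: int,
) -> List[Tuple[str, str]]:
    """Run the model batch by batch and return (prediction, expected) pairs."""
    results = []
    for batch in iter_batches(samples, batch_size, num_loops):
        inputs = [sample[0] for sample in batch]
        expected = [sample[1] for sample in batch]
        predictions = run_model(inputs)  # one batched call per loop iteration
        results.extend(zip(predictions, expected))
    return results


# Hypothetical stand-in for the real model: upper-cases each input string.
def echo_model(inputs: List[str]) -> List[str]:
    return [text.upper() for text in inputs]


# Hypothetical (input, expected output) pairs standing in for a real data set.
samples = [(f"q{i}", f"Q{i}") for i in range(100)]
pairs = run_demo(samples, echo_model, batch_size=8, num_loops=10)
```

Keeping each prediction paired with its expected output makes it straightforward to score the run against the validation data afterwards.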

Sudharsan-V commented 9 months ago

The end-to-end demo for the functional BERT model is complete. Corresponding commit

The BERT model has two variants:

  1. Optimized version
  2. Normal version

While running the optimized version, we hit the following issue:

E               RuntimeError: TT_ASSERT @ tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:921: false
E               info:
E               mcast_in0 is not implemented yet.
E               backtrace:
E                --- tt::tt_metal::matmul_multi_core_reuse_mcast_2d_optimized(tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor const>, tt::tt_metal::Tensor&, bool, tt_xy_pair, MathFidelity, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, bool, bool, std::optional<tt::tt_metal::UnaryWithParam>)

The demo for the normal variant (batch_size = 1 and batch_size > 1) runs successfully. The results are as follows.

Dataset used: SQuADV2

Exact match: 6.25

F1 score: 10.457664884135472

The results above were obtained with batch_size = 8 and #loops = 10 (80 samples in total).
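For context, exact match and F1 for SQuAD-style answers are usually computed per sample on normalized text and then averaged. The sketch below follows the normalization commonly used for SQuAD (lowercase, strip punctuation and articles, collapse whitespace). It is not the demo's actual evaluation code: it assumes a single reference answer per sample, and all function names are chosen here for illustration. If the score is reported as a percentage, an exact match of 6.25 over 80 samples corresponds to 5 exact matches (5 / 80 = 6.25%).

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Empty (no-answer) case: score 1.0 only if both sides are empty.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs):
    """Return (exact match %, F1 %) averaged over (prediction, reference) pairs."""
    n = len(pairs)
    em = 100.0 * sum(exact_match(p, r) for p, r in pairs) / n
    f1 = 100.0 * sum(f1_score(p, r) for p, r in pairs) / n
    return em, f1


# Made-up example pairs, not taken from the demo's output.
example_pairs = [("the Eiffel Tower", "Eiffel Tower"), ("in Paris", "Paris, France")]
em, f1 = evaluate(example_pairs)
```

Because F1 gives partial credit for overlapping tokens, it is normally at least as high as exact match, which is consistent with the two numbers reported above.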

While running inference, we observed that the model's accuracy was poor. After decoding the model output, we get a valid English token, but it does not match the reference output.

Note: We achieve pcc > 0.99 in tests/ttnn/integration_tests/bert/test_bert.py, where the input to the model is a tensor filled with 0s. When we change the input to non-zero values, the pcc of the model drops to roughly 0.5-0.6.
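For readers unfamiliar with the metric, PCC here refers to the Pearson correlation coefficient between the flattened output of the TT model and that of a reference model. A minimal pure-Python sketch of the computation is shown below. It is not the helper used in the tt-metal tests, and the function name `pcc` and the sample values are assumptions made for illustration.

```python
import math
from typing import Sequence


def pcc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined when either side is constant.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Made-up values standing in for flattened model outputs.
reference_output = [0.10, -0.42, 0.93, 0.05, -0.77]
close_output = [0.11, -0.40, 0.95, 0.04, -0.80]
score = pcc(reference_output, close_output)
passes = score > 0.99  # same style of threshold as quoted in the note
```

A threshold check like this is only as informative as the input that produced the outputs, which is why repeating it with non-zero inputs, as described in the note, exposes the drop.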

boris-drazic commented 9 months ago

For the optimized version, the attention mask does not seem to be set correctly. I left a comment in the commit with changes that make the test run.

Sudharsan-V commented 9 months ago

The functional BERT demo is now complete and incorporates the comments shared by Boris. We can now run the tests for both variants of functional_bert with batch_size > 1.

For batch_size = 1, we hit the following issue in the optimized version:

E               RuntimeError: TT_ASSERT @ tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:925: false
E               info:
E               mcast_in0 is not implemented yet.
E               backtrace:
E                --- tt::tt_metal::matmul_multi_core_reuse_mcast_2d_optimized(tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor const>, tt::tt_metal::Tensor&, bool, tt_xy_pair, MathFidelity, bool, bool, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, bool, bool, std::optional<tt::tt_metal::UnaryWithParam>)
E                --- /home/ubuntu/sudharsan/tt-metal/tt_eager/tt_lib/_C.so(+0x33f73d) [0x7f6ea54d373d]
E                --- /home/ubuntu/sudharsan/tt-metal/tt_eager/tt_lib/_C.so(+0x33f832) [0x7f6ea54d3832]

We do not face any issue with the other variant (ttnn_functional_bert).

Corresponding PR: #4582

saichandax commented 8 months ago

PR merged. Closing the ticket.