Closed Sudharsan-V closed 8 months ago
The end-to-end demo for the functional BERT model is completed. Corresponding commit.
The BERT model has two variants. While running the optimized variant, we hit the following issue:
E RuntimeError: TT_ASSERT @ tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:921: false
E info:
E mcast_in0 is not implemented yet.
E backtrace:
E --- tt::tt_metal::matmul_multi_core_reuse_mcast_2d_optimized(tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor const>, tt::tt_metal::Tensor&, bool, tt_xy_pair, MathFidelity, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, bool, bool, std::optional<tt::tt_metal::UnaryWithParam>)
The demo for the normal variant succeeds for both batch_size = 1 and batch_size > 1. The results are as follows. Dataset used: SQuADv2. Exact match: 6.25. F1 score: 10.457664884135472. The results shared above were measured with batch_size = 8 and 10 loops (80 samples in total).
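The Exact match and F1 numbers above are the standard SQuAD-style token-overlap metrics. A minimal sketch of how they are typically computed (function names are illustrative, not from the repo):

```python
import collections
import re
import string

def normalize(text):
    # Standard SQuAD answer normalization: lowercase, drop punctuation,
    # drop articles (a/an/the), collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    # Token-level F1 over the overlap between prediction and reference.
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Per-sample scores are then averaged over all predictions (and multiplied by 100), which is how aggregate numbers like "Exact match: 6.25" are usually produced.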
While running inference, we observed that the model's accuracy was poor: decoding the model output yields valid English tokens, but they do not match the reference output.
Note: we achieve PCC > 0.99 in tests/ttnn/integration_tests/bert/test_bert.py, where the model input is a tensor filled with zeros. When the input has non-zero values, the model's PCC drops to roughly 0.5-0.6.
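PCC here is the Pearson correlation coefficient between the reference (e.g. torch) output tensor and the TT output tensor, used as a pass/fail similarity metric in the tests. A minimal NumPy sketch (the function name is illustrative, not the repo's helper):

```python
import numpy as np

def compute_pcc(expected, actual):
    # Flatten both tensors and compute the Pearson correlation coefficient
    # between them; 1.0 means perfectly linearly correlated outputs.
    expected = np.asarray(expected, dtype=np.float64).ravel()
    actual = np.asarray(actual, dtype=np.float64).ravel()
    return np.corrcoef(expected, actual)[0, 1]
```

A test would then assert something like `compute_pcc(torch_out, tt_out) > 0.99`; the drop to ~0.5-0.6 with non-zero inputs indicates a real numerical mismatch rather than noise.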
For the optimized version, the attention mask does not seem to be set correctly. I left a comment on the commit with the changes that make the test run.
The demo for the functional BERT is completed after incorporating the comments shared by Boris. We can now run the tests for both variants of functional_bert with batch_size > 1.
For batch_size = 1, we face the following issue in the optimized version:
E RuntimeError: TT_ASSERT @ tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:925: false
E info:
E mcast_in0 is not implemented yet.
E backtrace:
E --- tt::tt_metal::matmul_multi_core_reuse_mcast_2d_optimized(tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor const>, tt::tt_metal::Tensor&, bool, tt_xy_pair, MathFidelity, bool, bool, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, bool, bool, std::optional<tt::tt_metal::UnaryWithParam>)
E --- /home/ubuntu/sudharsan/tt-metal/tt_eager/tt_lib/_C.so(+0x33f73d) [0x7f6ea54d373d]
E --- /home/ubuntu/sudharsan/tt-metal/tt_eager/tt_lib/_C.so(+0x33f832) [0x7f6ea54d3832]
We do not face this issue with the other variant (ttnn_functional_bert).
Corresponding PR#4582
PR merged, closing the ticket.
GS e2e Demo
- test uses batched real inputs and produces batched outputs
- test is looped (takes several batches of inputs, produces several batches of outputs)
- test evaluates the results of the TT model against validation data in the selected data set
- test is documented:
  - how to run the test (commands to execute)
  - what the model does (high-level explanation in a couple of sentences)
  - description of what the input and output are
- the demo should load inputs and expected outputs from a data set on Weka (TBD which one)
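Under those requirements, the demo's main loop could be structured roughly as below. This is a hedged sketch only: `Batch`, `run_demo`, and the model/evaluate callables are placeholders, not the actual tt-metal demo API.

```python
from dataclasses import dataclass

@dataclass
class Batch:
    inputs: list   # one batch of real inputs from the data set
    answers: list  # the matching expected outputs

def run_demo(model, batches, evaluate):
    # Loop over several batches of real inputs, produce batched outputs,
    # and evaluate all predictions against the data set's reference answers.
    predictions, references = [], []
    for batch in batches:
        predictions.extend(model(batch.inputs))  # batched inference
        references.extend(batch.answers)
    return evaluate(predictions, references)     # e.g. EM / F1 over all samples
```

With batch_size = 8 and 10 loops this covers 80 samples, matching the configuration reported for the results above.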