[On Hold] Tests with input dataset

boris-drazic commented 6 months ago

Make tests for models that use an appropriate validation dataset (e.g., SQuAD, HellaSwag) to run the model and compare its output to the expected outputs from the dataset. These tests should be organized in a similar way to models/experimental/mistral/tests/test_perf_accuracy_mistral.py. Here is the list of models to make tests for:

Priority:

[x] bloom [Punith] PR #4772 [Merged]
[x] t5 [Vignesh] PR #4500 [Update needed]
[x] whisper [Punith] PR #4501 [Pending CI]

Rest:

[x] yolov5 [Keerthana] PR #4776 [Pending CI]
[x] swin [Jayasurya] PR #4774 [Merged]
[ ] llama [Jayasurya] [To create PR]
[x] trocr [Jayasurya] commit [To verify on main]
[x] ssd [Keerthana] PR #4585 [Merged]
[x] yolov3 [Keerthana] PR #4719 [Merged]
[x] roberta [Keerthana] PR #4627 [Merged]
[x] deit [Keerthana] PR #4628 [Merged]
[x] efficientnet [Jayasurya] PR #4633 [Merged]
[x] vit [Sudharsan] PR #4749 [Merged]
[x] distilbert [Keerthana] PR #4510 [Merged]
[x] mnist [Jayasurya] PR #4502 [Merged]
[x] lenet [Jayasurya] PR #4634 [Merged]

Don't do:

[ ] llama2 (no TT implementation)
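
For anyone picking up one of these, the general shape of such a test might look like the sketch below. It is only an illustration under stated assumptions: `evaluate_accuracy`, `toy_model`, `VALIDATION_SAMPLES`, and the 0.9 threshold are hypothetical names and values, not taken from test_perf_accuracy_mistral.py, and a real test would load an actual validation split and run the TT model instead of the stand-ins.

```python
from typing import Callable, Iterable, Tuple


def evaluate_accuracy(
    model: Callable[[str], str], samples: Iterable[Tuple[str, str]]
) -> float:
    """Run `model` on every input and return the fraction of exact matches."""
    total = 0
    correct = 0
    for model_input, expected in samples:
        total += 1
        if model(model_input) == expected:
            correct += 1
    if total == 0:
        raise ValueError("validation set is empty")
    return correct / total


# Hypothetical stand-in for a validation split; a real test would load
# a dataset such as SQuAD or HellaSwag here.
VALIDATION_SAMPLES = [("2+2", "4"), ("3+3", "6"), ("5+1", "6"), ("7+2", "9")]


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in model: adds the two integers in a prompt like "2+2"."""
    left, right = prompt.split("+")
    return str(int(left) + int(right))


def test_accuracy_on_validation_set():
    # The threshold is an assumption; each model would need its own value.
    accuracy = evaluate_accuracy(toy_model, VALIDATION_SAMPLES)
    assert accuracy >= 0.9
```

Keeping the scoring loop separate from the model and the dataset loader means the same helper could be reused across models, with only the loader and the threshold changing per test.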

Sudharsan-V commented 6 months ago

Please find the independent tickets/issues for the models below:

bloom - #4374
t5 - #4407
whisper - #4427
distilbert - #4482
mnist - #4431
ssd - #4506
trocr - #4486
roberta - #4556
swin - #4557
deit - #4609
efficientnet - #4610
vit - #4611
lenet - #4623
yolov3 - #4622
yolov5 - #4668
llama - #4717

saichandax commented 5 months ago

Currently, only 5 models are still pending, at different stages:

[ ] t5 [Vignesh] PR #4500 [Update needed]
[x] whisper [Punith] PR #4501 [Pending CI]
[x] yolov5 [Keerthana] PR #4776 [Pending CI]
[ ] llama [Jayasurya] [To create PR]
[x] trocr [Jayasurya] commit [To verify on main]

Work will resume once the Stable Diffusion tasks (#4765) are done.

tenstorrent / tt-metal

[On Hold] Tests with input dataset #4351