tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Implement Distilbert Data Parallel on n300 #13396

Open saichandax opened 1 month ago

saichandax commented 1 month ago
Sudharsan-V commented 1 month ago

The data parallel implementation for DistilBERT is complete. Corresponding PR: #13158
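For readers new to the term, here is a minimal host-side sketch of what "data parallel" means in general: the input batch is split into equal shards, the same model runs on each shard, and the outputs are concatenated in the original order. This is not the code from the PR. `shard_batch` and `run_data_parallel` are hypothetical names, the model is any plain Python callable, and the use of two devices (as on an n300) is an assumption made for illustration.

```python
from typing import Callable, List, Sequence


def shard_batch(batch: Sequence, num_devices: int) -> List[list]:
    """Split a batch into num_devices contiguous, equally sized shards."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    shard_size = len(batch) // num_devices
    return [
        list(batch[i * shard_size:(i + 1) * shard_size])
        for i in range(num_devices)
    ]


def run_data_parallel(
    model: Callable[[list], list], batch: Sequence, num_devices: int
) -> list:
    """Run the same model on every shard and concatenate outputs in input order."""
    outputs: list = []
    for shard in shard_batch(batch, num_devices):
        # On real hardware each shard would run on its own chip concurrently;
        # in this sketch the shards are processed sequentially on the host.
        outputs.extend(model(shard))
    return outputs
```

With a batch of four inputs and two devices, each device would see two inputs, and the combined output keeps the original ordering.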

Sudharsan-V commented 1 month ago

The pipeline for DistilBERT data parallel is enabled, but the test fails when devices are initialized using fixtures from the conftest files. Previously, using the mesh_device fixture caused the test to hang on the n150 machine, while using the device fixture caused it to hang on the n300. Currently, when both fixtures are used, the test runs twice: one run passes, and the other fails while closing the device (here). We are actively debugging this issue. The next step is to use the all_device fixture and verify the model.

Corresponding PR https://github.com/tenstorrent/tt-metal/pull/13158

cc: @boris-drazic @yieldthought

Sudharsan-V commented 1 month ago

A pytest fixture for multi_device has been added to the conftest.py file because of compatibility issues when using the device and mesh_device fixtures on the n300 and n150 machines. Corresponding PR: https://github.com/tenstorrent/tt-metal/pull/13158
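As a rough illustration of the setup/teardown shape such a fixture usually has, here is a sketch that opens a configurable number of devices, hands them to the test, and closes them afterwards. It is not the fixture from the PR: `FakeDevice` is a hypothetical stand-in for a real device handle, the `num_devices` parameter is an assumption, and the generator is left undecorated so it runs without pytest installed. In a conftest.py it would typically carry `@pytest.fixture`.

```python
from typing import Iterator, List


class FakeDevice:
    """Hypothetical stand-in for a hardware device handle."""

    def __init__(self, device_id: int) -> None:
        self.device_id = device_id
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


def multi_device(num_devices: int) -> Iterator[List[FakeDevice]]:
    """Open num_devices devices, yield them to the test, then close them all.

    In a conftest.py this generator would be decorated with @pytest.fixture.
    """
    devices = [FakeDevice(i) for i in range(num_devices)]
    try:
        yield devices
    finally:
        # Teardown runs even if the test body raised, so every device
        # that was opened is closed exactly once.
        for device in reversed(devices):
            device.close()
```

Keeping open and close in a single fixture means a test requests one fixture regardless of machine type, and only the device count differs between a single-device and a two-device machine.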