tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

noc race in kernel completion #9523

Closed pgkeller closed 3 weeks ago

pgkeller commented 4 months ago

brisck.cc and ncrisck.cc (and erisc?) should issue a write_barrier to sync the NoC before completing and telling the dispatcher that everything is done. Not doing so is a race: for example, a kernel writing to DRAM could signal completion, and the prefetcher could then read from DRAM before the data has landed (though that would be a fairly long delay).
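To make the ordering problem concrete, here is a minimal host-side sketch of the race. It is a toy model, not tt-metal code: `ToyNoc`, `PendingWrite`, and `run_kernel_then_read` are hypothetical names invented for illustration, and the "prefetcher" is just a read that happens right after the point where completion would be signalled.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model only: none of these names come from tt-metal.
struct PendingWrite {
    std::size_t addr;
    std::uint32_t value;
};

struct ToyNoc {
    std::vector<std::uint32_t> dram;     // destination memory, zero-initialized
    std::deque<PendingWrite> in_flight;  // writes issued but not yet landed

    explicit ToyNoc(std::size_t words) : dram(words, 0) {}

    // Posted write: returns immediately, the data lands later.
    void async_write(std::size_t addr, std::uint32_t value) {
        in_flight.push_back({addr, value});
    }

    // Barrier: do not return until every outstanding write has landed.
    void write_barrier() {
        while (!in_flight.empty()) {
            const PendingWrite w = in_flight.front();
            in_flight.pop_front();
            dram[w.addr] = w.value;
        }
    }

    std::uint32_t read(std::size_t addr) const { return dram[addr]; }
};

// A "kernel" writes one word and signals completion; a reader (standing in
// for the prefetcher) then reads that word immediately.
std::uint32_t run_kernel_then_read(bool sync_before_done) {
    ToyNoc noc(4);
    noc.async_write(0, 0xABCDu);
    if (sync_before_done) {
        noc.write_barrier();  // the sync this issue proposes adding
    }
    // Completion would be signalled here; the reader acts on it right away.
    const std::uint32_t seen = noc.read(0);
    noc.write_barrier();  // drain whatever is still in flight
    return seen;
}
```

In this model `run_kernel_then_read(false)` returns 0 because the reader sees the stale word, while `run_kernel_then_read(true)` returns `0xABCD`. Real hardware would only hit the stale case if the write stayed in flight long enough, which is the "fairly long delay" caveat above.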

pgkeller commented 2 months ago

@davorchap @tt-aho Is this a legit issue? Should FW enforce the sync, or should we leave this to kernels?

pgkeller commented 3 weeks ago

Will not fix. There is no need to add the delay; if syncs don't occur, that is a kernel issue.