tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

EDM Low Level Optimizations #6300

Closed SeanNijjar closed 4 weeks ago

SeanNijjar commented 6 months ago
SeanNijjar commented 3 months ago

I've been doing some lower-level profiling of the EDM, and this is what I'm seeing. We're really hurt by the ETH CMD Q not being ready to accept more payloads. In the snapshot here we're basically losing 1us during which the message could have been queued up, which would leave the erisc free to handle additional work. Enabling the second eth cmd q may help with this.
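To make the stall concrete, here is a toy model of payloads waiting on a busy command queue, comparing one queue against two. It is only a sketch under assumptions: `total_stall_ns`, its parameters, and any numbers fed to it are hypothetical and are not taken from tt-metal or from the profile above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not tt-metal code.
// ready_ns[i] is the time payload i is ready to be issued (assumed non-decreasing).
// A command queue stays busy for busy_ns after a payload is issued to it.
// Returns the total time (ns) payloads spent waiting for a free queue.
inline std::uint64_t total_stall_ns(const std::vector<std::uint64_t>& ready_ns,
                                    std::uint64_t busy_ns,
                                    std::size_t num_queues) {
    assert(num_queues > 0);
    // Time at which each queue can next accept a payload.
    std::vector<std::uint64_t> free_at(num_queues, 0);
    std::uint64_t stall = 0;
    for (std::uint64_t ready : ready_ns) {
        // Issue to whichever queue frees up earliest.
        auto q = std::min_element(free_at.begin(), free_at.end());
        const std::uint64_t start = std::max(ready, *q);
        stall += start - ready;  // time lost waiting on the queue
        *q = start + busy_ns;
    }
    return stall;
}
```

With illustrative inputs (payloads ready at 0, 500, and 1000 ns, each occupying a queue for 1500 ns), the model reports 3000 ns of total stall with one queue and 500 ns with two. Whether the real second eth cmd q behaves this way would need to be confirmed by profiling.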

Image

SeanNijjar commented 3 months ago

Bumping up priority based on above observations

SeanNijjar commented 3 months ago

Another observation: the loop overhead (of checking whether other channels made progress and need further work) is way too high here. I'm seeing 800ns spent turning around an ack for a single read request. End to end, it takes on the order of 1us from when a worker gets the payload-available signal from the EDM to when that EDM receives the ack signal.

(Update: the first 100ns is wrong; it's actually closer to 150-175ns.) The first time in the diagram indicates that we could save 150ns (plus the time for the read request to travel from worker to EDM) in latency for the EDM -> worker data path.
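One way to reason about the loop overhead is a simple round-robin model: an event for one channel can sit unnoticed while the loop checks every other channel. The sketch below assumes such a loop; `worst_case_notice_delay_ns` and the per-channel costs are hypothetical, not the actual EDM implementation or measured values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not tt-metal code.
// The main loop visits channels in round-robin order and spends check_ns[i]
// checking (or servicing) channel i on each pass. If an event for `channel`
// arrives just after that channel was checked, it is noticed only after
// every other channel has had its turn.
inline std::uint64_t worst_case_notice_delay_ns(
        const std::vector<std::uint64_t>& check_ns, std::size_t channel) {
    assert(channel < check_ns.size());
    std::uint64_t delay = 0;
    for (std::size_t i = 0; i < check_ns.size(); ++i) {
        if (i != channel) {
            delay += check_ns[i];  // time spent on other channels first
        }
    }
    return delay;
}
```

For example, with illustrative per-channel costs of 100, 200, 300, and 200 ns, an ack for channel 0 could wait up to 700 ns before the loop gets back to it. In this model, reducing the per-channel check cost or prioritizing the channel with a pending ack would shorten the turnaround.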

Image

SeanNijjar commented 4 weeks ago

Closing this issue as it's not all relevant anymore