tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

User Experience Issue: DumpDeviceProfileResult Usage Not Clear #10155

Open SeanNijjar opened 2 months ago

SeanNijjar commented 2 months ago

Calling the DumpDeviceProfileResult that lives outside the detail namespace invokes some special behaviour (I think) that causes the call to hang in typical usage.

To get a non-hanging version with expected behaviour, we have to call detail::DumpDeviceProfileResult instead. Looking at my own microbenchmarks, I see that I had to do this as well.

This is a confusing user experience issue and should be fixed. A user shouldn't hit this issue and then have to dig into the detail namespace, ever, to get the desired behaviour.

See the Discord thread below: https://discord.com/channels/863154240319258674/1260222829597687880

@mo-tenstorrent - I'm tentatively assigning to you. Please redirect as necessary.

All customer bugs are supposed to be tagged P0. However, the user was able to work around the issue (by calling the function in the detail namespace) and is no longer blocked. Future customers will hit this as well, so I think it should be cleaned up sooner rather than later.

mo-tenstorrent commented 2 months ago

As per the conversation in the above-mentioned Discord thread, this looks like a timing issue.

@SeanNijjar should we keep this ticket as a profiler user experience issue and start another one for the customer issue?

For a better user experience, I would retire the (device, program) footprint and make (device) the external-facing one.

The program footprint dates from a time when our post-processing of the data was not as mature; limiting the dump to one program let us cut down the generated data a little.
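To make the proposed shape concrete, here is a minimal, self-contained sketch of what a device-only public entry point could look like. Everything in it is an assumption for illustration: `Device`, `Program`, the `sketch` namespace, the `profiler_records` field, and the return value are hypothetical stand-ins, not the real tt-metal types or signatures, and the deprecated forwarding overload is just one possible way to retire the (device, program) form.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the real tt-metal types.
struct Device {
    int id;
    std::vector<std::string> profiler_records;  // pretend profiler buffer
};
struct Program {
    int id;
};

namespace sketch {
namespace detail {
// Internal helper: flushes every record held for the device and
// returns how many records were dumped.
inline std::size_t DumpDeviceProfileResult(Device& device) {
    const std::size_t dumped = device.profiler_records.size();
    device.profiler_records.clear();
    return dumped;
}
}  // namespace detail

// Proposed external-facing form: takes only the device, so users never
// need to reach into the detail namespace.
inline std::size_t DumpDeviceProfileResult(Device& device) {
    return detail::DumpDeviceProfileResult(device);
}

// Assumed migration path: keep the old (device, program) form as a
// deprecated shim that forwards to the device-only entry point.
[[deprecated("use DumpDeviceProfileResult(Device&) instead")]]
inline std::size_t DumpDeviceProfileResult(Device& device, const Program&) {
    return DumpDeviceProfileResult(device);
}
}  // namespace sketch
```

With this shape, a caller would write `sketch::DumpDeviceProfileResult(device)` and get the same behaviour the detail helper provides, while existing (device, program) call sites would keep compiling with a deprecation warning.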

SeanNijjar commented 2 months ago

Yeah, I agree. Based on the Discord discussion, it seems likely that a user error led to the issue. We should make the API change, but it's a usability improvement and no longer a user bug.