Open nsmithtt opened 3 months ago
Hi everyone. Some of us discussed this feature offline, and I think we were all on the same page. I have tried to capture that discussion in this issue, so please take a look at the proposal and raise any comments or concerns. We can schedule a Zoom call to discuss further if need be.
Agreed, and these are all on the roadmap
This is likely ~Q4'24
Motivation
For embedded, or potentially even production, environments, it might be infeasible or undesirable to run the full TTNN runtime. To enable running workloads in as many environments as possible, I think we should strive for running them with:
Proposal
One possible solution that could achieve all of the goals listed above is to support metal trace serialization. Users could record their TTNN or metal workloads (or potentially generate traces with another tool), serialize the collected traces to disk, and reload and rerun them at a later time.
The rest of this document outlines steps that could let us experiment with this incrementally, focusing on the minimum set of changes required to enable this path. In the future we can adapt the APIs and tools built around this flow to make it more robust.
Serialize and reload the trace
At the very minimum we need some APIs to collect and reload the trace:
This would let a user manage the trace data directly. One critical use case is ensuring that this blob can be written to disk and loaded by a future process, potentially even on another machine. Issues such as endianness are left to the user to handle.
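As a rough illustration of the disk round trip described above, the sketch below wraps an opaque trace blob in a small fixed-layout header before writing it out. Everything here is an assumption made for illustration: the `TRCE` magic tag, the header layout, and the `save_trace_blob` / `load_trace_blob` helpers are hypothetical and are not part of TTNN or tt-metal. The blob itself is treated as uninterpreted bytes.

```python
import struct
import sys
import zlib
from pathlib import Path

# Hypothetical container format, NOT a tt-metal format.
MAGIC = b"TRCE"
VERSION = 1

# The header is always little-endian ("<") regardless of the host:
# magic (4 bytes), version (u16), recording host byte order flag (u8),
# one pad byte, payload length (u64), CRC32 of the payload (u32).
HEADER = struct.Struct("<4sHBxQI")


def save_trace_blob(path, blob: bytes) -> None:
    """Write an opaque trace blob to disk behind a self-describing header."""
    host_is_little = 1 if sys.byteorder == "little" else 0
    header = HEADER.pack(MAGIC, VERSION, host_is_little, len(blob), zlib.crc32(blob))
    Path(path).write_bytes(header + blob)


def load_trace_blob(path) -> bytes:
    """Read a blob back, rejecting files that are corrupt or incompatible."""
    data = Path(path).read_bytes()
    if len(data) < HEADER.size:
        raise ValueError("file too short to contain a header")

    magic, version, recorded_little, length, crc = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a trace container")
    if version != VERSION:
        raise ValueError(f"unsupported container version {version}")

    blob = data[HEADER.size:]
    if len(blob) != length or zlib.crc32(blob) != crc:
        raise ValueError("payload is truncated or corrupt")

    # The payload is opaque, so it cannot be byte-swapped safely here.
    # This sketch simply refuses a blob recorded on a host with a
    # different byte order.
    if bool(recorded_little) != (sys.byteorder == "little"):
        raise ValueError("blob was recorded on a host with a different byte order")
    return blob
```

Recording the byte order and refusing a mismatch is only one simple way for a user to deal with endianness. A real format might instead define a fixed byte order for the payload.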
Future Goals