tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Serializable Traces #10702

Open nsmithtt opened 3 months ago

nsmithtt commented 3 months ago

Motivation

For embedded or potentially even production environments, it might be infeasible or undesirable to run the full TTNN runtime. To enable running workloads in as many environments as possible, I think we should strive for running them with:

Proposal

One possible solution that could achieve all of the goals listed above is to support metal trace serialization. This would let users record their TTNN or metal workloads (or potentially generate them from another tool), collect the resulting traces, and serialize them to disk so they can be reloaded and rerun at some future point in time.

The rest of this document outlines some steps that could incrementally allow us to experiment with this, focusing on the minimum number of changes required to enable this path. In the future we can adapt APIs and tools built around this flow to make it more robust.

Serialize and reload the trace

At the very minimum we need some APIs to collect and reload the trace:

/**
 * Collect a trace for later use.
 *
 * Return value: A pair holding a pointer to the trace data and its size in bytes
 *
 * | Argument     | Description                                                            | Type                          | Valid Range                        | Required |
 * |--------------|------------------------------------------------------------------------|-------------------------------|------------------------------------|----------|
 * | device       | The device holding the trace.                                          | Device *                      |                                    | Yes      |
 * | tid          | A unique id representing an existing captured trace.                   | uint32_t                      |                                    | Yes      |
 */
std::pair<std::uint8_t const*, std::size_t> CollectTrace(Device *device, const uint32_t tid);

/**
 * Load an external trace into an associated command queue
 *
 * Return value: Trace ID
 *
 * | Argument        | Description                                                            | Type                          | Valid Range                        | Required |
 * |-----------------|------------------------------------------------------------------------|-------------------------------|------------------------------------|----------|
 * | device          | The device that will hold the loaded trace.                            | Device *                      |                                    | Yes      |
 * | cq_id           | The command queue id associated with the trace.                        | uint8_t                       |                                    | Yes      |
 * | trace_data_ptr  | A pointer to the trace data.                                           | const uint8_t*                |                                    | Yes      |
 * | trace_data_size | The size in bytes of the trace data.                                   | size_t                        |                                    | Yes      |
 */
uint32_t LoadTrace(Device *device, const uint8_t cq_id, std::uint8_t const* trace_data_ptr, std::size_t trace_data_size);

This would let the user manage the trace data themselves. One critical use case is ensuring that this blob can be written to disk and loaded by a future process, potentially even on another machine. Issues such as endianness are left to the user.
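
To make the disk round trip concrete, here is a minimal sketch of how a user might persist the blob returned by the proposed CollectTrace and read it back before calling the proposed LoadTrace. Everything in it is an assumption for illustration: SaveTraceBlob and ReadTraceBlob are hypothetical helpers, not tt-metal APIs, and the file layout (an 8-byte little-endian length prefix followed by the raw bytes) is just one way to make the framing independent of host byte order. It does nothing about byte order inside the trace payload itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of tt-metal): write a trace blob to disk.
// Assumed layout: 8-byte little-endian payload length, then the payload.
inline void SaveTraceBlob(const std::string& path, const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + path + " for writing");

    // Encode the length byte by byte so the file does not depend on host endianness.
    unsigned char len[8];
    const std::uint64_t n = static_cast<std::uint64_t>(size);
    for (int i = 0; i < 8; ++i) len[i] = static_cast<unsigned char>((n >> (8 * i)) & 0xFF);
    out.write(reinterpret_cast<const char*>(len), sizeof(len));

    if (size > 0) {
        out.write(reinterpret_cast<const char*>(data), static_cast<std::streamsize>(size));
    }
    if (!out) throw std::runtime_error("failed writing " + path);
}

// Hypothetical helper (not part of tt-metal): read a blob written by SaveTraceBlob.
// A sketch only: it trusts the length prefix and does not cap the allocation.
inline std::vector<std::uint8_t> ReadTraceBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path + " for reading");

    unsigned char len[8];
    if (!in.read(reinterpret_cast<char*>(len), sizeof(len))) {
        throw std::runtime_error("missing length prefix in " + path);
    }
    std::uint64_t n = 0;
    for (int i = 0; i < 8; ++i) n |= static_cast<std::uint64_t>(len[i]) << (8 * i);

    std::vector<std::uint8_t> bytes(static_cast<std::size_t>(n));
    if (n > 0 && !in.read(reinterpret_cast<char*>(bytes.data()), static_cast<std::streamsize>(n))) {
        throw std::runtime_error("truncated trace payload in " + path);
    }
    return bytes;
}

// Intended use with the proposed APIs (not compiled here, since they do not exist yet):
//   auto [ptr, size] = CollectTrace(device, tid);
//   SaveTraceBlob("workload.trace", ptr, size);
//   ...later, possibly in another process...
//   auto bytes = ReadTraceBlob("workload.trace");
//   uint32_t new_tid = LoadTrace(device, cq_id, bytes.data(), bytes.size());
```

The length prefix lets the loader detect a truncated file before handing anything to the device. A real format would likely also want a version field and some description of the target device, but that is beyond what this issue specifies.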

Future Goals

nsmithtt commented 3 months ago

Hi everyone, some of us had an offline discussion about this feature, and I think we were all on the same page. I tried to capture what we discussed in this issue, so please take a look at the proposal and raise any comments or concerns. We can schedule a Zoom call at some point to discuss further if need be.

davorchap commented 2 months ago

Agreed, and these are all on the roadmap

This is likely ~Q4'24