Open vvolhejn opened 2 years ago
Hi vvolhejn, sorry for the late reply, did you figure out this issue yet?
For your 2nd question, some observations:

- Pass `_r=1` to `tf.profiler.experimental.Trace` so the profiler recognizes the traced region as a step marker.
- Move the `fake_batch = tf.random.normal((8, 32))` line underneath the `tf.profiler.experimental.Trace` scope.

For your 1st question, not sure why your Type and Op are the same. I noticed that you seem to use the Context manager API to collect the profile data; be sure to use it only in local mode (with the API called on the same machine where your model is running, not remotely, like a cloud machine). If it's still not working for you, maybe try switching to the `tf.profiler.experimental` start/stop API instead and try again?
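For reference, a minimal sketch of the start/stop API mentioned above (the `logdir` path and the layer width are just example values):

```python
import tensorflow as tf

# Start collecting profile data locally; "logdir" is an arbitrary example path.
tf.profiler.experimental.start("logdir")

# ... run the steps you want to profile ...
x = tf.random.normal((8, 32))
y = tf.keras.layers.Dense(16)(x)

# Stop and write the profile to logdir for viewing in TensorBoard.
tf.profiler.experimental.stop()
```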
Hi, I'm trying to profile the inference performance of my model on CPU to figure out where the bottlenecks are. However, when I run the profiler and open tensorflow_stats, it groups all the operations by type (MatMul, BiasAdd...) and doesn't tell me which layer is taking how much time.
Here is a minimal example:
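(Reconstructed sketch; the Dense layer widths, the `logdir` path, and the step count are assumptions — only the `(8, 32)` batch shape appears elsewhere in the thread.)

```python
import tensorflow as tf

# Two small Dense layers; the widths here are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16),
])

fake_batch = tf.random.normal((8, 32))

# Context manager API from the profiler guide: profile a few inference steps.
with tf.profiler.experimental.Profile("logdir"):
    for step in range(10):
        with tf.profiler.experimental.Trace("inference", step_num=step):
            out = model(fake_batch)
```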
The console output doesn't report anything weird:
When I view the results in TensorBoard, I get:
Here I would like the two layers to be shown as separate elements in the charts, rather than there being a single "MatMul" element for both matrix multiplications together. This is also what the guide shows in one of the screenshots. How can I achieve this?
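One thing that sometimes helps make individual layers distinguishable in the trace viewer is giving them explicit names, so their ops fall under distinct name scopes (a sketch with assumed layer names; whether tensorflow_stats itself groups by these names is not guaranteed):

```python
import tensorflow as tf

# Named layers produce distinguishable name scopes in the profiler trace.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, name="hidden_layer"),
    tf.keras.layers.Dense(16, name="output_layer"),
])

out = model(tf.random.normal((8, 32)))
```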
Furthermore, when I run this code, I can only see tensorflow_stats and trace_viewer; some of the other tools say "No step marker observed and hence the step time is unknown". Am I using the profiler wrong? My model uses a custom training loop, so I am using context managers rather than a callback, as per the guide.
Versions:
- tensorflow 2.8.0
- tensorboard 2.8.0
- tensorboard_plugin_profile 2.5.0
- macOS Catalina 10.15.7