tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

No step marker observed warning #366

Open mandicLuka opened 3 years ago

mandicLuka commented 3 years ago

I get this warning when I try to profile my training setup:

    No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented ...

I implemented a custom train_step, plus a custom layer and network that derive from the Keras API.

When I build the model with the Keras API and Keras built-in layers, everything works fine, even with my custom train_step method. But when I use my custom layer instead of the Keras layers, I get the aforementioned warning. A training step with my custom layers takes around 30 ms. It seems strange to me that the profiling is layer-dependent.

I even tried wrapping the body of train_step in tf.profiler.experimental.Trace('train'), but nothing changed.
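For reference, the TensorFlow profiler guide produces step markers in a custom training loop by wrapping each step in a Trace with a step number. A minimal sketch of that pattern (model, optimizer, loss_fn and dataset below are placeholders, not my actual setup):

    import tensorflow as tf

    # Minimal sketch of the profiler guide's step-marker pattern for a custom
    # training loop; `model`, `optimizer`, `loss_fn` and `dataset` are placeholders.
    tf.profiler.experimental.start('logdir')
    for step, (x, y) in enumerate(dataset):
        # step_num and _r=1 mark this Trace as a training step, which is what
        # the "No step marker observed" warning is looking for.
        with tf.profiler.experimental.Trace('train', step_num=step, _r=1):
            with tf.GradientTape() as tape:
                y_pred = model(x, training=True)
                loss = loss_fn(y, y_pred)
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
    tf.profiler.experimental.stop()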

Custom layer:


    @tf.function
    def fwd(self, u1, u2):
        b = self.activation(tf.matmul(u2, self.bias))
        activ = self.activation(tf.tensordot(u2, self.coef, axes=1)) + self.matrix_bias
        u1_expand = tf.expand_dims(u1, 1)

        # elementwise do matrix product between u1 and activ
        inner = tf.map_fn(lambda x: tf.matmul(x[0], x[1])[0], (u1_expand, activ), fn_output_signature="float32")
        return b + inner

    def call(self, inputs):
        u1, u2 = inputs
        v1 = u2
        v2 = self.fwd(u1, u2)
        return [v1, v2]
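For context, the layer's weights are created roughly along these lines; the activation and weight shapes below are illustrative placeholders, only chosen to be consistent with fwd(), not copied from my actual code:

    # Minimal sketch of the surrounding layer definition (shapes/activation
    # are assumptions inferred from fwd(), not the original code).
    class CustomLayer(tf.keras.layers.Layer):
        def __init__(self, units, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            self.activation = tf.nn.tanh  # placeholder activation

        def build(self, input_shape):
            u1_shape, u2_shape = input_shape
            m, d2, n = u1_shape[-1], u2_shape[-1], self.units
            # bias: [d2, n] so that matmul(u2, bias) -> [batch, n]
            self.bias = self.add_weight("bias", shape=(d2, n))
            # coef: [d2, m, n] so that tensordot(u2, coef, axes=1) -> [batch, m, n]
            self.coef = self.add_weight("coef", shape=(d2, m, n))
            # matrix_bias: [m, n], broadcast over the batch dimension
            self.matrix_bias = self.add_weight("matrix_bias", shape=(m, n))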

Custom train_step:

    def train_step(self, data):
        x = data[0]
        x_p = data[1]

        with tf.GradientTape() as tape:
            x_embed = self.embed(x)
            x_p_embed = self.embed(x_p)
            y = x_p_embed
            y_pred = self.U(x_embed)  # Dense layer

            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)

        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}
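For reference, one typical way to capture the profile around model.fit is the TensorBoard callback; a minimal sketch (the log directory and batch range below are placeholders, not necessarily my exact setup):

    # Typical way to capture a profile during model.fit (paths and batch
    # range are placeholders).
    tb_callback = tf.keras.callbacks.TensorBoard(
        log_dir="logs/profile",
        profile_batch=(5, 10),  # profile batches 5 through 10
    )
    model.fit(dataset, epochs=1, callbacks=[tb_callback])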
michaelyma12 commented 2 years ago

Were you able to find a solution? I'm running into a similar issue with my own subclassed model.

mandicLuka commented 2 years ago

@michaelyma12 No solution yet, but my workaround was to copy the function code into a standalone tf.function outside the custom layer and call it repeatedly from another script. In that case the profiler worked fine.
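Roughly, that workaround looks like the sketch below; only the fwd body is copied from the layer above, while the weight shapes and input sizes are just illustrative placeholders:

    import tensorflow as tf

    # Standalone copy of the layer's fwd logic (workaround sketch).
    # Weight shapes and input sizes are placeholders.
    activation = tf.nn.relu
    bias = tf.random.normal([16, 8])
    coef = tf.random.normal([16, 4, 8])
    matrix_bias = tf.random.normal([4, 8])

    @tf.function
    def fwd(u1, u2):
        b = activation(tf.matmul(u2, bias))
        activ = activation(tf.tensordot(u2, coef, axes=1)) + matrix_bias
        u1_expand = tf.expand_dims(u1, 1)
        # elementwise matrix product between u1 and activ
        inner = tf.map_fn(lambda x: tf.matmul(x[0], x[1])[0],
                          (u1_expand, activ), fn_output_signature=tf.float32)
        return b + inner

    # Call the function repeatedly from a separate script while profiling.
    u1 = tf.random.normal([32, 4])
    u2 = tf.random.normal([32, 16])
    tf.profiler.experimental.start("logdir")
    for step in range(100):
        # optional explicit step marker so the profiler can detect steps
        with tf.profiler.experimental.Trace("train", step_num=step, _r=1):
            fwd(u1, u2)
    tf.profiler.experimental.stop()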