yangxue0827 / RotationDetection

This is a TensorFlow-based rotation detection benchmark, also called AlphaRotate.
https://rotationdetection.readthedocs.io/
Apache License 2.0

Outputs while training #72

Closed: Testbild closed this issue 2 years ago

Testbild commented 2 years ago

Hello,

I have a question regarding the outputs while training.

I think that here:

if __name__ == '__main__':

    trainer = TrainRetinaNet(cfgs)
    print("TRAINER MAIN")
    trainer.main()

the trainer.main() call runs continuously until training ends.

And that after a certain number of steps:

************************************************************************
2021-12-20 19:13:05: global_step:20  current_step:20
speed: 4.591s, remaining training time: 05:07:29:27
total_losses:1.578
cls_loss:1.138
reg_loss:0.440

************************************************************************
2021-12-20 19:13:19: global_step:40  current_step:40
speed: 0.462s, remaining training time: 00:12:49:07
total_losses:1.588
cls_loss:1.136
reg_loss:0.451

************************************************************************
2021-12-20 19:13:28: global_step:60  current_step:60
speed: 0.512s, remaining training time: 00:14:13:17
total_losses:1.530
cls_loss:1.134
reg_loss:0.395

self.log_printer(retinanet, optimizer, global_step, tower_grads, total_loss_dict, num_gpu, graph) is called to print the information above.
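
My mental model of that periodic logging is roughly the following sketch (a self-contained TF 1.x toy with made-up names, not the repo's actual code):

    # Sketch of a TF 1.x training loop with periodic logging (made-up names).
    import tensorflow as tf

    LOG_INTERVAL = 20  # placeholder; the repo reads its interval from cfgs

    logits = tf.Variable([0.5, -1.0])
    labels = tf.constant([1.0, 0.0])
    total_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(total_loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(1, 101):
            _, loss_val = sess.run([train_op, total_loss])  # one optimizer step
            if step % LOG_INTERVAL == 0:
                print('global_step:%d total_losses:%.3f' % (step, loss_val))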

But when I place a print just before that line:

 print("BEFORE LOG_PRINTER")
 self.log_printer(retinanet, optimizer, global_step, tower_grads, total_loss_dict, num_gpu, graph)

inside:

class TrainRetinaNet(Train):

    def get_gtboxes_and_label(self, gtboxes_and_label_h, gtboxes_and_label_r, num_objects):
        return gtboxes_and_label_h[:int(num_objects), :].astype(np.float32), \
               gtboxes_and_label_r[:int(num_objects), :].astype(np.float32)

    def main(self):
        with tf.Graph().as_default() as graph, tf.device('/cpu:0'):
...

then it is only printed once.
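
For reference, here is a minimal TF 1.x example that reproduces what I am seeing (my own sketch, not the repo's code): a Python print placed in graph-building code executes only once.

    # Minimal sketch: a Python print in graph-building code runs only once.
    import tensorflow as tf

    x = tf.placeholder(tf.float32)
    y = x * 2.0
    print('GRAPH BUILT')  # executes once, while the graph is constructed

    with tf.Session() as sess:
        for step in range(3):
            sess.run(y, feed_dict={x: 1.0})  # runs the graph; prints nothing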

The main reason I am doing this is that I want to better understand what the predictions and labels passed to the focal_loss function look like:

    def focal_loss(self, labels, pred, anchor_state, alpha=0.25, gamma=2.0):

        # filter out "ignore" anchors
        indices = tf.reshape(tf.where(tf.not_equal(anchor_state, -1)), [-1, ])
        labels = tf.gather(labels, indices)
        pred = tf.gather(pred, indices)

        tfconfig = tf.ConfigProto(
            allow_soft_placement=True, log_device_placement=False)
        tfconfig.gpu_options.allow_growth = True
        with tf.Session(config=tfconfig) as sess:
            #sess.run(init_op)
            print("PREDS")
            print(tf.get_static_value(pred))
            print("LABELS")
            print(tf.get_static_value(labels))

        # compute the focal loss
        per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits(
            labels=labels, logits=pred))
        prediction_probabilities = tf.sigmoid(pred)
        p_t = ((labels * prediction_probabilities) +
               ((1 - labels) * (1 - prediction_probabilities)))
        modulating_factor = 1.0
        if gamma:
            modulating_factor = tf.pow(1.0 - p_t, gamma)
        alpha_weight_factor = 1.0
        if alpha is not None:
            alpha_weight_factor = (labels * alpha +
                                   (1 - labels) * (1 - alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    per_entry_cross_ent)

        # compute the normalizer: the number of positive anchors
        normalizer = tf.stop_gradient(tf.where(tf.equal(anchor_state, 1)))
        # normalizer = tf.stop_gradient(tf.where(tf.greater_equal(anchor_state, 0)))
        normalizer = tf.cast(tf.shape(normalizer)[0], tf.float32)
        normalizer = tf.maximum(1.0, normalizer)

        # normalizer = tf.stop_gradient(tf.cast(tf.equal(anchor_state, 1), tf.float32))
        # normalizer = tf.maximum(tf.reduce_sum(normalizer), 1)

        return tf.reduce_sum(focal_cross_entropy_loss) / normalizer
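
If I read the code correctly, for hard 0/1 labels the per-entry term reduces to the standard focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). A quick NumPy sanity check for a single positive anchor (my own sketch, not the repo's code):

    # NumPy sanity check of the per-entry focal term (my own sketch).
    import numpy as np

    alpha, gamma = 0.25, 2.0
    label, logit = 1.0, 0.5               # one positive anchor, raw score 0.5

    p = 1.0 / (1.0 + np.exp(-logit))      # sigmoid(0.5) ~ 0.622
    p_t = label * p + (1 - label) * (1 - p)
    ce = -np.log(p_t)                     # sigmoid cross-entropy, hard label
    alpha_t = label * alpha + (1 - label) * (1 - alpha)
    fl = (1 - p_t) ** gamma * alpha_t * ce
    print(fl)                             # ~ 0.25 * 0.378**2 * 0.474 ~ 0.017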

So I wanted to have them printed out each time the focal loss is computed, but somehow I cannot get this repeated printout during training. Am I maybe missing something? Is the loss function only called after a certain number of steps, or am I doing something wrong?

Your help is highly appreciated, and thank you very much for always answering the questions posted here in the issues.

Best regards!

yangxue0827 commented 2 years ago

The tf.Print() function might help you.
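
For example, something like this inside focal_loss (a rough sketch; note that tf.Print is an identity op, so the returned tensors must replace the originals and flow into the loss, and the output goes to stderr, not stdout):

    # Sketch: wiring tf.Print into focal_loss so values appear on every step.
    # The returned tensors must be used downstream, or nothing is printed.
    pred = tf.Print(pred, [tf.shape(pred), pred],
                    message='PREDS ', summarize=20)
    labels = tf.Print(labels, [tf.shape(labels), labels],
                      message='LABELS ', summarize=20)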

Testbild commented 2 years ago

Thank you for your answer. I tried tf.Print(), but had no luck with that either.

Could you let me know where the loss function is called during each training step? I think I am missing where this happens.