stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Epoch summaries on tensorboard #144

Open sebpuetz opened 4 years ago

sebpuetz commented 4 years ago

Add epoch-averaged (train and dev) values of the summarized metrics to TensorBoard. After a few epochs, it's hard to tell anything from the per-batch graphs.
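
To make the request concrete, here is a rough sketch of what the epoch averaging amounts to on whichever side drives the training loop; the names (`batches`, `session`, `loss_op`, `acc_op`) are illustrative and not taken from the sticker code base:

 # Accumulate per-batch metrics and reduce them to one value per epoch.
 epoch_loss, epoch_acc, n_batches = 0.0, 0.0, 0
 for batch in batches:
     loss_val, acc_val = session.run([loss_op, acc_op], feed_dict=batch)
     epoch_loss += loss_val
     epoch_acc += acc_val
     n_batches += 1
 # These averages are what should end up on TensorBoard, once per epoch.
 epoch_loss /= n_batches
 epoch_acc /= n_batches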

twuebi commented 4 years ago

I'd like to keep the per-batch summaries and add the per-epoch as additional summaries.

Something along these lines:

 # Placeholders for the epoch-averaged metrics, fed from the training loop.
 epoch_loss_placeholder = tf.placeholder(name="epoch_loss_placeholder", dtype=tf.float32, shape=[])
 epoch_acc_placeholder = tf.placeholder(name="epoch_acc_placeholder", dtype=tf.float32, shape=[])
 # Counter for completed validation epochs, used as the summary step.
 val_epoch_var = tf.Variable(
     0,
     trainable=False,
     dtype=tf.int64,
     name="val_epoch")
 val_epoch = tf.convert_to_tensor(val_epoch_var)
 # Increment the epoch counter whenever the epoch summaries are evaluated.
 with tf.control_dependencies([val_epoch_var.assign_add(1)]):
     epoch_val_summaries = [
         tf.contrib.summary.scalar(name="epoch_loss",
                                   tensor=epoch_loss_placeholder,
                                   step=val_epoch,
                                   family="val"),
         tf.contrib.summary.scalar(name="epoch_acc",
                                   tensor=epoch_acc_placeholder,
                                   step=val_epoch,
                                   family="val")]

 ...

 with tf.compat.v1.variable_scope("summaries"):
     self.train_summaries = tf.group(train_summaries, name="train")
     self.val_summaries = tf.group(val_summaries, name="val")
     self.epoch_val_summaries = tf.group(epoch_val_summaries, name="epoch_val")
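
As a hedged usage sketch (again with illustrative names, not sticker's actual API): at the end of a validation epoch, the epoch-averaged metrics would be fed into the placeholders and the grouped summary op evaluated once:

 # Run the grouped epoch summaries once per validation epoch,
 # feeding the averages computed by the training loop.
 session.run(
     self.epoch_val_summaries,
     feed_dict={
         epoch_loss_placeholder: epoch_loss,
         epoch_acc_placeholder: epoch_acc,
     })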

IMO the per-batch graphs can be informative for figuring out what's going on when things don't work, for instance by looking at per-batch gradient norms or spikes in the loss / accuracy.
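
For instance, a per-batch gradient-norm summary could look roughly like this (a sketch assuming TF 1.x and that `loss` and the trainable variables already exist in the training graph):

 # Log the global gradient norm per batch so spikes show up in TensorBoard.
 grads = tf.gradients(loss, tf.trainable_variables())
 grad_norm = tf.global_norm([g for g in grads if g is not None])
 grad_norm_summary = tf.contrib.summary.scalar(name="grad_norm",
                                               tensor=grad_norm,
                                               family="train")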

danieldk commented 4 years ago

At this point graph compatibility is everything. I guess the ramification here would be that we need three additional Optional ops in TaggerGraph, right?

twuebi commented 4 years ago

If graph compatibility means loading an old model with a newly written graph, then we need four optional ops, since the variable val_epoch will be missing when calling the restore op of a new graph. So it would also need to be a placeholder.

If it only means being able to load both graphs on the Rust side, then it's three optional ops.
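
For illustration, one generic TF 1.x pattern for restoring an old checkpoint into a graph that gained a new variable such as val_epoch is to build the Saver only over the variables that actually exist in the checkpoint (names here are illustrative, and this is an alternative to the placeholder approach above rather than how sticker's baked-in restore op works):

 # Restore only the variables present in the old checkpoint; new variables
 # such as val_epoch keep their initial values.
 reader = tf.train.NewCheckpointReader(checkpoint_path)
 saved_vars = reader.get_variable_to_shape_map()
 restorable = [v for v in tf.global_variables() if v.op.name in saved_vars]
 saver = tf.train.Saver(var_list=restorable)
 saver.restore(session, checkpoint_path)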

All in all, it may be a good idea to rewrite the graph for inference after training, where a stable interface is needed, so that these compatibility problems would only apply to training graphs.
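
As a sketch of what such a rewrite could look like in TF 1.x, the trained variables can be frozen into constants and written out as a separate inference-only graph (the output node name below is hypothetical, not sticker's actual output op):

 # Freeze the trained variables into constants and write an inference-only graph.
 frozen = tf.graph_util.convert_variables_to_constants(
     session,
     session.graph.as_graph_def(),
     output_node_names=["predictions"])
 with tf.gfile.GFile("inference_graph.pb", "wb") as f:
     f.write(frozen.SerializeToString())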