Dear authors,
First off, amazing work - I really enjoyed your paper!
I have a question regarding the profile loss normalization by sequence length. In the implementation of `multinomial_nll` in `losses.py`, the sum-reduced profile loss is normalized by `seqlen`; however, `seqlen` is defined as `seqlen = tf.to_float(tf.shape(true_counts)[0])`. Wouldn't this normalize the loss by the batch size rather than the sequence length, since the shape of `true_counts` is `(batch, seqlen, channels)`?
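To make the question concrete, here is a minimal sketch of the indexing I am referring to. The tensor shapes are made up for illustration, and I've used `tf.cast(..., tf.float32)` as a stand-in for `tf.to_float`; this is not the actual code from `losses.py`, just my reading of it:

```python
import tensorflow as tf

# Toy true_counts with the layout I assume: (batch, seqlen, channels)
true_counts = tf.zeros((16, 1000, 2))

# As written, index 0 picks out the batch dimension
normalizer_current = tf.cast(tf.shape(true_counts)[0], tf.float32)   # -> 16.0 (batch size)

# What I would have expected for normalization by sequence length
normalizer_expected = tf.cast(tf.shape(true_counts)[1], tf.float32)  # -> 1000.0 (seqlen)
```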
Thanks in advance for clarifying!