mandarjoshi90 / coref

BERT for Coreference Resolution
Apache License 2.0

Is there a bug in batch_gather? #48

Open fairy-of-9 opened 4 years ago

fairy-of-9 commented 4 years ago

In independent.py, the following call sometimes returns [NaN, NaN, ...]:

top_fast_antecedent_scores = util.batch_gather(fast_antecedent_scores, top_antecedents) # [k, c]

To investigate, I printed the tensor values with tf.Print() inside batch_gather in util.py:

def batch_gather(txt, emb, indices):
  batch_size = shape(emb, 0)
  seqlen = shape(emb, 1)
  if len(emb.get_shape()) > 2:
    emb_size = shape(emb, 2)
  else:
    emb_size = 1

  flattened_emb = tf.reshape(emb, [batch_size * seqlen, emb_size])  # [batch_size * seqlen, emb]
  offset = tf.expand_dims(tf.range(batch_size) * seqlen, 1)  # [batch_size, 1]
  gathered = tf.gather(flattened_emb, indices + offset) # [batch_size, num_indices, emb]
  gathered = tf.Print(gathered, [gathered], message='gathered')
  if len(emb.get_shape()) == 2:
    gathered = tf.squeeze(gathered, 2) # [batch_size, num_indices]
    gathered = tf.Print(gathered, [gathered], message=txt+'gathered2')
  return gathered

The results are as follows.

emb == [[-inf -inf -inf...]...]
flattened_emb == [[-inf][-inf][-inf]...]
indices + offset == [[808 809 810...]...]
gathered == [[[nan][nan][nan]]...]
gathered2 == [[nan nan nan...]...]
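
For reference, the flatten-and-offset indexing that batch_gather performs can be reproduced without TensorFlow. The following is a minimal pure-Python sketch for the 2-D case only; the helper name batch_gather_py and the toy inputs are hypothetical and not part of the repository.

```python
def batch_gather_py(emb, indices):
    """Pure-Python emulation of batch_gather for a 2-D `emb` (illustrative only)."""
    batch_size = len(emb)
    seqlen = len(emb[0])
    # Flatten row-major, mirroring the tf.reshape to [batch_size * seqlen, 1].
    flat = [value for row in emb for value in row]
    # Row b's indices are shifted by b * seqlen, mirroring `indices + offset`.
    return [[flat[b * seqlen + i] for i in indices[b]] for b in range(batch_size)]

# Toy inputs (hypothetical values, chosen so each element identifies its position).
emb = [[0.0, 1.0, 2.0],
       [10.0, 11.0, 12.0]]
indices = [[2, 0],
           [1, 1]]

result = batch_gather_py(emb, indices)
print(result)  # [[2.0, 0.0], [11.0, 11.0]]
```

In this emulation the indexing only copies elements, so an output entry can be NaN only if the selected input entry is already NaN. Whether the TensorFlow op behaves the same way in the failing run is not confirmed here.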

The NaN output does not occur on every run, and training proceeds normally when it doesn't. It does happen frequently, though, and training then stops because the loss is NaN.

Can you help me?

fairy-of-9 commented 4 years ago

Another error case:

emb: [[-inf -inf -inf...]...]
emb_shape: [480 480]
offset: [[0][480][960]...]
offset_shape: [480 1]
indices: [[316 0 1...]...]
indices_shape: [480 50]
indices+offset: [[316 0 1...]...]
indices+offset_shape: [480 50]
flattened_emb: [[-inf][-inf][-inf]...]
flattened_emb_shape: [230400 1]
gathered: [[[nan][-inf][-inf]]...]
gathered_shape: [480 50 1]
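
One reading consistent with this dump, though not confirmed anywhere in this thread, is that the input already holds a NaN at flattened position 316 and the truncated printout only shows the first few entries, which are all -inf. The sketch below reproduces that pattern in plain Python. The row contents are an assumption made purely for illustration; only the indices [316, 0, 1] and the row length 480 come from the dump.

```python
import math

seqlen = 480  # from emb_shape: [480 480]

# ASSUMPTION for illustration: row 0 is all -inf except a NaN at position 316.
row0 = [-math.inf] * seqlen
row0[316] = math.nan

# A truncated print shows only the leading entries, which hides the NaN.
preview = row0[:3]

# First three indices of row 0 from the dump; the offset for row 0 is 0.
indices_row0 = [316, 0, 1]
gathered_row0 = [row0[i] for i in indices_row0]

# Scanning the full row locates non-finite-but-not-inf entries explicitly.
nan_positions = [i for i, v in enumerate(row0) if math.isnan(v)]

print(preview)
print(gathered_row0)
print(nan_positions)
```

If this reading applies, checking the full input tensor for NaN before the gather (rather than relying on the first few printed values) would show whether the NaN originates upstream of batch_gather.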

mandarjoshi90 commented 4 years ago

IIRC, I don't think I changed that part of the code from the original e2e-coref. Are you seeing this on English OntoNotes? Possibly related to https://github.com/mandarjoshi90/coref/issues/6 ?