zhiguowang / BiMPM

BiMPM: Bilateral Multi-Perspective Matching for Natural Language Sentences
Apache License 2.0

Training seems slow.... wondering if caused by "Converting sparse IndexedSlices to a dense Tensor" msg #5

Open davidsvaughn opened 7 years ago

davidsvaughn commented 7 years ago

Training works, but it seems very slow, which is fine as long as this is the expected behavior. I'm just curious whether it is unusually slow for me. Did you happen to get this warning?

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

If not, then I'm wondering whether my code is running slowly because of whatever triggers this message. So far, I know it's coming from the tf.gradients() function call on line 220 of SentenceMatchModelGraph.py:

grads, _ = tf.clip_by_global_norm(tf.gradients(self.loss, tvars), clipper)

Unfortunately, this doesn't help much, because I still don't know which part of the network is triggering it.
If you tell me that you are not getting this message, then I will investigate further and try to find the root cause. Thanks!
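
One possible way to narrow down which part of the network triggers the warning is to build the gradient for each trainable variable separately and record which calls emit it. The warning is raised while the gradient graph is being constructed, so Python's standard `warnings` module can capture it. The sketch below is framework-independent. The helper name `find_warning_sources` is hypothetical, and the TensorFlow usage shown in the comments is an assumption about how it could be plugged into the graph-building code, not something taken from the repository.

```python
import warnings


def find_warning_sources(items, build_fn, needle):
    """Call build_fn(item) for each item and return the items whose call
    emitted a warning whose message contains `needle`."""
    culprits = []
    for item in items:
        with warnings.catch_warnings(record=True) as caught:
            # Record every warning, even ones already shown once before.
            warnings.simplefilter("always")
            build_fn(item)
        if any(needle in str(w.message) for w in caught):
            culprits.append(item)
    return culprits


# Hypothetical usage inside the TF1 graph-building code (assumption, not run here):
#   culprits = find_warning_sources(
#       tvars,
#       lambda v: tf.gradients(self.loss, [v]),
#       "Converting sparse IndexedSlices",
#   )
#   print([v.name for v in culprits])
```

Building one gradient per variable adds extra ops to the graph, so this is only meant as a one-off diagnostic, not something to leave in the training code.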

ijinmao commented 7 years ago

I implemented a simplified Keras version here. Training is also very slow. I found that the slowest strategy is Maxpooling-Matching, where "each forward (or backward) contextual embedding is compared with every forward (or backward) contextual embeddings of the other sentence". You can check this by commenting out each strategy in multi_perspective.py in turn.
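
A rough way to see why Maxpooling-Matching dominates the cost is to count comparisons: it compares every position of one sentence with every position of the other, whereas a strategy that compares each position with a single vector (as Full-Matching does) grows only linearly. The sketch below is a back-of-the-envelope estimate, not a profile of the actual code. The function names, the strategy labels, and the sentence lengths, perspective count, and hidden size in the example are all illustrative assumptions.

```python
def matching_comparisons(len_p, len_q, strategy):
    """Number of vector-to-vector comparisons for one direction."""
    if strategy == "full":
        # Each position of P is compared with one vector of Q.
        return len_p
    if strategy == "maxpooling":
        # Each position of P is compared with every position of Q.
        return len_p * len_q
    raise ValueError("unknown strategy: %r" % strategy)


def rough_multiplications(len_p, len_q, strategy, num_perspectives, hidden_dim):
    """Very rough estimate: each comparison weights a hidden_dim vector
    once per perspective."""
    return matching_comparisons(len_p, len_q, strategy) * num_perspectives * hidden_dim


# Illustrative numbers only (assumed, not taken from the repository's config).
print(rough_multiplications(50, 50, "full", 20, 100))        # 100000
print(rough_multiplications(50, 50, "maxpooling", 20, 100))  # 5000000
```

With these assumed sizes the maxpooling strategy does about 50 times more work than the single-vector strategy, and the gap widens with sentence length.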

In my Keras version, the warning is caused by tf.gather(), but I don't think this is the reason for the slowness.

fuhuamosi commented 7 years ago

@davidsvaughn Hi, I noticed you said that training is very slow. Can you tell me roughly how long one epoch takes? Thank you.

WangGangUCAS commented 5 years ago

How can this problem be solved?