Hi!
I developed an LM based on your implementation of the character-aware model, and during debugging I found that the execution order of the operations Model/Embedding/embedding_lookup:0 and Model/Embedding/ScatterUpdate:0 changes between iterations.
Here is a piece of TF debugger output:
Iteration n:
[0.678] 1.99k Gather        Model/Embedding/embedding_lookup:0
[0.760] 242   Const         Model/Embedding/ScatterUpdate/indices:0
[0.955] 270   Const         Model/Embedding/Const:0
[1.101] 202   Const         Model/one_hot/depth:0
[1.165] 4.09k ScatterUpdate Model/Embedding/ScatterUpdate:0
Iteration m:
[1.538] 4.09k ScatterUpdate Model/Embedding/ScatterUpdate:0
[1.606] 1.99k Gather        Model/Embedding/embedding_lookup:0
The values of the embedding table look different in these two cases: in the second case (iteration m) the first row is zeroed before the lookup, as you specify in your comments, while in the first case (iteration n) the first row still has non-zero values when the lookup runs. I tested this code on TensorFlow 1.8. I suspect the problem is caused by TensorFlow's parallel execution of ops that have no data or control dependency between them.
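In case it helps, here is a minimal sketch (not taken from your code; the names embedding_table and input_ids are hypothetical) of how the order could be pinned explicitly in TF 1.x by putting the lookup under a control dependency on the scatter update:

```python
import tensorflow as tf

vocab_size, embed_dim = 1000, 128

# Embedding table whose first row is meant to stay zero (padding token).
embedding_table = tf.get_variable("embedding", [vocab_size, embed_dim])
input_ids = tf.placeholder(tf.int32, [None])

# Op that zeroes out row 0 of the table.
zero_first_row = tf.scatter_update(
    embedding_table,
    indices=[0],
    updates=tf.zeros([1, embed_dim]))

# Without this block the Gather and the ScatterUpdate have no dependency on
# each other, so TensorFlow is free to run them in either order.
with tf.control_dependencies([zero_first_row]):
    embedded = tf.nn.embedding_lookup(embedding_table, input_ids)
```

With the control dependency in place, the Gather cannot be scheduled before the ScatterUpdate, so the first row should always be zero at lookup time.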