Closed dhingratul closed 6 years ago
Have you tried running the included benchmarking code? It should run out of the box and produce all the timing results.
Also, sparse_scatter_var is only intended for inference. We didn't apply these optimizations for training. During inference you shouldn't need the gradient.
Did you create different training/inference graphs or re-use weights from the training step during inference within the same graph?
IIRC, we recreated the graph for inference with minor differences (BN is_training=False, sparse_scatter_var). The weights were, of course, imported from those learned during training (unless I'm misunderstanding your question).
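For readers following along, the setup described above (one model definition, built once for training and once for inference, with the learned weights carried over) can be sketched without any framework. This is only an illustration: `scatter_copy`, `scatter_in_place`, `build_network`, and the `scale` weight are hypothetical names, not SBNet or TensorFlow APIs, and the assumption that the inference-only scatter writes in place and has no gradient is inferred from the error reported in this thread.

```python
# Framework-free stand-in for the train/inference split discussed above.
# All names here are hypothetical; none are SBNet or TensorFlow APIs.


def scatter_copy(blocks, indices, base):
    """Training-style scatter: returns a new list and leaves `base` untouched."""
    out = list(base)
    for idx, value in zip(indices, blocks):
        out[idx] = value
    return out


def scatter_in_place(blocks, indices, base):
    """Inference-style scatter: overwrites `base` directly.

    Assumption: like the op in the error message, this variant has no
    gradient, so it must never appear in a graph that is differentiated.
    """
    for idx, value in zip(indices, blocks):
        base[idx] = value
    return base


def build_network(weights, is_training):
    """Build the forward function; only the scatter variant differs by mode."""
    scatter = scatter_copy if is_training else scatter_in_place

    def forward(x, active):
        # Compute only on the active positions, then scatter results back.
        blocks = [x[i] * weights["scale"] for i in active]
        return scatter(blocks, active, x)

    return forward


# Training build: uses the copy-based scatter.
weights = {"scale": 3.0}
train_forward = build_network(weights, is_training=True)

# Inference build: same definition, rebuilt with the learned weights
# (the dict copy stands in for restoring a checkpoint).
infer_forward = build_network(dict(weights), is_training=False)
```

The point of the split is that the optimizer only ever sees the training build, so the in-place variant is never asked for a gradient.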
I am trying to reproduce the paper's results with the recommended sparse_scatter_var method, but I get an error saying that no gradient is registered for the op:
LookupError: No gradient defined for operation 'network/SparseScatterVar' (op type: SparseScatterVar)