I have experimented with IAN in the AREkit framework for sentiment attitudes extraction (this implementation has been embedded into the toolkit). The problem was that all the attention weights within a context/aspect remained equal to 1. As a result, only the last perceptron layer changed during training. All the other hidden states remained the same, as I think, due to the absence of variation and hence of a gradient for backpropagation.
Clarifying the axis from 0 (the default; i -- batch) to 1 (j -- context words) ("ij" in einsum notation) fixed the problem. I suppose the earlier bug led to worse results, since the implementation has now been updated.
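The symptom above can be reproduced with a minimal numpy sketch (not the AREkit code itself; I assume here that the normalization in question is a softmax over the attention scores). When the batch size is 1 and the softmax is taken over axis 0 (batch) instead of axis 1 (context words), every weight collapses to exactly 1:

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# scores[i, j]: attention score of context word j in batch item i
# ("ij" in einsum notation: i -- batch, j -- context words).
scores = np.array([[1.0, 2.0, 3.0]])  # a batch of one context

# Wrong: normalizing over the batch axis -- each column has a single
# element, so every weight becomes 1 and carries no gradient signal.
wrong = softmax(scores, axis=0)   # [[1. 1. 1.]]

# Right: normalizing over the context words -- a proper distribution.
right = softmax(scores, axis=1)   # rows sum to 1
```

With constant weights of 1, the attention output no longer depends on the scores, which would explain why only the final layer kept updating.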
Before (only the last layer updates)
After (all layers update their values during training)
The first two matrices are the context and aspect weights, respectively.
I used a pair of aspects (Object/Subject), tested on the sentiment attitudes extraction task with the RuSentRel dataset.