Hello all :)
I’m trying to use routing by agreement with a Transformer-based model for an NMT task. The idea is to treat the output of each attention head as an input capsule of a capsule network, so that the semantic and spatial information from the different heads is fused and the output sentences become more accurate. As below:
The implementation code is here, and the PyTorch issue is here.
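To make the idea concrete, here is a minimal, framework-free sketch of routing by agreement over head outputs for a single token position. It is not the linked implementation: the names `squash`, `softmax`, `route_heads`, and `u_hat` are hypothetical, and the prediction vectors are hard-coded toy values, whereas a real model would compute them from the head outputs with learned projections as batched tensor operations in PyTorch.

```python
import math


def squash(v):
    """Capsule squashing: keep the direction, map the norm into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0 for _ in v]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_heads(u_hat, num_iters=3):
    """Routing by agreement for one token position.

    u_hat[i][j] is the prediction vector that head i (input capsule)
    makes for output capsule j. Returns (output capsules, couplings).
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for it in range(num_iters):
        # Each head distributes its vote over the output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        if it < num_iters - 1:
            # Raise the logit where a head's prediction agrees with the output.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Toy input (assumed values): 3 heads, 2 output capsules, 2-dim vectors.
# Heads 0 and 1 agree on capsule 0; head 2 predicts the opposite direction.
u_hat = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[-1.0, 0.0], [0.0, 0.0]],
]
v, c = route_heads(u_hat, num_iters=3)
print("couplings:", c)
print("output capsules:", v)
```

In this toy case the two agreeing heads end up with a larger coupling to output capsule 0 than the disagreeing head, which is the behaviour the fusion step relies on.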
So far my results are very poor. I would appreciate any suggestions on what to try.
I look forward to your feedback.