So, what should I do if I want to implement a bidirectional GRU encoder with attention?
So far I have tried the following:
```python
self.bidir = BidirectionalWrap(
    GatedRecurrent(activation=Tanh(), dim=state_dim))
self.fwd_fork = Fork(
    [name for name in self.bidir.prototype.apply.sequences
     if name != 'mask'],
    prototype=Linear(), name='fwd_fork')
self.back_fork = Fork(
    [name for name in self.bidir.prototype.apply.sequences
     if name != 'mask'],
    prototype=Linear(), name='back_fork')
```
but it feels wrong and does not work. Any suggestions?
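
For reference, here is a minimal sketch of the extra wiring I think might be missing, assuming the standard Blocks bricks (`GatedRecurrent`, `Fork`, `SequenceContentAttention`) and that the bidirectional wrapper concatenates the forward and backward states, so the attended representation has dimension `2 * state_dim`. The dims and names below are my guesses, not something I have verified:

```python
# Sketch (unverified): push dims onto the forks by hand and create an
# attention brick over the concatenated bidirectional encoder states.
from blocks.bricks import Tanh, Linear
from blocks.bricks.recurrent import GatedRecurrent
from blocks.bricks.parallel import Fork
from blocks.bricks.attention import SequenceContentAttention

state_dim = 500      # hypothetical sizes, just for illustration
embedding_dim = 300

gru = GatedRecurrent(activation=Tanh(), dim=state_dim)

# Each fork maps the word embeddings to the sequences the GRU expects
# ('inputs' and 'gate_inputs'), with output dims taken from the GRU itself.
fork_names = [name for name in gru.apply.sequences if name != 'mask']
fwd_fork = Fork(fork_names, prototype=Linear(), name='fwd_fork')
fwd_fork.input_dim = embedding_dim
fwd_fork.output_dims = [gru.get_dim(name) for name in fork_names]

# Attention over the bidirectional representation: forward and backward
# states are concatenated, so attended_dim is 2 * state_dim.
# state_names should match the decoder transition's states
# (['states'] for a GatedRecurrent transition).
attention = SequenceContentAttention(
    state_names=['states'], attended_dim=2 * state_dim,
    match_dim=state_dim, name='attention')
```

If I understand the blocks-examples machine_translation model correctly, the attention brick is then passed to a `SequenceGenerator` together with the decoder transition, and the fork dims are pushed in `_push_allocation_config` rather than set by hand, so treat this only as a rough outline.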