Closed annanfree closed 2 months ago
Why use a Transformer decoder for slot filling? Why not something simpler? What is the reason or intuition behind this structure?

Hi @annanfree, the decoder needs a history of what it has already generated. The intuition was to use a network that has memory but does not have the limitations of LSTMs. A regular Transformer decoder did not perform well because of the alignment issue, so we introduced the aligned Transformer decoder. I highly recommend reading the ScienceDirect paper carefully.
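As a rough illustration (not the paper's actual implementation), the core idea can be sketched as causally masked self-attention: each position attends only to the labels already emitted (its generation history), while still producing exactly one output per input token, which preserves the token-to-slot alignment. All names and dimensions below are made up for the example.

```python
import numpy as np

def causal_mask(T):
    # Position i may attend only to positions <= i, i.e. its own
    # generation history -- the "memory" role an LSTM would play.
    return np.tril(np.ones((T, T), dtype=bool))

def masked_self_attention(Q, K, V, mask):
    # Standard scaled dot-product attention with disallowed
    # positions pushed to -1e9 before the softmax.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

T, d = 5, 8  # 5 input tokens, toy embedding size 8
rng = np.random.default_rng(0)
x = rng.normal(size=(T, d))  # embeddings of previously emitted slot labels
out = masked_self_attention(x, x, x, causal_mask(T))
print(out.shape)  # (5, 8): one output per input token, alignment preserved
```

The point of the mask is that, unlike a vanilla encoder-style Transformer, each step conditions on what was generated so far; and unlike free-running autoregressive decoding, the output length is pinned to the input length, which is what slot filling requires.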