I am trying to understand the code for WikiSum. I understand that before training we need to download the data and then process it using the extractive method. However, I am not sure why we are training the WikiSum abstractive summarization with a "transformer" model, as specified in the README. How do I know whether only the decoder is used, or whether the encoder is used as well? And where can I find the implementation of T-DMCA (Transformer Decoder with Memory-Compressed Attention)? Thank you for any help!