Hi,
I was very impressed with this repository while researching how to apply the Transformer ("Attention Is All You Need") to time series forecasting.
I have a question. The architecture in "Attention Is All You Need" uses both an encoder and a decoder to handle the translation task, whereas your implementation in this repository uses only the encoder. Were there any other papers or articles you referenced for this design? If so, could you please share them?
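To make sure we are talking about the same thing, this is roughly what I understand the encoder-only setup to be. This is a minimal PyTorch sketch of my own, not code from your repository; all class, parameter, and layer names here are my assumptions:

```python
import torch
import torch.nn as nn

class EncoderOnlyForecaster(nn.Module):
    """Hypothetical encoder-only Transformer for forecasting:
    past values are projected to d_model, passed through a stack of
    self-attention encoder layers, and a linear head maps the last
    hidden state to the forecast horizon -- no decoder, no cross-attention."""
    def __init__(self, n_features=1, d_model=64, n_heads=4,
                 n_layers=3, horizon=24, dropout=0.1):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer,
                                             num_layers=n_layers)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        h = self.encoder(self.input_proj(x))
        # forecast the next `horizon` steps from the last time step
        return self.head(h[:, -1])  # (batch, horizon)
```

Is this the general shape of your approach, or does your model consume the encoder outputs differently?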
One more question: why does your implementation have fewer dropout layers than the original implementation in "Attention Is All You Need"?
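For reference, the dropout I had in mind is the one described in section 5.4 of the paper: it is applied to each sub-layer's output before the residual addition, and to the sum of the embeddings and positional encodings. Here is a minimal PyTorch sketch of my own of that residual placement (the class name and arguments are hypothetical, not from your code):

```python
import torch.nn as nn

class SublayerConnection(nn.Module):
    """Residual wrapper as described in 'Attention Is All You Need':
    dropout is applied to the sub-layer output *before* it is added
    to the sub-layer input and normalized (post-norm)."""
    def __init__(self, d_model, p_drop=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x, sublayer):
        # LayerNorm(x + Dropout(Sublayer(x))), as in the paper
        return self.norm(x + self.dropout(sublayer(x)))
```

Many reference implementations also apply dropout to the attention weights themselves, so I was curious whether leaving some of these placements out of your model was a deliberate choice for the time series setting.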