ttsunion / Deep-Expression

An Attention Based Open-Source End to End Speech Synthesis Framework, No CNN, No RNN, No MFCC!!!

What are the inputs of the network? #4

Closed luweishuang closed 6 years ago

luweishuang commented 6 years ago

In train.py, you get the predicted wav with "ypred = sess.run(yhat, feed_dict = {x: labels, y: wavs})". I don't understand: a TTS system's input is text and its output is a wav, so is your project doing a TTS task?

FonzieTree commented 6 years ago

Hi @luweishuang, thank you for your interest. During training, the inputs are both the text (X) and the shifted target signals (shifted Y); during inference, the input is only the text (X), and we predict one frame at each step, like an RNN, using the attended X and the frames already predicted.

This idea was proposed in "Attention Is All You Need" (https://arxiv.org/pdf/1706.03762.pdf), where the authors write: "Most competitive neural sequence transduction models have an encoder-decoder structure. Here, the encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence of continuous representations z = (z1, ..., zn). Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next."

You can also visit my earlier project https://github.com/FonzieTree/Attention-is-all-you-need to see how attention can be implemented with only numpy, without any deep learning framework. I hope this answers your question. I will write an inference function for Deep-Expression soon, when I am less busy.
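The training/inference split described above can be sketched in numpy. This is only an illustration, not the project's actual code: `shift_right` shows the teacher-forced decoder input built from the targets during training, and `autoregressive_decode` shows the frame-by-frame inference loop; the `step_fn` here is a toy stand-in for the real attention decoder.

```python
import numpy as np

def shift_right(y, start_frame=0.0):
    """Training-time decoder input (teacher forcing): shift the target
    frames right by one step, so frame t is predicted from frames < t."""
    shifted = np.roll(y, 1, axis=0)        # np.roll returns a copy
    shifted[0] = start_frame               # hypothetical start-of-sequence frame
    return shifted

def autoregressive_decode(step_fn, n_frames, frame_dim):
    """Inference: no target signal is available, so predict one frame per
    step and feed the already-predicted frames back in, as in the
    Transformer's auto-regressive decoder."""
    y = np.zeros((n_frames, frame_dim))
    for t in range(n_frames):
        # step_fn sees only the frames generated so far
        y[t] = step_fn(y[:t])
    return y

# Toy step function: each new frame is the mean of the previous frames
# plus one (a placeholder for attending over X and the previous Y).
demo = autoregressive_decode(
    lambda prev: (prev.mean(axis=0) if len(prev) else np.zeros(2)) + 1.0,
    n_frames=3, frame_dim=2)
```

At inference time the only real input is the encoded text; the decoder bootstraps from a start frame and consumes its own outputs, which is why no wav is fed to the network.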