Realized that just adding tensors is probably erasing some of the information learned.
By concatenating all the information learned for a given task in a given stack, then re-encoding to the next stack's shape, we ensure all of the information is maintained until the end of each stack.
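A minimal PyTorch sketch of the concatenate-then-re-encode idea, under my own assumptions: the module name (TaskFusion), the 1x1 conv as the re-encoder, and the shapes are illustrative, not the actual implementation; only cnn_dim comes from the notes above.

```python
import torch
import torch.nn as nn

class TaskFusion(nn.Module):
    """Fuses per-task feature maps from one stack and re-encodes them
    to the channel width expected by the next stack."""

    def __init__(self, num_tasks: int, cnn_dim: int):
        super().__init__()
        # 1x1 conv projects the concatenated (num_tasks * cnn_dim) channels
        # back down to cnn_dim, so later stacks keep the same shape.
        self.re_encode = nn.Conv2d(num_tasks * cnn_dim, cnn_dim, kernel_size=1)

    def forward(self, task_feats: list[torch.Tensor]) -> torch.Tensor:
        # Summing would collapse the task-specific channels:
        #   fused = torch.stack(task_feats).sum(dim=0)
        # Concatenating keeps every task's features, then the 1x1 conv
        # learns how to mix them for the next stack.
        fused = torch.cat(task_feats, dim=1)   # (B, num_tasks*cnn_dim, H, W)
        return self.re_encode(fused)           # (B, cnn_dim, H, W)

# Usage: three tasks, cnn_dim=64, 32x32 feature maps.
feats = [torch.randn(2, 64, 32, 32) for _ in range(3)]
out = TaskFusion(num_tasks=3, cnn_dim=64)(feats)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```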
Another way of doing this would have been to expand the cnn_dim for each stack.
While that would probably also have an effect on results, I'll stick to re-encoding for now since others have used this approach before.
This alternative can be a part of future work.