The append approach is going to require a lot of memory: https://github.com/yoxu515/aot-benchmark/blob/04fe7d9faa4fe3f46ed7404cb78eb8a753621619/networks/managers/evaluator.py#L402
Even if I bypass this and materialise each prediction in the else branch at every step, the PAOT branch still consumes a lot of memory. Any hint as to why its overhead is so large compared with the main branch, even when limiting the maximum long-term memory?
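
To illustrate the concern about the append approach, here is a minimal sketch of the difference between accumulating every prediction in a list and consuming each one as soon as it is produced. This is not the evaluator's actual code: `fake_prediction`, `run_append`, `run_streaming` and `peak_bytes` are hypothetical names, and a `bytearray` stands in for a real prediction tensor.

```python
import tracemalloc


def fake_prediction(step, size=1_000_000):
    # Hypothetical stand-in for one frame's prediction (about 1 MB).
    return bytearray(size)


def run_append(num_steps):
    # Append pattern: every prediction stays referenced until the loop ends,
    # so memory grows linearly with the number of steps.
    results = []
    for step in range(num_steps):
        results.append(fake_prediction(step))
    return len(results)


def run_streaming(num_steps):
    # Streaming pattern: handle each prediction immediately, then release it,
    # so only one prediction is alive at a time.
    consumed = 0
    for step in range(num_steps):
        pred = fake_prediction(step)
        consumed += 1  # placeholder for e.g. writing the mask to disk
        del pred
    return consumed


def peak_bytes(fn, num_steps):
    # Measure the peak Python-level allocation while fn runs.
    tracemalloc.start()
    fn(num_steps)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("append peak:   ", peak_bytes(run_append, 20))
    print("streaming peak:", peak_bytes(run_streaming, 20))
```

This only shows why appending is expensive. It does not explain the remaining overhead of the PAOT branch, which is the open question above.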