CUDA managed memory is currently allocated for all input sequences at once, which leads to memory/swap thrashing and a major slowdown. The code should be changed to process the input sequences in chunks, sized so that the data a chunk actively works on is likely to fit in GPU RAM.
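The host-side chunking could look roughly like the sketch below: greedily group sequences until a chunk reaches a byte budget derived from available GPU memory, then allocate/process/free one chunk at a time. This is a minimal illustration, not the project's actual code; `chunk_sequences`, the per-element byte estimate, and the budget value are all assumptions for the example.

```python
def chunk_sequences(sequences, budget_bytes):
    """Greedily partition sequences into chunks whose total size
    stays within budget_bytes (a single oversized sequence still
    gets its own chunk). Hypothetical helper for illustration."""
    chunks = []
    current, current_bytes = [], 0
    for seq in sequences:
        size = len(seq)  # assume 1 byte per element for this sketch
        if current and current_bytes + size > budget_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(seq)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks

# The GPU loop would then allocate managed memory per chunk rather
# than for the whole input, e.g. (pseudocode):
#   for chunk in chunk_sequences(all_sequences, gpu_budget):
#       buf = cudaMallocManaged(total_bytes(chunk))
#       process_on_gpu(buf)
#       cudaFree(buf)
```

Sizing the budget at some fraction of free device memory (leaving headroom for kernels' working set) keeps the actively-touched pages resident and avoids the thrashing described above.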