harvardnlp / pytorch-struct

Fast, general, and tested differentiable structured prediction in PyTorch
http://harvardnlp.github.io/pytorch-struct
MIT License

Increasing memory usage of DependencyCRF #80

Open kmkurn opened 4 years ago

kmkurn commented 4 years ago

Running the piece of code below multiple times (with CUDA_VISIBLE_DEVICES set to a single GPU id):

```python
import torch
from torch_struct import DependencyCRF
_ = DependencyCRF(torch.zeros(5, 5, 5).cuda(), multiroot=False).marginals
print(torch.cuda.memory_allocated())
```

will result in increasing allocated CUDA memory, e.g. 43520, 44544, 45568, and so on. The same thing happens with .partition, but not with NonProjectiveDependencyCRF, where memory usage stays constant. Is this expected?
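
For reference, here is the repro as a self-contained loop (assuming a visible GPU), which prints the climbing counter directly:

```python
import torch
from torch_struct import DependencyCRF

# Each iteration prints a larger number, e.g. 43520, 44544, 45568, ...
for _ in range(5):
    _ = DependencyCRF(torch.zeros(5, 5, 5).cuda(), multiroot=False).marginals
    print(torch.cuda.memory_allocated())
```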

srush commented 4 years ago

Sorry, is that a lot? Sometimes memory will stick around in PyTorch until you garbage collect. Maybe try this tool: https://pypi.org/project/pytorch-memlab/?
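
Something like this sketch could tell whether the growth is real or just uncollected garbage (the MemReporter line assumes pytorch-memlab is installed):

```python
import gc
import torch
from torch_struct import DependencyCRF

for _ in range(5):
    _ = DependencyCRF(torch.zeros(5, 5, 5).cuda(), multiroot=False).marginals
    gc.collect()  # free tensors kept alive only by reference cycles
    print(torch.cuda.memory_allocated())  # counts live tensors only

# torch.cuda.empty_cache() releases cached blocks back to the driver; it
# changes what nvidia-smi shows but not memory_allocated().
# from pytorch_memlab import MemReporter; MemReporter().report()  # per-tensor view
```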

The non-projective code uses a completely different algorithm, so that might be the difference.

kmkurn commented 4 years ago

With the actual data I’m using, yes, it becomes much larger. I keep getting OOM, and this is at inference time, where I use torch.no_grad(). The high memory consumption seems strange to me since the computation graph shouldn’t be created.

But I noticed that the marginals computation seems to use autograd. Could that be the cause? The marginals tensor always has requires_grad=True, even when the arc scores tensor doesn’t.
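
A minimal check of what I’m seeing (the detach() at the end is just my guess at a mitigation, not something from the library’s docs):

```python
import torch
from torch_struct import DependencyCRF

arc_scores = torch.zeros(5, 5, 5).cuda()   # requires_grad=False
with torch.no_grad():
    marginals = DependencyCRF(arc_scores, multiroot=False).marginals

print(arc_scores.requires_grad)  # False
print(marginals.requires_grad)   # True, even under no_grad()

marginals = marginals.detach()   # drop the reference to the internal graph
```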

Thanks for the suggestion. I’m using smaller data for now, so it works fine 😃 Feel free to close this issue if you want.

srush commented 4 years ago

Yes, the library uses autograd internally to compute marginals. That should be invisible to the user, but it is possible that it causes issues. The projective version does use more memory because of this.
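
Roughly, the trick is the standard exponential-family identity: marginals are the gradient of the log-partition function with respect to the scores. A toy sketch (not the library’s actual code) shows why the result carries a graph:

```python
import torch

scores = torch.randn(6, requires_grad=True)  # toy potentials over 6 structures
log_Z = torch.logsumexp(scores, dim=0)       # toy log-partition function

# Marginals = d(log Z)/d(scores); create_graph=True keeps the graph so you
# can backprop through the marginals themselves -- and it is also why they
# come back with requires_grad=True.
(marginals,) = torch.autograd.grad(log_Z, scores, create_graph=True)
print(marginals.sum())          # tensor(1., ...): a proper distribution
print(marginals.requires_grad)  # True
```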

LouChao98 commented 4 years ago

At deptree.py line 123, add `arc_scores = arc_scores.clone()` after the convert and before the for-loop.

This may be helpful, but I do not know why it works.

I added this line because when I called backward(), I got a "leaf variable has been moved into the graph interior" error.
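
A standalone sketch of the failure mode and why clone() sidesteps it (my own toy example, not the deptree.py code):

```python
import torch

leaf = torch.zeros(3, 3, requires_grad=True)
# leaf[0, 0] = 1.0  # RuntimeError: autograd rejects in-place writes on a
#                   # leaf that requires grad (on some versions/code paths
#                   # this instead surfaces at backward() as "leaf variable
#                   # has been moved into the graph interior")

work = leaf.clone()    # clone() gives a non-leaf copy that is safe to edit
work[0, 0] = 1.0       # in-place write is fine on an interior node
work.sum().backward()  # gradients still flow back to the untouched leaf
print(leaf.grad)       # ones, with 0 at [0, 0] (overwritten by a constant)
```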