pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

[IntegratedGradients] Memory leaks when calculating IG and LIG #866

Open roma-glushko opened 2 years ago

roma-glushko commented 2 years ago

🐛 Bug

I'm experiencing RAM leaks when calculating word attributions through the https://github.com/cdpierse/transformers-interpret library, which delegates most of the heavy lifting to Captum. So I assume this issue is relevant to Captum.

To Reproduce

My setup is described in detail in the following issue: https://github.com/cdpierse/transformers-interpret/issues/78

Expected behavior

Every time the model gets a request that should contain interpretability information, Captum calculates IG/LIG using some amount of RAM and then releases all of that RAM once it is done, so that at the end of request processing the model service uses almost the same amount of memory as before the request.
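One way to check this expectation is to call the same request handler repeatedly and see whether traced memory returns to a baseline. The following is a minimal sketch using only the standard library, not part of Captum or transformers-interpret. `traced_growth`, `leaky_request`, and `clean_request` are hypothetical names, and the two request functions are stand-ins for a real attribution call.

```python
import gc
import tracemalloc


def traced_growth(fn, iterations=5):
    """Return Python-heap growth in bytes after each call to fn,
    measured against a baseline taken after one warm-up call."""
    tracemalloc.start()
    try:
        fn()  # warm-up, so one-time caches are not counted as a leak
        gc.collect()
        baseline, _ = tracemalloc.get_traced_memory()
        growth = []
        for _ in range(iterations):
            fn()
            gc.collect()
            current, _ = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
        return growth
    finally:
        tracemalloc.stop()


# Hypothetical stand-ins for a request handler.
_retained = []


def leaky_request():
    # Keeps a reference after returning, so memory grows on every call.
    _retained.append(bytearray(1_000_000))


def clean_request():
    # The buffer is released as soon as the function returns.
    bytearray(1_000_000)


if __name__ == "__main__":
    print("leaky:", traced_growth(leaky_request))
    print("clean:", traced_growth(clean_request))
```

In the setting described here, `fn` would be a closure around a single attribution request. A growth list that keeps climbing suggests retained Python objects, while a flat list matches the expected behavior. Note that `tracemalloc` only sees memory allocated through Python's allocator.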

Environment


 - Captum / PyTorch Version (e.g., 1.0 / 0.4.0): 0.4.0/1.9.1
 - OS: registry.access.redhat.com/ubi8/python-39:latest docker image
 - How you installed Captum / PyTorch: via Poetry
 - Python version: 3.9
 - CUDA/cuDNN version: N/A, running on CPU

NarineK commented 2 years ago

@roma-glushko, this is interesting: LIG and IG are stateless, so there shouldn't be any memory leak. Have you tried using another layer method, such as captum.attr.LayerActivation? Do you see a similar issue?

roma-glushko commented 2 years ago

Hey @NarineK, thank you for the reply! Unfortunately, I have no idea about that. Maybe @cdpierse has one.

In any case, I could not track it down on the Python side. My gut feeling is that the issue goes beyond the Python realm and may lie at the C/C++ level, although I have no direct evidence of that other than that I could not find any gradient leaks while debugging the Python codebase.
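A rough way to test that hunch is to compare growth in the Python heap with growth in the resident set size (RSS) of the whole process. The sketch below uses only the standard library and assumes a Unix-like system, such as the Linux container from the environment section. `rss_bytes`, `growth_by_allocator`, and `native_leak` are hypothetical names. The anonymous `mmap` blocks stand in for memory allocated outside Python's allocator, and the sketch does not exercise Captum or PyTorch.

```python
import gc
import mmap
import os
import resource
import sys
import tracemalloc


def rss_bytes():
    """Best-effort resident set size of this process, in bytes."""
    try:
        # Linux: the second field of statm is the current RSS in pages.
        with open("/proc/self/statm") as f:
            pages = int(f.read().split()[1])
        return pages * os.sysconf("SC_PAGE_SIZE")
    except OSError:
        # Fallback: peak RSS (bytes on macOS, kilobytes elsewhere).
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak if sys.platform == "darwin" else peak * 1024


def growth_by_allocator(fn, iterations=5):
    """Return (python_heap_growth, rss_growth) in bytes over repeated calls."""
    tracemalloc.start()
    try:
        fn()  # warm-up call
        gc.collect()
        heap_before, _ = tracemalloc.get_traced_memory()
        rss_before = rss_bytes()
        for _ in range(iterations):
            fn()
        gc.collect()
        heap_after, _ = tracemalloc.get_traced_memory()
        rss_after = rss_bytes()
        return heap_after - heap_before, rss_after - rss_before
    finally:
        tracemalloc.stop()


# Hypothetical stand-in for a native leak: anonymous mmap pages are not
# allocated through Python's allocator, so tracemalloc does not count them.
_native_blocks = []


def native_leak(size=8 * 1024 * 1024):
    block = mmap.mmap(-1, size)
    block.write(b"\x01" * size)  # touch every page so it becomes resident
    _native_blocks.append(block)


if __name__ == "__main__":
    heap, rss = growth_by_allocator(native_leak)
    print(f"python heap growth: {heap} bytes, rss growth: {rss} bytes")
```

If RSS keeps climbing while the Python heap stays flat, the retained memory is probably held outside Python's allocator. This is only a hint, since RSS is noisy and an allocator may keep freed pages instead of returning them to the operating system.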

jakobamb commented 2 years ago

Hi @NarineK, any news on this? I am experiencing similar issues with a transformer model. Are you planning on looking into this? I could try to create a minimal example if this helps.

NarineK commented 2 years ago

@jakobamb, if you could send me a minimal example, that would help me with the debugging. Thank you!