pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

CUDA out of memory : for n_steps = 200 in Integrated Gradients #538

Closed · parashar-gaurav closed this 3 years ago

parashar-gaurav commented 3 years ago

For a pretrained DenseNet model (https://github.com/thtang/CheXNet-with-localization/tree/master/model) on the NIH Chest X-Ray dataset (https://www.kaggle.com/nih-chest-xrays/data), a CUDA out-of-memory error is raised when n_steps = 200 in Integrated Gradients.

Error: CUDA out of memory. Tried to allocate 174.00 MiB (GPU 0; 14.73 GiB total capacity; 13.51 GiB already allocated; 43.88 MiB free; 13.75 GiB reserved in total by PyTorch)

NarineK commented 3 years ago

@parashar-gaurav, try setting internal_batch_size to a small number. The algorithm will then run on pieces of at most internal_batch_size inputs at a time and aggregate the results, which helps avoid OOM situations.
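For example, a minimal sketch of that suggestion (the DenseNet, the random input tensor, and the target index below are placeholders standing in for the issue's actual model and data):

```python
import torch
from torchvision.models import densenet121
from captum.attr import IntegratedGradients

# Placeholder stand-ins for the CheXNet model and a chest X-ray tensor.
model = densenet121(weights=None).eval()
input = torch.randn(1, 3, 224, 224)

ig = IntegratedGradients(model)
# internal_batch_size caps how many of the n_steps interpolated inputs
# are pushed through the model at once; smaller values use less GPU
# memory at the cost of more forward/backward passes.
attributions = ig.attribute(
    input,
    target=0,                # placeholder class index
    n_steps=200,
    internal_batch_size=16,  # lower this further if OOM persists
)
```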

NarineK commented 3 years ago

@parashar-gaurav, if this issue is resolved, can we close it?

parashar-gaurav commented 3 years ago

Yes, you can close the issue.

vivekmig commented 3 years ago

Hi @lutaodai, this function seems equivalent to internal_batch_size. Did you encounter any issues with internal_batch_size for your use case? internal_batch_size should work fine with batch size = 1 as well, since it splits the expanded input batch (of dimension n_steps) into chunks of internal_batch_size.

lutaodai commented 3 years ago

Hi @vivekmig, sorry, I did not understand internal_batch_size correctly. After checking the source code of _batch_attribution and running some experiments, in which both implementations consumed similar memory, had comparable runtimes, and produced the same results, I can confirm my implementation is indeed equivalent to the official one.
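For context, a manual chunked computation equivalent in spirit might look like the following minimal sketch (a hypothetical chunked_integrated_gradients helper, not Captum's _batch_attribution; it assumes a single-image batch and a model that returns class logits):

```python
import torch

def chunked_integrated_gradients(model, inp, baseline, target,
                                 n_steps=200, chunk=10):
    """Right-Riemann approximation of Integrated Gradients, evaluated
    `chunk` interpolation points at a time to bound peak GPU memory."""
    alphas = torch.linspace(1.0 / n_steps, 1.0, n_steps, device=inp.device)
    total_grad = torch.zeros_like(inp)
    for start in range(0, n_steps, chunk):
        a = alphas[start:start + chunk].view(-1, 1, 1, 1)
        # Interpolate between baseline and input at `chunk` path points.
        scaled = (baseline + a * (inp - baseline)).requires_grad_(True)
        out = model(scaled)[:, target].sum()
        grads = torch.autograd.grad(out, scaled)[0]
        # Accumulate gradients over steps; the chunk is freed before the next one.
        total_grad += grads.sum(dim=0, keepdim=True)
    # IG = (input - baseline) * average gradient along the path.
    return (inp - baseline) * total_grad / n_steps
```

This is essentially what internal_batch_size automates: only `chunk` interpolated inputs are ever resident on the GPU at once, while the summed result is identical to a single pass over all n_steps points.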

talhaanwarch commented 3 years ago

@lutaodai if you are following tutorials such as Interpreting vision with ResNet, you have to add internal_batch_size=1:

from captum.attr import IntegratedGradients

integrated_gradients = IntegratedGradients(model)
attributions_ig = integrated_gradients.attribute(input, target=pred_label_idx, internal_batch_size=1, n_steps=200)
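Note that internal_batch_size=1 runs one interpolation step per forward/backward pass, so it is the most memory-frugal but slowest setting; per the explanation above, any larger value that still fits in GPU memory produces the same result faster.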