pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[feature request] keep gradcheck numerical jacobian from being deprecated #120107

Open ConnorStoneAstro opened 7 months ago

ConnorStoneAstro commented 7 months ago

Issue description

Computing Jacobians for large/complex calculations can produce computational graphs too large to fit in memory. A finite-difference Jacobian, however, does not need to store the graph in memory and so can be used on larger models.

In my case I am using PyTorch to fit parametric models to astronomical images with Levenberg-Marquardt. I have been using the autograd Jacobian, but I apply several tricks to ensure the Jacobian never gets too large in memory. I just found the numerical Jacobian in the API, and now I learn it is being deprecated! It would be great if it could be kept; even in its current, mostly private form it would be useful to me.

Here is the line that says the numerical Jacobian will be deprecated: https://github.com/pytorch/pytorch/blob/4319735ace2fe00ce720db00119e08d56d6c344c/torch/autograd/gradcheck.py#L328

Also, in some cases a numerical Jacobian is actually faster than autograd, so keeping it would provide some acceleration.

Code example

Here is my Repo where I use PyTorch for astronomical images: https://github.com/Autostronomy/AstroPhot

If you run it on a large enough image it will crash. I have several features in the code that break images up until they are small enough to run without crashing, but it would be great not to need these.
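
For concreteness, the kind of graph-free numerical Jacobian I have in mind looks roughly like this (a central-difference sketch with a toy stand-in model and an arbitrary step size, not the actual gradcheck code):

```python
import torch

def finite_difference_jacobian(f, x, eps=1e-4):
    # Central differences: two forward passes per parameter, no autograd
    # graph is ever built, so peak memory stays near a single forward pass.
    x = x.detach().clone()
    flat = x.view(-1)  # writes into `flat` also modify `x`
    with torch.no_grad():
        n_out = f(x).numel()
        jac = x.new_zeros(n_out, flat.numel())
        for j in range(flat.numel()):
            orig = flat[j].item()
            flat[j] = orig + eps
            f_plus = f(x).reshape(-1)
            flat[j] = orig - eps
            f_minus = f(x).reshape(-1)
            flat[j] = orig
            jac[:, j] = (f_plus - f_minus) / (2.0 * eps)
    return jac

# toy stand-in for a parametric image model
model = lambda p: torch.stack([p.sum(), (p ** 2).sum()])
J = finite_difference_jacobian(model, torch.randn(5))  # shape (2, 5)
```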

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @Lezcano @Varal7

soulitzer commented 7 months ago

The code for the numerical Jacobian is pure Python and quite self-contained. If you find it useful, one way forward is just to copy-paste the logic.

Also, a way to avoid storing all activations in memory is to use forward AD to compute the Jacobian instead. Does something like https://pytorch.org/docs/stable/generated/torch.func.jacfwd.html#torch.func.jacfwd work for you?
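
For example, something along these lines (with a toy model standing in for yours):

```python
import torch
from torch.func import jacfwd

def model(params):
    # toy stand-in for rendering an image from a few parameters
    grid = torch.linspace(0.0, 1.0, 100)
    return params[0] * torch.exp(-grid / params[1])

params = torch.tensor([2.0, 0.5])
J = jacfwd(model)(params)  # shape (100, 2); forward mode keeps no backward graph
```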

ConnorStoneAstro commented 7 months ago

@soulitzer Ah, you're right, I can probably just copy it for my own use, thanks! I have been using jacfwd; it's the only way I could do most of my calculations. But sometimes I have individual intermediate steps that produce very large tensors, which can blow up memory usage even if the final object is a reasonable size. Having this as an option will be really helpful :)
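
For anyone reading along, one way to keep peak memory bounded with forward mode is to take the JVPs one column at a time rather than calling the fully vectorized jacfwd; a rough sketch with a placeholder model:

```python
import torch
from torch.func import jvp

def columnwise_jacobian(f, x):
    # One forward-mode JVP per input element: slower than a vectorized
    # jacfwd, but only one column's intermediates are alive at a time.
    eye = torch.eye(x.numel(), dtype=x.dtype)
    cols = []
    for i in range(x.numel()):
        _, col = jvp(f, (x,), (eye[i].reshape_as(x),))
        cols.append(col.reshape(-1))
    return torch.stack(cols, dim=1)

f = lambda p: torch.stack([p.sum(), (p ** 2).sum()])  # placeholder model
J = columnwise_jacobian(f, torch.randn(4))            # shape (2, 4)
```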