rfeinman / pytorch-minimize

Newton and Quasi-Newton optimization with PyTorch
https://pytorch-minimize.readthedocs.io
MIT License

Multiple GPUs #17

Closed · yubobao27 closed this issue 1 year ago

yubobao27 commented 2 years ago

Can multiple GPUs be utilized? It seems the input and model cannot be parallelized. Running trust_ncg on a model with a multi-million-row input exceeds single-GPU memory.

rfeinman commented 2 years ago

@yubobao27 Thanks for your suggestion. I'd be open to a PR, but unfortunately I don't have the time to design or implement this myself right now.

yubobao27 commented 2 years ago

Sure. How would one go about it, if it can be done? If you describe an approach, maybe we can contribute.

rfeinman commented 1 year ago

Hi @yubobao27 - I think the best place to implement this is actually on the user side. Users can build GPU parallelism into their objective function in different ways.

For example, let's say your objective is to minimize the mean 2-norm in the feature space of a neural net, and your input x is a minibatch:

import torch
import torch.nn as nn
from torchmin import minimize

net = nn.Sequential(
    nn.Linear(200, 512),
    nn.Tanh(),
    nn.Linear(512, 512)
).cuda()  # a small feature extractor; its weights stay fixed during minimization

x0 = torch.randn(80, 200).cuda()  # initial minibatch: 80 inputs, 200 features each

def obj(x):
    # mean 2-norm of the minibatch in feature space
    y = net(x)
    return y.norm(dim=1).mean()

result = minimize(obj, x0, method='bfgs')
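
As in SciPy, the optimized input is then read back from the returned result object (assuming the SciPy-style result fields described in the library's docs):

x_opt = result.x   # the minimizing minibatch, same shape as x0
loss = result.fun  # final objective value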

To parallelize computations across the minibatch, users can make a simple modification:

def obj(x):
    # replicate net across available GPUs and split the minibatch among them
    y = nn.parallel.data_parallel(net, x)
    return y.norm(dim=1).mean()
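
If the model fits on each device but the minibatch does not, the batch can also be sharded by hand. The following is a rough sketch, not a pytorch-minimize feature; it assumes at least two visible GPUs and replicates net on each one:

import copy

devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
replicas = [copy.deepcopy(net).to(d) for d in devices]

def obj(x):
    chunks = x.chunk(len(devices))  # one sub-batch per device
    partial = []
    for chunk, replica, d in zip(chunks, replicas, devices):
        y = replica(chunk.to(d))  # .to() is differentiable, so gradients flow back to x
        # accumulate per-chunk sums (not means) so uneven chunk sizes stay correct
        partial.append(y.norm(dim=1).sum().to(devices[0]))
    return torch.stack(partial).sum() / x.shape[0]  # mean over the full minibatch

Because only x is being optimized, each replica can hold its own fixed copy of the weights and no gradient synchronization across devices is needed.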

If you have other thoughts, or other use cases in mind, I'd be curious to hear.