pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

Parallelize evaluations of forward model #344

Closed pabloprf closed 2 years ago

pabloprf commented 4 years ago

I have implemented a genetic algorithm that calls forward evaluations of a model that I have fitted previously (in particular, a FixedNoiseGP). However, I am having problems parallelizing it with multiprocessing.

I have tried multiprocessing.Pool(), but then I get an error when pickling certain torch functions: Can't pickle <built-in function softplus>: import of module 'torch._C._nn' failed

I have also tried multiprocessing.Process(), but it hangs in the forward evaluation of the FixedNoiseGP. Interestingly, if I put a print statement inside the forward method of FixedNoiseGP, I can see that the MultivariateNormal is indeed being computed, but for some reason model(x) never returns in the child process.

Any ideas on how to solve this, or other ways to use a BoTorch model inside a parallel framework?

eytan commented 4 years ago

Why do you need multiprocessing? BoTorch / GPyTorch already use parallel processing via MKL or GPU libraries. My understanding is that, in general, PyTorch does not play well with multiprocessing. If you are trying to do many function evaluations in parallel, you may want to take a look at the CMA-ES tutorial, if you haven't already: https://botorch.org/tutorials/optimize_with_cmaes
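One practical consequence of the point that a single PyTorch process already runs its linear algebra on several threads: if several worker processes each keep PyTorch's default thread count, they can oversubscribe the CPU. A common mitigation is to cap the intra-op threads in each worker with torch.set_num_threads. The sketch below is only an illustration; threads_per_worker is a hypothetical helper, and whether capping helps depends on the workload.

```python
import os

import torch


def threads_per_worker(n_workers: int) -> int:
    """Hypothetical helper: split the available cores evenly across worker processes."""
    cores = os.cpu_count() or 1
    return max(1, cores // n_workers)


# Intra-op threads PyTorch uses in this process (e.g. for MKL/OpenMP matrix ops).
print("default threads:", torch.get_num_threads())

# Inside each of, say, 4 worker processes, one would call something like this
# before doing any model evaluations.
n = threads_per_worker(4)
torch.set_num_threads(n)
print("threads per worker:", torch.get_num_threads())
```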

(Email reply to Pablo Rodriguez-Fernandez's post of Dec 23, 2019, 2:56 PM.)

pabloprf commented 4 years ago

Thanks for the reply. My algorithm does send batches of evaluations, so it already takes advantage of BoTorch's parallel processing. What I am looking for is to run several optimization algorithms in parallel, for example several independent CMA-ES runs on the same model. This is useful in practice with heuristic methods like genetic algorithms, where you need to run several of them with different parameters.

Here is an example of my problem, using the same script as the BoTorch tutorial (https://botorch.org/tutorials/fit_model_with_torch_optimizer):

import math
import torch
import numpy as np

# use a GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float

# use regular spaced points on the interval [0, 1]
train_X = torch.linspace(0, 1, 15, dtype=dtype, device=device)
# training data needs to be explicitly multi-dimensional
train_X = train_X.unsqueeze(1)

# sample observed values and add some synthetic noise
train_Y = torch.sin(train_X * (2 * math.pi)) + 0.15 * torch.randn_like(train_X)

from botorch.models import SingleTaskGP
from gpytorch.constraints import GreaterThan

model = SingleTaskGP(train_X=train_X, train_Y=train_Y)
model.likelihood.noise_covar.register_constraint("raw_noise", GreaterThan(1e-5))

from gpytorch.mlls import ExactMarginalLogLikelihood

mll = ExactMarginalLogLikelihood(likelihood=model.likelihood, model=model)
# set mll and all submodules to the specified dtype and device
mll = mll.to(train_X)

from torch.optim import SGD

optimizer = SGD([{'params': model.parameters()}], lr=0.1)

NUM_EPOCHS = 150

model.train()

for epoch in range(NUM_EPOCHS):
    # clear gradients
    optimizer.zero_grad()
    # forward pass through the model to obtain the output MultivariateNormal
    output = model(train_X)
    # Compute negative marginal log likelihood
    loss = - mll(output, model.train_targets)
    # back prop gradients
    loss.backward()
    # print every 10 iterations
    if (epoch + 1) % 10 == 0:
        print(
            f"Epoch {epoch+1:>3}/{NUM_EPOCHS} - Loss: {loss.item():>4.3f} "
            f"lengthscale: {model.covar_module.base_kernel.lengthscale.item():>4.3f} " 
            f"noise: {model.likelihood.noise.item():>4.3f}" 
         )
    optimizer.step()

# put the model (and likelihood) into evaluation mode
model.eval()

I can do quick evaluations like:

x = torch.tensor([[0.5]], dtype=dtype, device=device)  # one point with one feature, on the model's device
print(model(x))

which gives me MultivariateNormal(loc: tensor([0.0027], grad_fn=<ViewBackward>))

However, trying to parallelize evaluations like that one as follows does not work:

import torch.multiprocessing as multiprocessing

def funcParallel(x):
    print(x)
    # x is a list like [0.0]; wrapping it gives a (1 point x 1 feature) tensor
    x = torch.tensor([x], dtype=dtype, device=device)
    print(model(x))

processes = []

X = [[0.0], [0.5]]

# one child process per point, each evaluating the model inherited from the parent
for x in X:
    p = multiprocessing.Process(target=funcParallel, args=(x,))
    p.start()
    processes.append(p)
for p in processes:
    p.join()

because each child process waits forever inside model(x).

Hopefully you can reproduce the same behavior. This is just a quick example. Here it could be solved by sending all the points together in one batch, e.g. x = torch.tensor(X, dtype=dtype, device=device); print(model(x)). However, I cannot do that in the real case, because the evaluations belong to different optimization workflows.

Balandat commented 4 years ago

I haven't done much multiprocessing with torch, but I assume the same gotchas (and more) as in regular Python apply, which is probably what is happening here. I usually use the Pool approach for multiprocessing in Python, so as a first step I'd try to figure out the pickling error. This may be related to https://github.com/cornellius-gp/gpytorch/issues/907

saitcakmak commented 2 years ago

Closing this since it has been inactive for 2+ years