nlesc-dirac / pytorch

Improved LBFGS and LBFGS-B optimizers in PyTorch.

lbfgsnew throws error #2

Closed shamsbasir closed 1 year ago

shamsbasir commented 1 year ago

Hi @SarodYatawatta ,

I was experimenting with your code to train a simple regression model, but I get this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This is my optimizer:

    LBFGSNew(model.parameters(), history_size=7, max_iter=2, line_search_fn=True, batch_mode=True)

Inside my closure I have this:

    if loss.requires_grad:
        loss.backward(create_graph=True)  # loss.backward()
    return loss

Could you please point out what is going wrong here? Thanks

SarodYatawatta commented 1 year ago

Have you tried adding retain_graph=True to loss.backward()? It could also be that your model has some tensors that were not created with requires_grad=True.
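
For reference, that suggestion amounts to a closure along these lines (a minimal sketch, not the code from this issue; model, inputs, targets, and criterion are placeholder names):

    # Minimal sketch, not from this issue: `model`, `inputs`, `targets`,
    # and `criterion` are placeholder names.
    def closure():
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        if loss.requires_grad:
            # retain_graph=True keeps the autograd graph after backward(),
            # so backward() can be called on the same loss again if needed
            loss.backward(retain_graph=True)
        return loss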

shamsbasir commented 1 year ago

My model works with the LBFGS optimizer native to PyTorch.

I keep my optimizer as follows

    LBFGSNew(model.parameters(), history_size=7, max_iter=2, line_search_fn=True, batch_mode=True)

and made the following changes as you suggested:

    if loss.requires_grad:
        loss.backward(retain_graph=True)
    return loss

and I get the following error:

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

As I mentioned, I do not get this error with the LBFGS optimizer built into PyTorch. Also, my objective function requires several derivatives of the model prediction with respect to the inputs. That is where the error happens with lbfgsnew.
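
For context, an objective of that kind typically looks something like the sketch below (a hypothetical toy example, not the actual code from this issue; the network and the target derivative are made up):

    # Hypothetical toy example of a loss built from input derivatives;
    # the model and the target derivative (cos x) are made up.
    import torch

    model = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                                torch.nn.Linear(16, 1))
    x = torch.linspace(0.0, 1.0, 100).unsqueeze(1).requires_grad_(True)
    u = model(x)
    # create_graph=True so that du/dx is itself differentiable and can
    # appear inside the loss that will later be backpropagated
    du_dx, = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                                 create_graph=True)
    loss = ((du_dx - torch.cos(x)) ** 2).mean()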

SarodYatawatta commented 1 year ago

OK, could you post the full backtrace of the error? thanks

flavio-martinelli commented 1 year ago

Hello! Thanks for providing a batched version of LBFGS, it works very smoothly! I think I am getting the same error as @shamsbasir; I've pasted the error trace below. I managed to fix it by removing line 683, but I do not fully understand why that works.

Traceback (most recent call last):
  File "/Users/flmartin/.virtualenvs/pytorch/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)

  File "/Users/flmartin/Documents/***/***/src/lbfgsnew.py", line 685, in step
    t = self._linesearch_cubic(closure, d, 1e-6)

  File "/Users/flmartin/Documents/***/***/src/lbfgsnew.py", line 227, in _linesearch_cubic
    phi_0 = float(closure())

  File "/Users/flmartin/Documents/***/***/scripts/myscript.py", line 71, in closure
    loss.backward()

  File "/Users/flmartin/.virtualenvs/pytorch/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

  File "/Users/flmartin/.virtualenvs/pytorch/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

SarodYatawatta commented 1 year ago

Could you provide a small test case for me to reproduce the error? thanks

SarodYatawatta commented 1 year ago

If you can provide the closure() code that will be enough. Where do you call loss.backward() in the closure()?

flavio-martinelli commented 1 year ago

My closure is very minimal; it might not help you:

        def closure():
            optimizer.zero_grad()
            loss, _ = loss_fn(y, students(X))
            loss *= 100
            print(loss)
            loss.backward()
            return loss

It would take me some time to produce an MWE, but I can provide one if you need it.

SarodYatawatta commented 1 year ago

This problem occurs because the gradient itself is part of the cost to minimize. I have added the option cost_use_gradient=True to handle this case, but it will increase the computational cost.
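
Usage would then look roughly like this (a hedged sketch, assuming cost_use_gradient is accepted as a constructor keyword argument; check lbfgsnew.py for the exact signature in your version, and note that compute_loss_with_input_gradients is a hypothetical placeholder):

    # Hedged sketch: assumes cost_use_gradient is a constructor keyword
    # argument of LBFGSNew; compute_loss_with_input_gradients is a
    # hypothetical placeholder for a loss that contains input gradients.
    from lbfgsnew import LBFGSNew

    optimizer = LBFGSNew(model.parameters(), history_size=7, max_iter=2,
                         line_search_fn=True, batch_mode=True,
                         cost_use_gradient=True)

    def closure():
        optimizer.zero_grad()
        loss = compute_loss_with_input_gradients(model)
        if loss.requires_grad:
            loss.backward()
        return loss

    optimizer.step(closure)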