kyegomez / Sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

Using Megatron to Train GPT-3 #21

Open Kingsleyandher opened 1 year ago

Kingsleyandher commented 1 year ago

Hello, I hit an error when using the Sophia optimizer to train GPT-3 with Megatron. The problem is that the gradient passed into the optimizer is not in a requires_grad = True state, so the second derivative cannot be computed. Do you know how to solve this?

File "/root/miniconda3/envs/torch18/lib/python3.7/site-packages/torch/autograd/__init__.py", line 277, in grad allow_unused, accumulate_grad=False) # Calls into the C++ engine to run the backward pass RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

Upvote & Fund

Fund with Polar

Kingsleyandher commented 1 year ago
class HutchinsonEstimator(HessianEstimator):
    def estimate(self, p, grad):
        # Hutchinson's trick: for u ~ N(0, I), u * (H u) is an unbiased
        # estimate of the Hessian diagonal for parameter p.
        u = torch.randn_like(grad)
        grad_dot_u = torch.sum(grad * u)
        print(f"grad_dot_u requires grad: {grad_dot_u.requires_grad}")  # -> False

        # ↓  RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
        hessian_vector_product = torch.autograd.grad(
            grad_dot_u, p, retain_graph=True)[0]
        return u * hessian_vector_product
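
The torch.autograd.grad(grad_dot_u, p, ...) call inside estimate can only succeed if the grad argument still has a grad_fn. Megatron's standard path (loss.backward() followed by reading p.grad) produces detached gradients, which reproduces the error above. Below is a minimal sketch of a workaround, assuming the estimate(p, grad) signature pasted above; the surrounding wiring is hypothetical and not Megatron's or Sophia's actual API:

import torch

def sophia_hessian_update(model, loss, estimator):
    """Illustrative only: the point is that the gradients handed to
    estimator.estimate() must be created with create_graph=True."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First-order gradients that still carry a grad_fn.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    hessian_estimates = []
    for p, g in zip(params, grads):
        p.grad = g.detach()  # detached copy for the normal first-order update
        hessian_estimates.append(estimator.estimate(p, g))  # second derivative now works
    return hessian_estimates
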
Kingsleyandher commented 1 year ago

This problem looks the same as #7.

liuslnlp commented 1 year ago

Hello @Kingsleyandher, I am running into the same issue. Have you solved it?