Facing the exact same issue while training distilgpt2 for causal LM.
Update: Resolved for me. The position encoding (wpe) parameter's grad sampler was getting a batch size of [1] because the position ids are shared across all inputs in the batch and stored as a single array rather than a batch_size x seq_len matrix. I simply passed the position ids explicitly as an input to the model (so they carry a batch dimension) and it worked; see the sketch below. @KKNakkav2 you can try this and see if it works for you.
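For reference, a minimal sketch of the workaround, assuming a Hugging Face distilgpt2 causal LM setup (the model/tokenizer names and dummy inputs here are just for illustration, not the original training script):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 tokenizer has no pad token by default
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

# Dummy batch just for illustration
batch = tokenizer(["hello world", "another example"], return_tensors="pt", padding=True)
input_ids = batch["input_ids"]
attention_mask = batch["attention_mask"]

# Build position_ids with an explicit batch dimension (batch_size x seq_len)
# so the wpe embedding receives a per-sample input instead of a single
# shared array; the per-sample grad sampler then sees the correct batch size.
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0).expand_as(input_ids)

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    position_ids=position_ids,
    labels=input_ids,
)
loss = outputs.loss
```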
Yeah, as @scakc mentioned, I do not think Opacus currently supports GPT-based models. You can either follow the instructions from @scakc, or use Xuechen's fix (https://github.com/lxuechen/private-transformers).
I am fine-tuning a BERT model with the Opacus wrapper and encountered an issue inside the Opacus optimizer. Can you please advise on the next steps for resolving the error?
Error: