Closed felipemello1 closed 4 weeks ago
When using `grad_accumulation` or `optimizer_in_backward` AND setting `clip_grad_norm=None` in the CLI, it raises the error:

```python
raise RuntimeError(
    "Gradient clipping is not supported with optimizer in bwd."
    "Please set clip_grad_norm=None, or optimizer_in_bwd=False."
)
```
I believe that `"None"` is being interpreted as a string by the CLI, so the config still sees a truthy value even though `clip_grad_norm=None` was passed.
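A minimal sketch of the suspected failure mode (the `parse_override` and `normalize` helpers are hypothetical, not torchtune's actual parser): a naive CLI override parser keeps `"None"` as a string, which is truthy, so an `is not None` guard fires even though the user meant to disable clipping. Normalizing the sentinel before the check is one possible fix.

```python
def parse_override(value: str):
    # Naive parsing: the literal text "None" stays a string
    return value

clip_grad_norm = parse_override("None")
# The string "None" is truthy, so a guard like
# `if clip_grad_norm is not None:` incorrectly triggers the RuntimeError.
assert clip_grad_norm is not None

def normalize(value):
    # Hypothetical fix: map the string sentinel back to Python's None
    return None if value in (None, "None", "null") else value

assert normalize("None") is None      # clipping correctly disabled
assert normalize(1.0) == 1.0          # real values pass through unchanged
```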