Open felipemello1 opened 1 day ago
1) I tried using this https://github.com/pytorch/torchtune/blob/main/recipes/configs/mistral/7B_full_ppo_low_memory.yaml and gave up after 10 minutes without the first step completing, even after setting compile=False
2) We shouldn't be compiling the entire model. Instead, we should use the utility that compiles per layer, as is done here: https://github.com/pytorch/torchtune/blob/e9fd56a812cf0ba151fa164a45eb04056d099726/recipes/lora_dpo_single_device.py#L296
3) DPO distributed doesn't compile the model. Should it?
cc: @SalmanMohammadi