Closed: rosafish closed this issue 1 year ago.
Hey! Thanks for reporting this issue. It looks like there is a problem with the multi-GPU environment setup, and fixing it properly will take a larger PR.
In the meantime, you can run CUDA_VISIBLE_DEVICES=python run_alignment.py
to unblock your experiments.
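The workaround depends on CUDA_VISIBLE_DEVICES, which limits the GPUs a process can see. The exact device value in the command above isn't shown. Here is a minimal sketch of launching the script with one visible GPU, assuming device index 0 and that run_alignment.py sits in the working directory. The helper names single_gpu_env and alignment_command are hypothetical and not part of the repo.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional


def single_gpu_env(device_index: int = 0, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy the environment so only one GPU is visible to CUDA."""
    env = dict(os.environ if base is None else base)
    # CUDA renumbers visible devices, so the selected GPU shows up as cuda:0 in the child.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def alignment_command(script: str = "run_alignment.py") -> List[str]:
    """Hypothetical helper: build the command line for the alignment script."""
    # sys.executable makes the child use the same interpreter (and virtualenv) as the caller.
    return [sys.executable, script]
```

Usage (not executed here): `subprocess.run(alignment_command(), env=single_gpu_env(0), check=True)`. The variable is set in the child's environment, not in the current process, because CUDA reads it only when it initializes.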
The temporary change we need is at the line below (this is just a hacky workaround, but it is safe):
https://github.com/frankaging/align-transformers/blob/main/run_alignment.py#L172C11-L172C37
Set the number of GPUs to 1 there. So far I have only tested the script with a single GPU (>40 GB) for alignment search. Could you make that change and verify it? Once you've verified it, feel free to open a pull request and I will merge it. Please also add a comment saying that only single-GPU alignment is supported right now. Thanks!
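The contents of the linked line aren't shown here, so the following is only a hypothetical sketch of a single-GPU guard with the requested comment. It assumes the script keeps its GPU count in an attribute named n_gpu. AlignmentArgs, visible_gpu_count and force_single_gpu are illustrative names, not the repo's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class AlignmentArgs:
    """Hypothetical stand-in for the script's argument object; only n_gpu matters here."""
    n_gpu: int


def visible_gpu_count(env: Mapping[str, str]) -> Optional[int]:
    """Count the device entries listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (all GPUs are visible, and the real
    count would need a CUDA query). This is a plain comma split and ignores
    CUDA's truncation rules for invalid entries.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def force_single_gpu(args: AlignmentArgs, env: Optional[Mapping[str, str]] = None) -> AlignmentArgs:
    # NOTE: only supporting single-GPU alignment right now.
    env = os.environ if env is None else env
    count = visible_gpu_count(env)
    if count is not None and count != 1:
        raise RuntimeError(
            f"Alignment search currently supports exactly one GPU, "
            f"but CUDA_VISIBLE_DEVICES lists {count}."
        )
    args.n_gpu = 1
    return args
```

The guard fails loudly when more than one device is exposed. Silently overriding n_gpu alone could leave a multi-GPU setup half-configured.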
Fixed in the recent commit to ToT. Closing the issue.
Hi, I am working with the newest version of the repo and got tutorial.ipynb working. However, when I run run_alignment.py with the training script at the end of your README.md, I get the following error when the model tries to save checkpoints of the rotation layer:
Thanks!