Error when running run_alignment.py: 'AlignableLlamaForCausalLM' object has no attribute 'module' when trying to save the rotation layer #5

rosafish commented 1 year ago

Hi, I am working with the newest version of your repo and got tutorial.ipynb to work. However, when I run run_alignment.py with the training script at the end of your README.md, I get the following error when the trainer tries to save a checkpoint of the rotation layer:

Traceback (most recent call last):
  File "/net/scratch/zhouy1/github/align-transformers-forked/run_alignment.py", line 183, in <module>
    aligner.train(
  File "/net/scratch/zhouy1/github/align-transformers-forked/trainer.py", line 222, in train
    self.save_model(output_dir, 'pytorch-rotate-best.bin')
  File "/net/scratch/zhouy1/github/align-transformers-forked/trainer.py", line 62, in save_model
    'rotate_layer': self.model.module.model.rotate_layer.state_dict(),
  File "/home/zhouy1/miniconda3/envs/BoundlessDAS/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'AlignableLlamaForCausalLM' object has no attribute 'module'

Thanks!

frankaging commented 1 year ago

hey! thanks for reporting this issue. it looks like a multi-GPU env setup issue, and we need a larger PR to fix that.

in the meantime, you can add CUDA_VISIBLE_DEVICES= before your python run_alignment.py command to unblock your experiments.

the temporary change we need is at the line linked here (this is just a hacky workaround, but it is safe):

https://github.com/frankaging/align-transformers/blob/main/run_alignment.py#L172C11-L172C37

set the number of GPUs to 1 there, since so far i have only tested the script with a single >40G GPU for alignment search. could you make that change and verify it? once you've verified, feel free to open a pull request and i will merge it. please also add a comment saying that only single-GPU alignment is supported right now. thanks!
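
for context on the traceback: `self.model.module` only exists when the model is wrapped in something like `torch.nn.DataParallel`, which exposes the wrapped model as `.module`; a bare `AlignableLlamaForCausalLM` has no such attribute. below is a minimal sketch of a save path that tolerates both cases. it is not the repo's code: `unwrap_model`, `rotate_layer_state`, and the stand-in classes are hypothetical names used only so the example runs without torch or a GPU.

```python
def unwrap_model(model):
    """Return the underlying model whether or not it is wrapped.

    Wrappers such as torch.nn.DataParallel expose the wrapped model as
    `.module`; a bare model has no such attribute, so we fall back to it.
    """
    return getattr(model, "module", model)


# Stand-in classes (hypothetical) that mimic only the attribute layout
# seen in the traceback: <model>.model.rotate_layer.state_dict()
class RotateLayer:
    def state_dict(self):
        return {"weight": [[1.0, 0.0], [0.0, 1.0]]}


class Inner:
    def __init__(self):
        self.rotate_layer = RotateLayer()


class BareModel:  # plays the role of an unwrapped AlignableLlamaForCausalLM
    def __init__(self):
        self.model = Inner()


class Wrapped:  # plays the role of a DataParallel-style wrapper
    def __init__(self, module):
        self.module = module


def rotate_layer_state(model):
    """Fetch the rotation layer's state dict for wrapped or bare models."""
    return unwrap_model(model).model.rotate_layer.state_dict()


bare = BareModel()
wrapped = Wrapped(BareModel())

# Both calls succeed; the bare case is the one that raised AttributeError above.
bare_state = rotate_layer_state(bare)
wrapped_state = rotate_layer_state(wrapped)
```

one caveat with this pattern: it would misfire on a bare model that happens to have its own attribute named `module`, so an explicit `isinstance` check against the wrapper class is the stricter option.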

frankaging commented 1 year ago

fixed in the recent commit to ToT. closing the issue.
