ToluClassics closed this issue 1 year ago
I guess this is a Tevatron issue
Hi @ToluClassics , I'm encountering the same problem. How did you solve it?
Hi @jordane95, could you check whether the issue below solves your problem?
https://github.com/texttron/tevatron/issues/35#issuecomment-1080840310
Unfortunately no. I'm using deepspeed. I think there may be some conflict with this repo...
@jordane95 `deepspeed` has its gradient accumulation opaquely implemented in C/C++ code. Technically, it is possible to work it out by carefully aligning all of `gradcache`'s sub-batch gradient computations with the `deepspeed` engine's gradient-accumulating backward passes. Personally, I have not done that; I switched to native `torch` / `fairscale` instead. Is there a particular reason to use `deepspeed` in your case?
Hi @luyug, any idea on how to fix this?
04/14/2022 15:48:04 - INFO - tevatron.trainer - Initializing Gradient Cache Trainer
Traceback (most recent call last):
  File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/lustre07/scratch/odunayo/tevatron/src/tevatron/driver/train.py", line 103, in <module>
    main()
  File "/lustre07/scratch/odunayo/tevatron/src/tevatron/driver/train.py", line 84, in main
    trainer = trainer_cls(
  File "/lustre07/scratch/odunayo/tevatron/src/tevatron/trainer.py", line 105, in __init__
    scaler=self.scaler
AttributeError: 'GCTrainer' object has no attribute 'scaler'
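
For anyone landing here with the same traceback: one plausible reading is that the parent trainer only creates `scaler` under some configurations, while the subclass reads `self.scaler` unconditionally. I have not confirmed this against the Tevatron or transformers source, so treat the following as a minimal sketch of that failure mode only. Every class and parameter name in it (`BaseTrainer`, `FragileTrainer`, `DefensiveTrainer`, `use_external_engine`) is hypothetical and not part of either library.

```python
class BaseTrainer:
    """Hypothetical stand-in for a parent trainer class (not a real library API)."""

    def __init__(self, fp16=False, use_external_engine=False):
        self.fp16 = fp16
        # Assumption: the scaler is created only when this trainer manages
        # mixed precision itself, not when an external engine does.
        if fp16 and not use_external_engine:
            self.scaler = object()  # placeholder for a real gradient scaler


class FragileTrainer(BaseTrainer):
    """Reads self.scaler unconditionally, like the failing line in the traceback."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.gc_scaler = self.scaler  # AttributeError if the parent never set it


class DefensiveTrainer(BaseTrainer):
    """Falls back to None when the parent did not create a scaler."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.gc_scaler = getattr(self, "scaler", None)


def try_build(cls, **kwargs):
    """Return 'ok' if construction succeeds, else the AttributeError message."""
    try:
        cls(**kwargs)
        return "ok"
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_build(FragileTrainer, fp16=True, use_external_engine=True))
    print(try_build(DefensiveTrainer, fp16=True, use_external_engine=True))
```

The `getattr` fallback only avoids the crash at construction time. Whether gradient caching then behaves correctly without a scaler is a separate question that this sketch does not answer.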