AttributeError: 'GCTrainer' object has no attribute 'scaler'

luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint

Apache License 2.0

327 stars 19 forks

AttributeError: 'GCTrainer' object has no attribute 'scaler' #11

Closed ToluClassics closed 1 year ago

ToluClassics commented 2 years ago

Hi @luyug, any idea on how to fix this?

04/14/2022 15:48:04 - INFO - tevatron.trainer - Initializing Gradient Cache Trainer
Traceback (most recent call last):
  File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/lustre07/scratch/odunayo/tevatron/src/tevatron/driver/train.py", line 103, in <module>
    main()
  File "/lustre07/scratch/odunayo/tevatron/src/tevatron/driver/train.py", line 84, in main
    trainer = trainer_cls(
  File "/lustre07/scratch/odunayo/tevatron/src/tevatron/trainer.py", line 105, in __init__
    scaler=self.scaler
AttributeError: 'GCTrainer' object has no attribute 'scaler'

ToluClassics commented 2 years ago

I guess this is a Tevatron issue
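For anyone landing here with the same traceback: the error means the trainer's constructor reads `self.scaler` before anything has assigned it. Below is a minimal sketch of that failure mode in plain Python. The classes `BaseTrainer`, `StrictTrainer` and `TolerantTrainer` are hypothetical stand-ins, not Tevatron or transformers code. The idea that the base class only creates `scaler` when fp16 is requested is an assumption made for illustration. The `getattr` fallback is one possible workaround, not a confirmed fix.

```python
class BaseTrainer:
    """Hypothetical base class: only creates `scaler` when fp16 is requested (assumption)."""

    def __init__(self, fp16=False):
        if fp16:
            self.scaler = object()  # placeholder standing in for a real gradient scaler


class StrictTrainer(BaseTrainer):
    """Reads `self.scaler` unconditionally, like the line shown in the traceback."""

    def __init__(self, fp16=False):
        super().__init__(fp16)
        self.gc_scaler = self.scaler  # raises AttributeError if the base never set it


class TolerantTrainer(BaseTrainer):
    """Falls back to None when the attribute is missing."""

    def __init__(self, fp16=False):
        super().__init__(fp16)
        self.gc_scaler = getattr(self, "scaler", None)


try:
    StrictTrainer(fp16=False)
    strict_failed = False
except AttributeError as err:
    strict_failed = True
    print(err)

tolerant = TolerantTrainer(fp16=False)
print(tolerant.gc_scaler)
```

Whether passing `None` as the scaler is acceptable depends on what the receiving code expects, so treat this as a way to locate the problem rather than a drop-in patch.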

jordane95 commented 1 year ago

Hi @ToluClassics , I'm encountering the same problem. How did you solve it?

ToluClassics commented 1 year ago

Hi @jordane95, check whether the issue below solves your problem.

https://github.com/texttron/tevatron/issues/35#issuecomment-1080840310

jordane95 commented 1 year ago

> Hi @jordane95 , check below issue if it solves your problem?
>
> texttron/tevatron#35 (comment)

Unfortunately no. I'm using DeepSpeed. I think there may be some conflict with this repo...

luyug commented 1 year ago

@jordane95 DeepSpeed implements its gradient accumulation opaquely in C/C++ code. Technically, it should be possible to make this work by carefully aligning each of GradCache's sub-batch gradient computations with the DeepSpeed engine's gradient-accumulating backward passes. Personally, I have not done this; I switched to native torch / fairscale instead. Is there a particular reason to use DeepSpeed in your case?
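To make the "sub-batch gradient computation" above concrete, here is a toy, pure-Python sketch of the gradient caching idea. It is not GradCache's actual implementation and does not involve DeepSpeed. The scalar encoder `w * x`, the stand-in loss `(sum of representations)^2` (chosen only because it couples all items in the batch, as a contrastive loss does) and every function name are assumptions made for illustration. The point is that the per-sub-batch backward passes accumulate to the same gradient as one full-batch pass, and each of those passes is what would have to line up with an engine's own accumulation steps.

```python
def encode(w, xs):
    """Toy encoder: one scalar weight applied to each input."""
    return [w * x for x in xs]


def full_batch_grad(w, xs):
    """Reference gradient of L = (sum_i w*x_i)^2 with respect to w, computed in one pass."""
    return 2.0 * sum(encode(w, xs)) * sum(xs)


def grad_cache_grad(w, xs, chunk_size):
    """Same gradient, computed sub-batch by sub-batch via cached representation gradients."""
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]

    # Step 1: forward every sub-batch and keep only the representations.
    reps = [r for chunk in chunks for r in encode(w, chunk)]

    # Step 2: compute the loss gradient w.r.t. each representation (the "cache").
    # For L = (sum r)^2, dL/dr_i = 2 * sum(r) for every i.
    total = sum(reps)
    rep_grads = [2.0 * total for _ in reps]

    # Step 3: revisit each sub-batch and backpropagate its cached gradients
    # into the weight, accumulating across sub-batches. Here dr_i/dw = x_i.
    grad_w = 0.0
    offset = 0
    for chunk in chunks:
        cached = rep_grads[offset:offset + len(chunk)]
        grad_w += sum(g * x for g, x in zip(cached, chunk))
        offset += len(chunk)
    return grad_w


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = full_batch_grad(w, xs)
chunked = grad_cache_grad(w, xs, chunk_size=2)
print(reference, chunked)
```

In a real setup, Step 3 is one backward call per sub-batch, so any framework that hides its own accumulation logic would need to treat each of those calls as an accumulation step and only update the weights after the last one.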