jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

Can we just use the sloth gradient checkpointing by uncommenting this line? #30

Open vkaul11 opened 4 months ago

vkaul11 commented 4 months ago

I am not clear about how to use the code: https://github.com/jzhang38/EasyContext/blob/main/train.py#L28. By uncommenting this line, can we enable the sloth gradient checkpointing?

jzhang38 commented 4 months ago

Yes, you can. It will produce the same loss, but it did not enable a larger batch size in my experiments.
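For context, a minimal sketch of what that toggle amounts to. The import name below is assumed from the commented line in train.py and may differ, so verify it against `easy_context/__init__.py`; the sketch applies the patch before the model is built, mirroring its placement near the top of train.py, so that `gradient_checkpointing_enable()` is routed to the offloaded implementation.

```python
# Sketch only: the patch function name is assumed; check easy_context/__init__.py.
from transformers import AutoModelForCausalLM
from easy_context import apply_unsloth_offloaded_gradient_checkpoint_monkey_patch

# Apply the patch early, before the model is created, so later calls to
# gradient_checkpointing_enable() use the offloaded (activations-to-CPU) version.
apply_unsloth_offloaded_gradient_checkpoint_monkey_patch()

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.gradient_checkpointing_enable()  # now goes through the patched implementation
```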

vkaul11 commented 4 months ago

I am getting this error when I do this, though. Any idea why?

    File "/workspace/cookbook-internal/recipes/common/peft.py", line 89, in load_train_model
        model = prepare_model_for_kbit_training(model)
    File "/usr/local/lib/python3.10/dist-packages/peft/utils/other.py", line 137, in prepare_model_for_kbit_training
        model.gradient_checkpointing_enable(**gc_enable_kwargs)
    File "/workspace/cookbook-internal/recipes/common/sloth_activation.py", line 63, in new_gradient_checkpointing_enable
        assert gradient_checkpointing_kwargs == None
    AssertionError

Maybe using QLoRA instead of LoRA complicates things?
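A plausible reading of the traceback, assuming peft's `prepare_model_for_kbit_training` normalizes a missing `gradient_checkpointing_kwargs` to an empty dict before forwarding it: the patched `gradient_checkpointing_enable` then receives `{}` rather than `None`, so the strict `== None` assert trips even though no checkpointing options were actually requested. A minimal self-contained sketch of that interaction (the `Dummy` class and variable names are illustrative, not peft code):

```python
def patched_gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
    # Mirrors the assert shown in the traceback above.
    assert gradient_checkpointing_kwargs == None

class Dummy:
    gradient_checkpointing_enable = patched_gradient_checkpointing_enable

gc_kwargs = None             # caller passes nothing / None
gc_kwargs = gc_kwargs or {}  # peft-style normalization: None becomes {}
Dummy().gradient_checkpointing_enable(gradient_checkpointing_kwargs=gc_kwargs)
# -> AssertionError: {} != None, even though no kwargs were really supplied
```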

vkaul11 commented 4 months ago

I need it to reduce the memory footprint, not to increase the batch size.

vkaul11 commented 4 months ago

A question: the `assert gradient_checkpointing_kwargs == None` check is what throws the error. Do I need to set `gradient_checkpointing_kwargs` to something, or should I comment out that line?
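One possible way around it, as a sketch I have not verified against this exact setup: leave the assert alone and keep peft from forwarding any kwargs, by disabling peft's own gradient-checkpointing step and enabling checkpointing yourself with no arguments, so the patched method sees `None`. The model id and quantization config below are placeholders for your own QLoRA setup.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Hypothetical base model and quantization config; substitute your own.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

model = prepare_model_for_kbit_training(
    model,
    use_gradient_checkpointing=False,  # skip peft's gradient_checkpointing_enable(...) call
)
model.enable_input_require_grads()     # usually still wanted for k-bit training with checkpointing
model.gradient_checkpointing_enable()  # patched method is called with gradient_checkpointing_kwargs=None
```

The alternative is to relax the assert in the patched function so it also tolerates an empty dict (for example `assert not gradient_checkpointing_kwargs`), since the offloaded implementation ignores those options anyway.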