AAndersn opened this issue 2 months ago
Hi @AAndersn, thanks for reporting. I was not able to repro this so far but I will give it another try later today. You're right about the int4, this is a leftover from a back and forth while we created the PR for QLoRA. Would you be interested in creating a PR to fix this?
@mreso Happy to make a PR to update the docs. I'll also try rolling back to an older version of PyTorch to see if that fixes it and will update this issue tomorrow.
The problem appears to be an issue with AutoModel.from_pretrained() inside the finetuning.py script.
I rebuilt my environment today with llama-recipes 0.0.4 and transformers 4.45.0 and am able to run this snippet successfully:
```python
import torch
from transformers import BitsAndBytesConfig, AutoModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
However, if I copy and paste this exact snippet into finetuning.py, the AutoModel call fails with the same message:
```
python3.11/site-packages/bitsandbytes/nn/modules.py", line 149, in __new__
[rank3]:     self = torch.Tensor._make_subclass(cls, data, requires_grad)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: RuntimeError: Only Tensors of floating point and complex dtype can require gradients
```
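For context, the RuntimeError itself comes from a general PyTorch rule: only floating point and complex tensors can have requires_grad=True, and the 4-bit quantized weight storage that bitsandbytes builds here is integer-typed. A minimal sketch of that check in isolation (illustrative only, not the actual bitsandbytes code path):

```python
import torch

# Quantized 4-bit weights are packed into integer storage (uint8 here as a
# stand-in). Asking such a tensor to require gradients trips the same check
# that bitsandbytes' Params4bit.__new__ surfaces in the traceback above.
packed_weights = torch.zeros(8, dtype=torch.uint8)
try:
    packed_weights.requires_grad_(True)
except RuntimeError as err:
    print(err)  # "... floating point and complex dtype can require gradients"
```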
Hi @AAndersn! Thanks for reporting this. I am wondering if changing AutoModel to LlamaForCausalLM would solve this? Can you try? Thanks!
@wukaixingxp - Thank you so much! Changing AutoModel to LlamaForCausalLM fixed it! Testing now with 8B and 70B. If that works, I will install the pytest suite and then update #681 to include this fix.
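For reference, a minimal sketch of the change applied to the standalone repro snippet (same model id and BitsAndBytesConfig as above; only the loading class is swapped):

```python
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Same 4-bit QLoRA-style config as in the earlier snippet.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,
)

# Loading through the task-specific class instead of AutoModel is the change
# that resolved the error in this thread.
model = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```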
I tried your command with transformers = 4.45.0 and torch = 2.4.1, but I got this error:
```
[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank2]:     return _run_code(code, main_globals, None,
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/runpy.py", line 86, in _run_code
[rank2]:     exec(code, run_globals)
[rank2]:   File "/home/kaiwu/work/llama-recipes/src/llama_recipes/finetuning.py", line 332, in <module>
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
[rank2]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
[rank2]:     component, remaining_args = _CallAndUpdateTrace(
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
[rank2]:     component = fn(*varargs, **kwargs)
[rank2]:   File "/home/kaiwu/work/llama-recipes/src/llama_recipes/finetuning.py", line 203, in main
[rank2]:     model = FSDP(
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 483, in __init__
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 102, in _auto_wrap
[rank2]:     _recursive_wrap(**recursive_wrap_kwargs, **root_kwargs)  # type: ignore[arg-type]
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 544, in _recursive_wrap
[rank2]:     wrapped_child, num_wrapped_params = _recursive_wrap(
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 544, in _recursive_wrap
[rank2]:     wrapped_child, num_wrapped_params = _recursive_wrap(
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 544, in _recursive_wrap
[rank2]:     wrapped_child, num_wrapped_params = _recursive_wrap(
[rank2]:   [Previous line repeated 2 more times]
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 562, in _recursive_wrap
[rank2]:     return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 491, in _wrap
[rank2]:     return wrapper_cls(module, **kwargs)
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 509, in __init__
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 565, in _init_param_handle_from_module
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 897, in _materialize_meta_module
[rank2]:     raise e
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 890, in _materialize_meta_module
[rank2]:     module.reset_parameters()  # type: ignore[operator]
[rank2]:   File "/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
[rank2]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank2]: AttributeError: 'LlamaRMSNorm' object has no attribute 'reset_parameters'. Did you mean: 'get_parameter'?
/home/kaiwu/miniconda3/envs/recipe_test/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:892: UserWarning: Unable to call reset_parameters() for module on meta device with error 'LlamaRMSNorm' object has no attribute 'reset_parameters'. Please ensure that your module of type <class 'transformers.models.llama.modeling_llama.LlamaRMSNorm'> implements a reset_parameters() method.
```
pip reported a conflict with torch = 2.4.1.
I was able to run 8B with 4-bit quantization with torch = 2.4.0 by replacing AutoModel with LlamaForCausalLM or LlamaForQuestionAnswering (for use with a custom dataset).
@wukaixingxp - I see you have made that update in https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/finetuning.py#L139, so I will close this issue as fixed by #686.
Thanks so much for your help!
@wukaixingxp -- Today I pulled the latest llama-recipes (0.0.4.post) and am getting the same "Only Tensors of floating point and complex dtype can require gradients" error again. I tried bumping transformers to 4.46.3 and PyTorch to 2.5.1, but still no luck. When you have some free time, could you please try running my bash call with the grammar dataset or a similar CSV dataset? I would appreciate it a lot.
FYI - I am still able to run an archived copy of your fixes from https://github.com/meta-llama/llama-recipes/commit/9c7a5b421f20b73511d9d0d49078824393e63faa with transformers 4.45.0 and PyTorch 2.4.0, so it's not stopping me for the time being.
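Since the failures here seem to depend on the exact version combination, a trivial sketch for printing what is actually installed in the active environment (not part of the recipes code, just to make reports comparable):

```python
import bitsandbytes
import torch
import transformers

# The combination that currently works for me: torch 2.4.0 + transformers 4.45.0.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("bitsandbytes:", bitsandbytes.__version__)
```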
System Info
PyTorch 2.4.0, CUDA 12.1, CentOS HPC cluster with 7x H100 GPUs
Information
🐛 Describe the bug
Error logs
This error message is then repeated by each separate GPU process, followed by
If the command is run without the FSDP_CPU_RAM_EFFICIENT_LOADING=1 ACCELERATE_USE_FSDP=1 environment variable prefix, then it throws a different error:

Expected behavior
This call and dataset work fine for llama3.1-8B without quantization, but fail with 4-bit quantization. The int4 parameter specified in https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/finetuning/multigpu_finetuning.md#with-fsdp--qlora does not exist.