Closed: seungwoos closed this issue 8 months ago
Hi @seungwoos! Thanks for the issue! Can you send me a small reproducer so I can reproduce the issue? 🙏 I'll have a look
@younesbelkada Thanks for your reply. I'm working with axolotl, and this example raised the same issue. In a fresh virtual environment with the latest transformers package, where creating quantized parameters is now handled by hf_quantizer, the error is raised.
Would you mind checking the repository and running the QLoRA example as follows?
accelerate launch -m axolotl.cli.train examples/mistral/qlora.yml --deepspeed deepspeed_configs/zero2.json
Hmm, I guess the problem arose when applying quantization on multiple GPUs with a mixed accelerate config. My default config had
distributed_type: FSDP
while I was trying to use DeepSpeed. After removing FSDP from the default config, it works fine for me. (I guess this is the reason why these lines are called.)
@seungwoos I am seeing a similar issue when using FSDP. After some debugging, this is the problematic code: link, which is missing the required argument unexpected_keys as defined here (see the sketch after my config below). My accelerate config is just a normal FSDP config:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_cpu_ram_efficient_loading: true
fsdp_forward_prefetch: false
fsdp_offload_params: true
fsdp_sharding_strategy: FULL_SHARD
fsdp_state_dict_type: SHARDED_STATE_DICT
fsdp_sync_module_states: true
fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
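To make the mismatch concrete, here is a minimal sketch; the exact signature and variable names are my reading of the 4.38 sources and may differ slightly:

# Quantizer side (transformers/quantizers/quantizer_bnb_4bit.py): the method
# declares unexpected_keys as a required positional argument.
def create_quantized_param(self, model, param_value, param_name, target_device,
                           state_dict, unexpected_keys):
    ...

# Loader side (the modeling_utils.py path linked above): the call leaves
# unexpected_keys out, so Python raises a TypeError about a missing required
# positional argument.
hf_quantizer.create_quantized_param(model_to_load, param, param_name, "cpu", state_dict)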
Could you help take another look? Every FSDP workflow on my side is now broken when launched with accelerate in a multi-GPU setup. Thank you!
Transformers version: 4.38
@chenmoneygithub
Hi, I guess the current implementation in hf_quantizer does not properly support FSDP yet.
Not only is the unexpected_keys argument missing in the call to hf_quantizer.create_quantized_param(), but there also seems to be no condition that checks whether a parameter belongs to a bnb linear component or not.
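For illustration only, this is the kind of guard I have in mind (a sketch: is_quantizable_param is a made-up helper standing in for a check such as hf_quantizer.check_quantized_param, set_module_tensor_to_device stands for the plain-tensor loading path, and the variable names are assumptions):

# Sketch: only route parameters that actually belong to a bnb linear module
# through the quantizer; load everything else (e.g. embeddings) as plain tensors.
if hf_quantizer is not None and is_quantizable_param(model_to_load, param_name, param):
    hf_quantizer.create_quantized_param(
        model_to_load, param, param_name, "cpu", state_dict, unexpected_keys
    )
else:
    set_module_tensor_to_device(model_to_load, param_name, "cpu", value=param)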
If it is urgent, I checked that changing this line to
from .integrations import set_module_quantized_tensor_to_device
set_module_quantized_tensor_to_device(model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype))
works, just as the previous transformers version did.
Alternatively, just setting fsdp_cpu_ram_efficient_loading: false in the accelerate config also works in my environment.
Hope this helps :)
@younesbelkada
Currently, hf_quantizer.create_quantized_param() in the lines below should be given the unexpected_keys argument:
https://github.com/huggingface/transformers/blob/831bc25d8fdb85768402f772cf65cc3d7872b211/src/transformers/modeling_utils.py#L3930
https://github.com/huggingface/transformers/blob/831bc25d8fdb85768402f772cf65cc3d7872b211/src/transformers/modeling_utils.py#L3743
Also, there seem to be some discrepancies in the bnb quantizer between transformers.integrations.bitsandbytes.py and transformers.quantizers.quantizer_bnb_{4,8}bit.py.
Would you mind reviewing a PR later, after I spend some time working on it?
Thanks :)
Hi @seungwoos
I made #29420 - let me know if this fixes the issue! I think we should keep the fix as simple as possible by just making unexpected_keys optional.
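i.e. roughly this shape (a sketch of the approach, not the exact diff in #29420; some_consumed_key stands in for whatever state-dict key the quantizer consumes):

# Make unexpected_keys optional so callers that don't track it keep working.
def create_quantized_param(self, model, param_value, param_name, target_device,
                           state_dict, unexpected_keys=None):
    ...
    # only mutate the list when the caller actually provided one
    if unexpected_keys is not None and some_consumed_key in unexpected_keys:
        unexpected_keys.remove(some_consumed_key)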
@younesbelkada
Thanks for your quick response.
Actually, I found that setting fsdp_cpu_ram_efficient_loading: false in the accelerate config makes everything work; there is no need to pass unexpected_keys.
I guess it may depend on the system (mine ran out of RAM), but your PR should prevent unexpected problems for other users.
Thanks!
OK, makes sense, thanks!
System Info
transformers version: 4.38.0.dev0
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I'm using the Mistral-7b-v0.1 model to fine-tune on my dataset. When I tried to work with QLoRA, I ran into problems with quantization (a minimal 4-bit loading sketch is included at the end of this report).
Using bitsandbytes quantization from the latest release raises a TypeError. I suspect the following lines should take the unexpected_keys argument:
https://github.com/huggingface/transformers/blob/2a9b1f80c45cab19b542bc7cc004937d39d6f6fb/src/transformers/modeling_utils.py#L3737
https://github.com/huggingface/transformers/blob/1a77f07f6556b1482bd5e5f8399aa528727d1b47/src/transformers/modeling_utils.py#L3924
After I manually changed the function to take the argument, I observed that all weights were passed to the quantizer, including the embedding layer. These lines do not seem to check whether the weights are quantizable or not
https://github.com/huggingface/transformers/blob/2a9b1f80c45cab19b542bc7cc004937d39d6f6fb/src/transformers/modeling_utils.py#L3916-L3924
as is done in transformers.integrations.bitsandbytes.py:
https://github.com/huggingface/transformers/blob/2a9b1f80c45cab19b542bc7cc004937d39d6f6fb/src/transformers/integrations/bitsandbytes.py#L66-L67
Should another condition be added using the following function?
https://github.com/huggingface/transformers/blob/2a9b1f80c45cab19b542bc7cc004937d39d6f6fb/src/transformers/quantizers/quantizer_bnb_4bit.py#L118-L132
Expected behavior
All the target (linear) layers should be quantized.
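For completeness, a minimal 4-bit loading script that exercises this code path when launched under accelerate with the FSDP config above (the model name is from the report; the quantization settings are assumptions):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Minimal QLoRA-style 4-bit load; under accelerate + FSDP this goes through
# hf_quantizer.create_quantized_param() and hits the missing-argument error.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
)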