Closed kynow2 closed 6 months ago
I first export with quantization level set to none, then load the exported model, enable a quantization level, and export again.
However, this second (quantized) export fails with a CUDA out-of-memory error.
The unquantized export succeeds, and the subsequent quantized export runs through the whole quantization pass, but at the end it fails to export: the program simply prints "Killed" and exits. Log below.
Version: Welcome to LLaMA Factory, version 0.7.1.dev0. I tried both CPU and GPU export. GPU: RTX 3070 8GB.
WARNING:root:Some parameters are on the meta device because they were offloaded to the cpu.
Quantizing model.layers blocks : 100%|███████████████████████████████████████████████| 32/32 [3:09:48<00:00, 355.89s/it]
WARNING:optimum.gptq.quantizer:Found modules on cpu/disk. Using Exllama/Exllamav2 backend requires all the modules to be on GPU. Setting `disable_exllama=True`
/home/timothy/miniconda3/envs/llamafactory/lib/python3.10/site-packages/transformers/modeling_utils.py:4371: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
[WARNING|logging.py:329] 2024-05-14 17:10:55,143 >> The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
[WARNING|logging.py:329] 2024-05-14 17:10:55,147 >> The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
Killed
Reproduction
After training with int4, the model loads fine, but export fails when quantization level 4 is selected. The error is: ValueError: Please merge adapters before quantizing the model.
If I leave the quantization level at the default "none", the export works. Which mode is the correct way to export int4? Also, does selecting quantization level 4 during training mean the training itself is already quantized? Is any other step required?
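The ValueError suggests a two-step flow: first merge the LoRA adapters into the base model by exporting with quantization level none, then run a second export on the merged model with the quantization level set. A sketch of what the two export configs might look like, assuming LLaMA Factory's YAML export keys (`export_dir`, `export_quantization_bit`, etc.) and placeholder model/adapter paths:

```yaml
# Step 1: merge LoRA adapters, no quantization (paths are placeholders)
model_name_or_path: path/to/base_model
adapter_name_or_path: path/to/lora_adapter
template: default
finetuning_type: lora
export_dir: path/to/merged_model
export_quantization_bit: null   # merge only, do not quantize here
```

```yaml
# Step 2: quantize the already-merged model (no adapter_name_or_path)
model_name_or_path: path/to/merged_model
template: default
export_dir: path/to/quantized_model
export_quantization_bit: 4      # GPTQ int4 export
```

The "Killed" at the end of the log is typically the Linux OOM killer terminating the process for running out of system RAM, so the second step may still need more CPU memory than the machine has, independent of the 8GB GPU.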