Open czq99972 opened 1 month ago
Hey @czq99972, @SunMarc, @MekkCyber! I can take this as soon as I finish my current implementation for the Mamba architecture, which shouldn't take long. I think I'll be able to start working on deepseek2 this week. Link to the main issue thread: https://github.com/huggingface/transformers/issues/33260
Hey! DeepSeek-V2 GGUF files can be supported once the architecture is integrated in transformers: https://github.com/huggingface/transformers/pull/31976
Any update on deepseek v2 support? @VladOS95-cyber
Hey @wavy-jung, the DeepSeek-V2 architecture is not supported yet; PR #31976 is still in progress.
System Info
Transformers does not currently support loading GGUF-quantized model files for the deepseek2 architecture. Can you please advise when this support might be added? @SunMarc @MekkCyber
Who can help?
@SunMarc @MekkCyber
Reproduction
  File "/home/work/miniforge3/envs/vllm/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1006, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/miniforge3/envs/vllm/lib/python3.11/site-packages/transformers/configuration_utils.py", line 570, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/miniforge3/envs/vllm/lib/python3.11/site-packages/transformers/configuration_utils.py", line 661, in _get_config_dict
    config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/miniforge3/envs/vllm/lib/python3.11/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 103, in load_gguf_checkpoint
    raise ValueError(f"Architecture {architecture} not supported")
ValueError: Architecture deepseek2 not supported
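The error is raised from the architecture name stored in the GGUF file's metadata. For anyone who wants to check what a file declares before handing it to from_pretrained, here is a minimal sketch that reads the `general.architecture` key straight from the header using only the standard library. It assumes a little-endian GGUF file of version 2 or later and follows the header layout as I understand it from the GGUF spec. The function name `read_gguf_architecture` is made up for this example and is not part of transformers.

```python
import struct

# GGUF metadata value type ids mapped to struct formats (little-endian assumed).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _read_value(f, vtype):
    """Read a metadata value so the stream stays aligned on the next key."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(path):
    """Return the value of general.architecture, or None if the key is absent.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("GGUF v1 uses a different header layout")
        _read(f, "<Q")  # tensor count, not needed here
        kv_count = _read(f, "<Q")
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "general.architecture":
                return value
    return None
```

Calling `read_gguf_architecture("model.gguf")` on the file from this report should return the same name that appears in the ValueError, which makes it easy to tell an unsupported architecture apart from a corrupt download.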
Expected behavior
1