cc @younesbelkada @SunMarc
Hi @bozheng-hit, thanks for reporting! I can indeed reproduce the error, and it also happens with the Mixtral models. I'm not sure what the best fix would be for now, since adding back the `top_x.shape[0] == 0:` condition will break fx tracing for Qwen MoE and Mixtral, and inference works fine on the original model. WDYT @amyeroberts @ArthurZucker? LMK if you come up with a solution @bozheng-hit, I will try to find one on my side too.
@SunMarc Would having a conditional check, e.g. `if not is_tracing() and top_x.shape[0] == 0:`, work as a partial fix?
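For reference, a minimal sketch of what that guard could look like inside the MoE expert-dispatch loop. This is an illustration, not the actual patch: `moe_dispatch` is a simplified stand-in for the forward pass in modeling_qwen2_moe.py, and `torch.jit.is_tracing()` is used as a placeholder for whichever tracing check would actually cover fx tracing here.

```python
import torch

def moe_dispatch(hidden_states, expert_mask, routing_weights, experts):
    """Simplified stand-in for the Qwen2-MoE/Mixtral expert loop."""
    final_hidden_states = torch.zeros_like(hidden_states)
    for expert_idx, expert_layer in enumerate(experts):
        # idx: which top-k slot picked this expert; top_x: which tokens.
        idx, top_x = torch.where(expert_mask[expert_idx])
        # Skip experts that received no tokens, but only in eager mode:
        # a data-dependent branch like this is what breaks symbolic tracing.
        if not torch.jit.is_tracing() and top_x.shape[0] == 0:
            continue
        current_state = hidden_states[top_x]
        weighted = expert_layer(current_state) * routing_weights[top_x, idx, None]
        final_hidden_states.index_add_(0, top_x, weighted)
    return final_hidden_states
```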
Thanks for the tip @amyeroberts, but it doesn't work ;) . However, I tested the exllamav2 kernel and it works with it. The exllamav1 kernel must have some issue. A potential fix would be to change the `quantization_config` inside the config.json, so that users get the exllamav2 kernel by default. WDYT @bozheng-hit? You would have to set `version` to 2 in the `exllama_config` field here.
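As a workaround until the checkpoint's config.json is updated, users can also force the exllamav2 kernel at load time. The sketch below uses the documented `GPTQConfig`/`exllama_config` API from transformers; the model id is illustrative, not taken from the original report.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Override the checkpoint's quantization config so the exllamav2 kernel
# is used instead of the (apparently buggy) exllamav1 kernel.
quantization_config = GPTQConfig(bits=4, exllama_config={"version": 2})

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",  # illustrative GPTQ checkpoint
    quantization_config=quantization_config,
    device_map="auto",
)
```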
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing this since I think the issue is fixed!
System Info
Generating with GPTQ models fails with the errors below after this PR was merged: https://github.com/huggingface/transformers/pull/30209 @younesbelkada @SunMarc
The error information is here, and the model generates successfully once I revert the change to modeling_qwen2_moe.py.
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The code to reproduce the error is here:
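A minimal reproduction sketch, assuming a GPTQ-quantized Qwen2-MoE checkpoint; the exact script from the report is linked rather than inlined, so the model id and prompt below are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is a large language model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```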
The error information is here:
Expected behavior
Output the following text:
A large language model is a type of artificial intelligence that is trained to understand and generate human language. These models are designed to process and comprehend natural language input, and can be used for a variety of tasks such as language translation, sentiment analysis, and chatbot development. They are typically very large neural networks that have been pre-trained on vast amounts of text data, allowing them to learn the nuances of language and make intelligent predictions about how to respond to different inputs. Large language models have become increasingly popular in recent years due to their ability to handle complex language tasks and their potential applications in fields such as customer service, content creation, and education.