Open · MadhumithaSrini opened this issue 4 weeks ago

Describe the issue

Doubt: I generated the q_config_summary_file without enabling bf16, but at inference time I enable it with the "--quant-with-amp" flag. I tested a couple of other models (gpt-j-6b, chatglm3, llama-2-7b-chat-hf), and all of them pass, while qwen fails at inference. So, does using the "--quant-with-amp" flag at inference, but not while generating the config file, matter?

Hi @MadhumithaSrini, thanks for reporting this; we will check on our end and get back to you.

Thank you.

Hi @MadhumithaSrini, could you please share the commands you used for step 1 and step 2?

Hi, sure. The command is:

OMP_NUM_THREADS=