ushakov opened this issue 3 months ago
Hi @ushakov, thank you for your support with LLMLingua.
The gap is primarily due to the different modes of the OpenAI model. Currently, there are two ways to replicate the respective results:
- Use Azure OpenAI, which still supports the gpt-3.5-turbo-0301 completion mode.
- Use "gpt-3.5-turbo-instruct", which supports the completion mode and can be compared with the results of the original prompt.
Regarding the performance loss issue in chat mode, we are currently designing methods to make improvements.
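To make the two modes concrete, here is a minimal sketch (request shapes follow the openai>=1.0 Python client; the model set and the `build_request` helper are illustrative, not part of LLMLingua) of how the same compressed prompt is packaged differently:

```python
# Illustrative sketch: completion mode sends the raw prompt string as-is,
# while chat mode wraps it in a messages list, which changes how the model
# conditions on a heavily compressed prompt. Model names are examples;
# gpt-3.5-turbo-0301 was reachable through the legacy completions endpoint
# on Azure OpenAI.

COMPLETION_MODE_MODELS = {"gpt-3.5-turbo-instruct", "gpt-3.5-turbo-0301"}

def build_request(model: str, prompt: str) -> dict:
    """Kwargs for client.completions.create (completion mode) or
    client.chat.completions.create (chat mode)."""
    if model in COMPLETION_MODE_MODELS:
        # Completion mode: the prompt is consumed verbatim.
        return {"model": model, "prompt": prompt, "max_tokens": 400, "temperature": 0}
    # Chat mode: the prompt becomes a user message inside a chat template.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 400,
        "temperature": 0,
    }
```

For example, `build_request("gpt-3.5-turbo-instruct", compressed)` yields a `prompt=` payload for the completions endpoint, while any other gpt-3.5 model name falls through to the chat shape.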
Thanks for the pointers! While I'm waiting for MS to approve my access to GPT in the Azure API, I ran a test with gpt-3.5-turbo-instruct, and it unfortunately performs even worse than gpt-3.5-turbo-0301 in chat mode, which I used previously: the full prompt gives 76.9%, while the compressed prompt gives 59.6%.
Hi @ushakov, thank you for your support with LLMLingua.
The gap is primarily due to the different modes of the OpenAI model. Currently, there are two ways to replicate the respective results:
- Use Azure OpenAI, which still supports the gpt-3.5-turbo-0301 completion mode.
- Use "gpt-3.5-turbo-instruct", which supports the completion mode and can be compared with the results of the original prompt.
Regarding the performance loss issue in chat mode, we are currently designing methods to make improvements.
Thank you for your answer, it is very helpful to me. When I tried to use Azure's gpt-3.5-turbo-0301, an error message appeared, which seemed to mean that the model no longer existed.
If possible, could you please tell me about the latest model that can be used in Azure to reproduce this experiment? I would be very grateful for this.
Hi @Yeqishen, you can try to use "gpt-3.5-turbo-instruct".
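One Azure-side thing worth checking (a hedged note, since the exact error text isn't shown above): on Azure OpenAI, the `model` field of a request must be the name of a *deployment* you created in the portal, not the upstream model id, so a "model does not exist" error can simply mean no deployment with that name exists. A sketch with placeholder names:

```python
def azure_completion_request(deployment_name: str, prompt: str) -> dict:
    """Kwargs for a completions call against an Azure OpenAI resource.

    On Azure, `model` is the deployment name you chose in the portal
    (e.g. "my-instruct-deployment" -- a placeholder here), not the model id;
    passing "gpt-3.5-turbo-0301" directly fails unless a deployment happens
    to carry that exact name.
    """
    return {
        "model": deployment_name,
        "prompt": prompt,
        "max_tokens": 400,
        "temperature": 0,
    }

# Example with a placeholder deployment name:
request = azure_completion_request("my-instruct-deployment", "Q: 2 + 2 = ?\nA:")
```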
Describe the issue
I attempted to reproduce the results of the LLMLingua paper using the CoT.ipynb notebook from the examples folder. However, I encountered a discrepancy in the accuracy achieved. The result in CoT.ipynb reports 78% accuracy, but I only achieved 68% accuracy in my reproduction attempt.
Changes made to CoT.ipynb:
The rest of the file was kept unchanged as per the GitHub version.
Expected Behavior:
The reproduction should yield results consistent with the reported 78% accuracy, as in the output of the last cell in the notebook:
Actual Behavior:
I obtained only 68% accuracy:
Question
Is this expected? Any ideas what could be the problem here? If the culprit is the OpenAI model used, any ideas how to fix this -- the gpt-3.5 model family no longer allows non-chat inference...
Thanks in advance!