Open liuhu opened 1 year ago
Hi @liuhu , Thank you for your interest in our work and for your kind words!
We haven't conducted many experiments with other open-source models. I agree that new open-source models come out every day claiming to surpass ChatGPT, but they are eventually found not to be as general and adaptive as ChatGPT.
I don't have a clear solution to that other than trying a few others (maybe Falcon?).
Regarding fine-tuning: yes, I think that if you have the resources and the data, fine-tuning on your examples can help reduce the prompt to little more than the example-specific inputs.
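As a rough sketch of what this means in practice (the example text and function names here are made up for illustration): with a base model, the prompt must carry all the few-shot examples on every call, while a model fine-tuned on those examples only needs the question itself.

```python
# Hypothetical few-shot block; in PAL this would be worked examples
# of question -> Python program.
FEW_SHOT_EXAMPLES = """Q: example question 1
# solution program 1
Q: example question 2
# solution program 2
"""

def build_prompt(question: str, fine_tuned: bool = False) -> str:
    """Build the prompt sent to the model.

    A base model needs the few-shot examples prepended; a model
    fine-tuned on those examples can be prompted with the question alone.
    """
    if fine_tuned:
        return f"Q: {question}\n"
    return FEW_SHOT_EXAMPLES + f"Q: {question}\n"
```

The per-request token cost then scales with the question alone, not with the number of demonstrations.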
Maybe @luyug @madaan @shuyanzhou have some thoughts.
Best, Uri
We are also using PAL to solve user problems through generated programs, and we are facing issues with long prompts and inference. We would like to know whether fine-tuning is effective in addressing these issues, or whether there are other solutions we should consider.
@HXCrazy thank you for reaching out!
Please describe the problems you are facing in a new issue so we can provide a better response.
Best, Uri
Background
We have a chatbot that uses the PAL method to program custom functions that answer user questions in combination with user data in our system. The user data is sleep and exercise data uploaded through smart wearable devices; the data types are very rich (50+ fields) and the volume is large (each user generates multiple records per day). The Python code generated by the LLM determines the time range of the query, the data fields to retrieve, the orchestration of functions, and other details.
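To make the setup concrete, here is a hypothetical illustration of the kind of program the LLM might generate for a question like "How was my sleep over the last week?" (the field names, the question, and the `solve` wrapper are invented for this sketch; they are not from the actual system):

```python
from datetime import date, timedelta

def solve(today: date = date(2023, 8, 1)) -> dict:
    """Sketch of LLM-generated code: pick the time range and the
    data fields to query out of the 50+ available ones."""
    # Determine the time range of the query: the last 7 days.
    end = today
    start = end - timedelta(days=7)
    # Select only the fields relevant to the question.
    fields = ["sleep_duration_minutes", "deep_sleep_minutes"]
    # In the real system this dict would be passed to a data-query function.
    return {"start": start.isoformat(), "end": end.isoformat(), "fields": fields}
```

The point is that the model's output is executable query logic, not a textual answer, so correctness of the generated code directly determines answer quality.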
Prompt template:
Thank you very much for your patience in reading this far. I wrote a lot of background information in order to describe the problem, which resulted in a very long text.
Question
PAL is an amazing method, and we already use it in production. We want to replace the OpenAI LLM with an open-source LLM, and we have encountered some problems:
Our analysis found:
a. Compared with gpt-3.5-turbo, PaLM 2, WizardCoder, and Vicuna all show a decline in date-reasoning performance. Is there any way to improve date reasoning?
b. The generalization ability of WizardCoder-15B and Vicuna-13B is insufficient: much of the output code essentially copies the few-shot examples instead of generating code for the actual question. Is this caused by insufficient model parameters?
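For context on (a), a minimal sketch of the kind of date-reasoning program PAL expects the model to emit (the question and date here are hypothetical): the model writes the date arithmetic as code, so `datetime` does the calculation that weaker models get wrong when reasoning in text.

```python
from datetime import datetime, timedelta

# Hypothetical question: "Today is 2023-03-01. What was the date 36 days ago?"
today = datetime(2023, 3, 1)
answer = (today - timedelta(days=36)).strftime("%Y-%m-%d")
```

If a model degrades here, it is usually failing to produce this code structure at all (copying a few-shot example verbatim), not failing at the arithmetic, since the arithmetic is delegated to Python.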
Thanks again, everyone. I would appreciate it if you could pick some of these questions and help answer them.