irthomasthomas / undecidability


Prompt engineering - OpenAI API #663

Open irthomasthomas opened 6 months ago

irthomasthomas commented 6 months ago

Prompt engineering - OpenAI API

Description: Strategy: Test changes systematically

Sometimes it can be hard to tell whether a change (e.g., a new instruction or a new design) makes your system better or worse. Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish a true improvement from random luck. A change may also help performance on some inputs while hurting it on others.

Evaluation procedures (or "evals") are useful for optimizing system designs. Good evals are:

| Difference to detect | Sample size needed for 95% confidence |
| --- | --- |
| 30% | ~10 |
| 10% | ~100 |
| 3% | ~1,000 |
| 1% | ~10,000 |
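
The guide does not say how these figures were derived. As a rough sketch, they are close to what the normal-approximation margin of error for a proportion gives in the worst case (p = 0.5). The function name and the choice of formula below are assumptions made for illustration, not something taken from the guide.

```python
import math


def approx_sample_size(difference, z=1.96, p=0.5):
    """Samples needed so the margin of error on a proportion is at most `difference`.

    Assumes the normal approximation: margin = z * sqrt(p * (1 - p) / n).
    z=1.96 corresponds to 95% confidence; p=0.5 is the worst case.
    """
    if not 0 < difference < 1:
        raise ValueError("difference must be a fraction between 0 and 1")
    return math.ceil(z ** 2 * p * (1 - p) / difference ** 2)


# Print the estimate for each row of the table above.
for d in (0.30, 0.10, 0.03, 0.01):
    print(f"{d:.0%}: about {approx_sample_size(d):,} samples")
```

The results land within the same order of magnitude as the table, which is all the "~" figures claim.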

Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.
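
For the objective case (questions with a single correct answer), an automated eval can be as small as an exact-match score. This is a minimal sketch and not the OpenAI Evals API; `normalize` and `exact_match_accuracy` are hypothetical helper names.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match their reference answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one test case")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Running the same test set before and after a prompt change gives two scores to compare, subject to the sample-size caveat above.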

Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers). The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable. We encourage experimentation to figure out how well model-based evals can work for your use case.
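
One way to experiment with a model-based eval is to wrap the grading query in a small function. The sketch below assumes a hypothetical `judge` callable that sends a prompt to whatever model you use and returns its text reply; the template wording and the PASS/FAIL convention are also assumptions, not part of the guide.

```python
GRADER_TEMPLATE = (
    "You are grading an answer to a question.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with PASS if the candidate conveys the same facts as the reference, "
    "otherwise reply with FAIL."
)


def build_grader_prompt(question, reference, candidate):
    """Fill the grading template for one test case."""
    return GRADER_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    )


def grade(question, reference, candidate, judge):
    """Return True when the judge model's reply starts with PASS.

    `judge` is any callable mapping a prompt string to a reply string (hypothetical).
    """
    verdict = judge(build_grader_prompt(question, reference, candidate))
    return verdict.strip().upper().startswith("PASS")
```

Spot-checking a sample of the judge's verdicts by hand is a reasonable way to find out how far the model-based eval can be trusted for a given use case.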

URL: OpenAI Prompt Engineering Guide

Suggested labels

{'label-name': 'Systematic Testing', 'label-description': 'Strategies for testing changes systematically to optimize system designs.', 'gh-repo': 'OpenAI-API', 'confidence': 63.29}

irthomasthomas commented 6 months ago

Related issues

659: Prompt engineering: Split complex tasks into simpler subtasks - openai

### Details

Similarity score: 0.89

- [ ] [Prompt engineering](https://platform.openai.com/docs/guides/prompt-engineering/strategy-split-complex-tasks-into-simpler-subtasks)

# Prompt engineering

**Description:**

Strategy: Split complex tasks into simpler subtasks

Tactic: Use intent classification to identify the most relevant instructions for a user query

For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed. This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages. The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs since larger prompts cost more to run (see pricing information).

Suppose for example that for a customer service application, queries could be usefully classified as follows:

**SYSTEM**

You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary.

Primary categories: Billing, Technical Support, Account Management, or General Inquiry.

Billing secondary categories:
- Unsubscribe or upgrade
- Add a payment method
- Explanation for charge
- Dispute a charge

Technical Support secondary categories:
- Troubleshooting
- Device compatibility
- Software updates

Account Management secondary categories:
- Password reset
- Update personal information
- Close account
- Account security

General Inquiry secondary categories:
- Product information
- Pricing
- Feedback
- Speak to a human

**USER**

I need to get my internet working again.

[Open in Playground](https://platform.openai.com/docs/guides/prompt-engineering/strategy-split-complex-tasks-into-simpler-subtasks)

Based on the classification of the customer query, a set of more specific instructions can be provided to a model for it to handle next steps. For example, suppose the customer requires help with "troubleshooting".

**SYSTEM**

You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:

- Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model they are using
- Now you will advise them how to restart their device:
  - If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
  - If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme:

**USER**

I need to get my internet working again.

[Open in Playground](https://platform.openai.com/docs/guides/prompt-engineering/strategy-split-complex-tasks-into-simpler-subtasks)

#### Suggested labels

{'label-name': 'task-decomposition', 'label-description': 'Strategy of breaking down complex tasks into simpler subtasks for efficient handling', 'confidence': 63.13}
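
The routing step described in this issue (classify first, then pick the matching instructions) can be sketched as a lookup keyed on the classifier's JSON output. This is an illustrative sketch only: the `INSTRUCTIONS` table, `FALLBACK_INSTRUCTIONS`, and `select_instructions` are hypothetical names, and the fallback text is invented for the example.

```python
import json

# Hardcoded instructions per (primary, secondary) category; only one entry shown.
INSTRUCTIONS = {
    ("Technical Support", "Troubleshooting"): (
        "You will be provided with customer service inquiries that require "
        "troubleshooting in a technical support context."
    ),
}

# Hypothetical fallback used when the classification is missing or unknown.
FALLBACK_INSTRUCTIONS = "You will be provided with general customer service queries."


def select_instructions(classifier_output):
    """Map the classifier's JSON reply (keys: primary, secondary) to a system prompt."""
    try:
        label = json.loads(classifier_output)
    except json.JSONDecodeError:
        return FALLBACK_INSTRUCTIONS
    if not isinstance(label, dict):
        return FALLBACK_INSTRUCTIONS
    key = (label.get("primary"), label.get("secondary"))
    if not all(isinstance(part, str) for part in key):
        return FALLBACK_INSTRUCTIONS
    return INSTRUCTIONS.get(key, FALLBACK_INSTRUCTIONS)
```

Falling back on malformed output matters here because the model's reply is not guaranteed to be valid JSON.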

314: Prompt Engineering Guide | Prompt Engineering Guide

### Details

Similarity score: 0.87

- [ ] [Prompt Engineering Guide | Prompt Engineering Guide](https://www.promptingguide.ai/)

Prompt Engineering Guide

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).

Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

Prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs. It's an important skill to interface, build with, and understand capabilities of LLMs. You can use prompt engineering to improve safety of LLMs and build new capabilities like augmenting LLMs with domain knowledge and external tools.

Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, advanced prompting techniques, learning guides, model-specific prompting guides, lectures, references, new LLM capabilities, and tools related to prompt engineering.

369: "You are a helpful AI assistant" : r/LocalLLaMA

### Details

Similarity score: 0.86

- [ ] ["You are a helpful AI assistant" : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/18j59g1/you_are_a_helpful_ai_assistant/?share_id=g_M0-7C_zvS88BCd6M_sI&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

"You are a helpful AI assistant"

Discussion

I've been stumbling around this sub for a while, testing all the small models and preaching the good word of the omnipotent OpenHermes. Here are some system prompt tips I've picked up:

- Don't say "don't": this confuses them, which makes sense when you understand how they "think". They do their best to string concepts together, but they simply generate the next word in the sequence from the context available. Saying "don't" will put everything following that word into the equation for the following words. This can cause it to use the words and concepts you're telling it not to.
- Alternative: try to use "Only" statements. Instead of "Don't talk about any other baseball team besides the New York Yankees" say "Only talk about the New York Yankees".
- CAPITALIZING INSTRUCTIONS: For some reason, this works when used sparingly; it even makes some models pay attention to "don't". Surprisingly, this seems to work even with ChatGPT. It can quickly devolve your system prompt into confused yelling if you don't limit it, and can even cause your model to match the format and respond with confused yelling, so really only use it once or twice on important concepts.
- \n: A well-formatted system prompt goes a long way. Splitting up different sections with a line break makes a noticeable improvement in comprehension of the system prompt by the model. For example, here is my format for LMStudio: "Here is some information about the user: (My bio) (system prompts) Here is some context for the conversation: (Paste in relevant info such as web pages, documentation, etc, as well as bits of the convo you want to keep in context. When you hit the context limit, you can restart the chat and continue with the same context)."
- "You are a helpful AI assistant": this is the demo system prompt to just get agreeable answers from any model. The issue with this is, once again, how they "think". The models can't conceptualize what is helpful beyond agreeing with and encouraging you. This kind of statement can lead to them making up data and concepts in order to agree with you. This is extra fun because you may not realize the problem until you discover for yourself the fallacy of your own logic.
- Think it through/Go over your work: This works, but I think it works because it directs attention to the prompt and response. Personally, I think there are better ways to do this.
- Role assignment: telling it to act as this character or in that role is obviously necessary in some or even most instances, but this can also be limiting. It will act as that character, with all the limits and fallacies of that character. If your waifu can't code, neither will your AI.
- Telling it to be confident: This is a great way to circumvent the above problem, but also runs the risk of confident hallucinations.

Here's a 2 prompt trick I use: Tell one assistant to not answer the user prompt, but to simply generate a list of facts, libraries, or research points from its own data that can be helpful to answering the prompt. The prompt will be answered by the same model LLM, so write the list with the same model LLM as the future intended audience instead of a human. Then pass the list to the assistant you intend to chat with, with something like "you can confidently answer in these subjects that you are an expert in: (the list)". The point of this is to limit its responses to what it actually knows, but make it confidently answer with the information it's sure about. This has been incredibly useful in my cases, but absolutely check their work.

#### Suggested labels

{ "key": "sparse-computation", "value": "Optimizing large language models using sparse computation techniques" }
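
The two-prompt trick from the post can be sketched as two chained calls. This is only an illustration: `complete` is a hypothetical callable that sends a prompt to your local model and returns its reply, and the wording of the first prompt is paraphrased from the post.

```python
def two_step_answer(user_prompt, complete):
    """Chain two model calls: gather known facts first, then answer using them.

    `complete` is any callable mapping a prompt string to a reply string (hypothetical).
    """
    # Step 1: ask for supporting facts only, written for the model itself to read.
    facts_prompt = (
        "Do not answer the prompt below. Only list facts, libraries, or research "
        "points you know that would help answer it. The reader is the same model, "
        "not a human.\n\nPrompt: " + user_prompt
    )
    facts = complete(facts_prompt)

    # Step 2: answer the original prompt, scoped to the facts gathered in step 1.
    system_prompt = (
        "You can confidently answer in these subjects that you are an expert in:\n"
        + facts
    )
    return complete(system_prompt + "\n\n" + user_prompt)
```

As the post says, the final answer still needs checking; the first call only narrows what the second call draws on.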

399: openai-python api doc

### Details

Similarity score: 0.86

- [ ] [openai-python/api.md at main · openai/openai-python](https://github.com/openai/openai-python/blob/main/api.md)

### Add error handling for failed API requests

**Is this a bug or feature request?**

Bug

**What is the current behavior?**

Currently, the application does not handle failed API requests, resulting in a poor user experience and potential loss of data.

**What is the expected behavior?**

The application should handle failed API requests gracefully, providing clear error messages to the user and, if possible, retrying the request.

**What is the impact of this issue?**

The lack of error handling can lead to confusion for users when API requests fail, and may cause them to lose data if they are not aware that the request has failed. Additionally, it can make it difficult for developers to diagnose and fix issues with the application.

**Possible Solutions:**

1. Implement a global error handler that catches failed API requests and displays an appropriate error message to the user.
2. If possible, implement a retry mechanism for failed API requests, to increase the chances of success on the second attempt.
3. Log failed API requests for further analysis and debugging.

**Steps to reproduce:**

1. Open the application.
2. Trigger an API request (e.g. by submitting a form, or refreshing the page).
3. Disconnect from the internet or otherwise prevent the API request from succeeding.
4. Observe the lack of error handling and the poor user experience.

**Additional context:**

This issue has been identified as a priority for improving the reliability and user experience of the application. It is also an important step in ensuring that the application can be easily maintained and debugged by developers.

#### Suggested labels

{ "key": "ai-platform", "value": "Platforms and tools for implementing AI solutions" }
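
Solutions 2 and 3 above (retry and log) can be combined in a small generic wrapper. This is a sketch under assumptions: `call_with_retries` is a hypothetical helper and not part of openai-python, and catching every `Exception` is a simplification; real code would catch only the errors worth retrying.

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_retries(request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `request()` and retry on failure with exponential backoff.

    Each failure is logged; the last failure is re-raised so the caller can
    show an error message to the user.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception as exc:  # simplification: narrow this in real code
            logger.warning("API request failed (attempt %d/%d): %s",
                           attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the delay can be replaced in tests.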

178: Changing the order of sentences in a prompt changes the quality of the output.

### Details

Similarity score: 0.86

# Prompt Ordering Experiment: Impact on Linux Terminal Command Outputs

## Overview

This experiment investigates how the ordering of sentences in a prompt affects the output quality when interacting with a language model designed to generate Linux terminal commands. The model is instructed to respond with valid commands for a Manjaro (Arch) Linux system, considering the latest information up to the knowledge cutoff in 2023.

## Methodology

The same text was provided to the language model in different orders to observe the variation in the generated outputs. The primary task was to write a bash terminal command to check the local IP address. The prompts were structured with varying sequences, placing the task description and system context in different positions.

## Results

The following prompt-response pairs were generated during the experiment:

### Prompt 1

```
"You are a linux terminal assistant. Always respond in valid Linux terminal commands. The system is Manjaro (Arch) Linux, Kernel 6.0, KDE Plasma desktop 5.2, Python 3.11 Think it through carefully. Be concise, but include much info. Reply with the full terminal cmd string. In problem solving, always use latest info up to your knowledge cutoff. Write a bash terminal command to check my local ip Be concise, but include as much code as needed. Do not write anything else. Ever."
```

#### Response 1

```bash
ip addr show | grep inet | awk '{print $2}' | grep -v '127.0.0.1'
```

### Prompt 2

```
"Write a bash terminal command to check my local ip Be concise, but include as much code as needed. Do not write anything else. Ever."
```

#### Response 2

```bash
ipconfig | grep "IPv4" | awk '{print $2}'
```

### Prompt 3

```
"Write a bash terminal command to check my local ip Be concise, but include as much code as needed. Do not write anything else. Ever. You are a linux terminal assistant. Always respond in valid Linux terminal commands. The system is Manjaro (Arch) Linux, Kernel 6.0, KDE Plasma desktop 5.2, Python 3.11 Think it through carefully. Be concise, but include much info. Reply with the full terminal cmd string. In problem solving, always use latest info up to your knowledge cutoff. "
```

#### Response 3

```bash
ip addr show | grep "inet " | grep -v 127.0.0.1 | awk '{print $2}' | cut -d '/' -f1
```

## Analysis

The experiment demonstrates that the ordering of sentences within the prompt can lead to different outcomes. Notably, Response 2 contains an incorrect command (`ipconfig`) for the specified Linux system. Prompt 2 contains only the task description and omits the system context, which suggests that the model was influenced by the lack of context regarding the operating system. In contrast, when the system context was included, either before the task description (Prompt 1) or after it (Prompt 3), the model generated appropriate commands for a Linux environment. Responses 1 and 3 still differ from each other (Response 3 additionally strips the prefix length with `cut`), so the same text in a different order produced a different command. This indicates that the model's performance can be sensitive to the structure of the prompt, and that stating the context explicitly can lead to more accurate responses.

## Conclusion

The ordering of information in a prompt can significantly affect the quality of the output from a language model. For tasks requiring specific contextual knowledge, such as generating Linux terminal commands, it is beneficial to provide the relevant context before the task description to guide the model towards the correct domain and improve the accuracy of its responses.

## Recommendations

- When interacting with language models for technical tasks, structure prompts with context first to ensure domain-appropriate responses.
- Further experiments could explore the impact of prompt ordering on different types of tasks and domains to generalize these findings.

---
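
To repeat this kind of experiment on other tasks, the three prompt layouts used above can be generated from one context string and one task string. A minimal sketch; `build_prompt_variants` and the variant names are hypothetical and chosen for this example.

```python
def build_prompt_variants(context, task):
    """Return the three layouts compared in the experiment, keyed by a descriptive name."""
    return {
        "context_first": f"{context}\n{task}",  # layout of Prompt 1
        "task_only": task,                      # layout of Prompt 2
        "task_first": f"{task}\n{context}",     # layout of Prompt 3
    }


variants = build_prompt_variants(
    context="You are a linux terminal assistant. The system is Manjaro (Arch) Linux.",
    task="Write a bash terminal command to check my local ip",
)
for name, prompt in variants.items():
    print(f"--- {name} ---\n{prompt}\n")
```

Sending each variant to the model several times, rather than once, would help separate ordering effects from sampling noise.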

630: OpenRouter: Prompt Transforms

### Details

Similarity score: 0.86

- [ ] [Docs | OpenRouter](https://openrouter.ai/docs#transforms)

# Docs | OpenRouter

**Description:**

Prompt Transforms

OpenRouter has a simple rule for choosing between sending a prompt and sending a list of ChatML messages:

Choose messages if you want to have OpenRouter apply a recommended instruct template to your prompt, depending on which model serves your request. Available instruct modes include:

- alpaca: docs
- llama2: docs
- airoboros: docs

Choose prompt if you want to send a custom prompt to the model. This is useful if you want to use a custom instruct template or maintain full control over the prompt submitted to the model.

To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called transforms:

```typescript
{
  transforms: ["middle-out"], // Compress prompts > context size. This is the default for all models.
  messages: [...], // "prompt" works as well
  model // Works with any model
}
```

The transforms param is an array of strings that tell OpenRouter to apply a series of transformations to the prompt before sending it to the model. Transformations are applied in-order. Available transforms are:

- middle-out: compress prompts and message chains to the context size. This helps users extend conversations in part because LLMs pay significantly less attention to the middle of sequences anyway. Works by compressing or removing messages in the middle of the prompt.

**Note:** All OpenRouter models default to using middle-out, unless you exclude this transform by e.g. setting transforms: [] in the request body.

[More information](https://openrouter.ai/docs#transforms)

#### Suggested labels

{'label-name': 'prompt-transformations', 'label-description': 'Descriptions of transformations applied to prompts in OpenRouter for AI models', 'gh-repo': 'openrouter/ai-docs', 'confidence': 52.95}
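
To make the "removing messages in the middle" idea concrete, here is a toy sketch that keeps the start and end of a message list and drops the middle. It is not OpenRouter's implementation: the real transform works against the model's context size, while this example simply counts messages, and `drop_middle` is a hypothetical name.

```python
def drop_middle(messages, max_messages):
    """Keep the first and last messages and drop the middle until the list fits.

    Counts whole messages instead of tokens, which is a simplification.
    """
    if max_messages < 2:
        raise ValueError("max_messages must be at least 2")
    if len(messages) <= max_messages:
        return list(messages)
    head = (max_messages + 1) // 2  # messages kept from the start
    tail = max_messages - head      # messages kept from the end
    return list(messages[:head]) + list(messages[len(messages) - tail:])
```

With ten messages and a limit of four, the first two and last two survive, which mirrors the observation above that the middle of a sequence receives the least attention.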