microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

Python: System Prompt is negatively affecting the quality of AzureChatCompletion responses #4648

Closed: andrewldesousa closed this issue 4 months ago

andrewldesousa commented 8 months ago

Describe the bug

I am trying to use different system prompts for AzureChatCompletion.

I have tried passing different system prompts to AzureChatRequestSettings and to AzureAISearchDataSources's role information parameter. Only the prompt "You are an AI assistant that helps people find information." works well; if I change the prompt, the response is cut short, is incorrect in some way, or simply does not have the intended effect of a system prompt.

For example, "What are some countries in Europe?" yields the answer "Some countries in Europe include ." when the system prompt asks for a funny assistant that responds with jokes. When I pass the same prompt as roleInformation in the dataSources dictionary without Semantic Kernel (calling the Azure OpenAI service directly via my own Python functions), the prompt is handled well and the model does respond with jokes.
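
For reference, the direct call I mean is roughly the following. This is only a sketch: the payload field names follow the "on your data" preview extensions API as I understand it, while the client library choice (requests) and the concrete values are illustrative.

import requests

# Illustrative direct "on your data" request, bypassing Semantic Kernel.
url = (
    f"{AZURE_OPENAI_ENDPOINT}openai/deployments/{AZURE_OPENAI_MODEL_NAME}"
    f"/extensions/chat/completions?api-version={AZURE_OPENAI_PREVIEW_API_VERSION}"
)
payload = {
    "messages": [{"role": "user", "content": "What are some countries in Europe?"}],
    "dataSources": [{
        "type": "AzureCognitiveSearch",
        "parameters": {
            "endpoint": f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
            "key": AZURE_SEARCH_KEY,
            "indexName": AZURE_SEARCH_INDEX,
            # camelCase field; this is the prompt that works when calling directly
            "roleInformation": "You are an AI assistant that responds with humor and uses jokes in your answers.",
        },
    }],
}
resp = requests.post(url, headers={"api-key": AZURE_OPENAI_KEY}, json=payload)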

After testing, I do not think this is a prompt-engineering issue; it seems that the role_information parameter in AzureChatRequestSettings may not be working correctly. I have also tried prepending the system message to the chat messages, but that did not help.

To Reproduce

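# NOTE: imports are omitted in the original report; this snippet assumes at
# least uuid, time, the app's env-derived constants, and semantic_kernel's
# Azure OpenAI connector imported as `sk_aois`.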
async def sk_chat_completion_async(body, headers, endpoint, history_metadata):
    deployment_name = AZURE_OPENAI_MODEL_NAME

    az_source = AzureAISearchDataSources(
        indexName=AZURE_SEARCH_INDEX, 
        endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net", 
        key=AZURE_SEARCH_KEY,
        role_information=AZURE_OPENAI_SYSTEM_MESSAGE,
        strictness=AZURE_SEARCH_STRICTNESS,
        semanticConfiguration=AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG if AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG else "",
        queryType=AZURE_SEARCH_QUERY_TYPE,
        topNDocuments=AZURE_SEARCH_TOP_K,
        inScope=AZURE_SEARCH_ENABLE_IN_DOMAIN.lower() == "true",
        fieldsMapping={
            "contentFields": parse_multi_columns(AZURE_SEARCH_CONTENT_COLUMNS) if AZURE_SEARCH_CONTENT_COLUMNS else [],
            "titleField": AZURE_SEARCH_TITLE_COLUMN if AZURE_SEARCH_TITLE_COLUMN else None,
            "urlField": AZURE_SEARCH_URL_COLUMN if AZURE_SEARCH_URL_COLUMN else None,
            "filepathField": AZURE_SEARCH_FILENAME_COLUMN if AZURE_SEARCH_FILENAME_COLUMN else None,
            "vectorFields": parse_multi_columns(AZURE_SEARCH_VECTOR_COLUMNS) if AZURE_SEARCH_VECTOR_COLUMNS else []
        },
    )

    az_data = AzureDataSources(type="AzureCognitiveSearch", parameters=az_source)
    extra = ExtraBody(dataSources=[az_data])
    settings = AzureChatRequestSettings(extra_body=extra)
    settings.temperature = AZURE_OPENAI_TEMPERATURE
    settings.max_tokens = AZURE_OPENAI_MAX_TOKENS
    settings.top_p = AZURE_OPENAI_TOP_P

    chat_completion = sk_aois.AzureChatCompletion(
        deployment_name=deployment_name,
        endpoint=AZURE_OPENAI_ENDPOINT,
        api_key=AZURE_OPENAI_KEY,
        api_version=AZURE_OPENAI_PREVIEW_API_VERSION,
        use_extensions=True,
    )

    chat_messages = [
        {"role": msg["role"], "content": msg["content"]} for msg in body["messages"]
    ]

    async for message in chat_completion.complete_chat_stream_async(chat_messages, settings):
        ...  # message processing goes here (see the streaming handler shared below)

Screenshots

Using the system prompt "You are an AI assistant that helps people find information.":

[screenshot: helpful_assistant_system_prompt]

Using the system prompt "You are an AI assistant that responds with humor and uses jokes in your answers."

[screenshot: system_prompt]


moonbox3 commented 8 months ago

Hi @andrewldesousa, thank you for the detailed issue. At first glance, it looks like role_information is using snake case instead of the camel case (roleInformation) defined in the AzureDataSourceParameters class. Can you try camel case and see if that helps? Otherwise, please let us know.
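
For illustration, a minimal sketch of the suggested change (same constructor as in the repro above; only the keyword name changes):

az_source = AzureAISearchDataSources(
    indexName=AZURE_SEARCH_INDEX,
    endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
    key=AZURE_SEARCH_KEY,
    roleInformation=AZURE_OPENAI_SYSTEM_MESSAGE,  # camelCase, as defined on the class
    # ...remaining parameters unchanged...
)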

andrewldesousa commented 8 months ago

@moonbox3 thanks for pointing that out; going forward I will use camel case.

It seems I am getting similar behavior with camel case. Using the respond-with-humor system prompt, I get the answer "Europe is a continent that consists of many countries. Some of the countries in Europe include ." If I use the system prompt "You are an AI assistant that helps people find information." with camel case, I get an answer similar to the one screenshotted above.

andrewldesousa commented 8 months ago

@moonbox3 any update on this?

abhahn commented 8 months ago

Hi @andrewldesousa, I currently work on the On Your Data team and have been tasked with finding a repro for your issue. :) I have been using a slightly edited version of the code snippet you shared above and was not able to reproduce the issue by changing the system prompt as you described. The response using the "humor and jokes" prompt doesn't actually appear to use humor, but for me it is at least given in complete sentences.

I wanted to collect a little bit more info about your case above to see if there are any other configs / code impacting the output:

andrewldesousa commented 8 months ago

Here is the processing code:

async for message in chat_completion.complete_chat_stream_async(chat_messages, settings):
    # First chunk: the tool message (e.g. citations) returned by the extension.
    tool_message = await message.get_tool_message()
    response = {
        "id": str(uuid.uuid4()),
        "model": AZURE_OPENAI_MODEL_NAME,
        "created": int(time.time()),
        "object": "extensions.chat.completion.chunk",
        "choices": [{
            "messages": [{
                "role": "tool",
                "content": tool_message,
            }],
        }],
        "apim-request-id": headers.get("apim-request-id"),
        "history_metadata": history_metadata,
    }

    yield format_as_ndjson(response)

    # Subsequent chunks: one response object per streamed assistant token.
    async for delta_text in message:
        response = {
            "id": str(uuid.uuid4()),
            "model": AZURE_OPENAI_MODEL_NAME,
            "created": int(time.time()),
            "object": "extensions.chat.completion.chunk",
            "choices": [{
                "messages": [{
                    "role": "assistant",
                    "content": delta_text,
                }],
            }],
            "apim-request-id": headers.get("apim-request-id"),
            "history_metadata": history_metadata,
        }

        yield format_as_ndjson(response)

My .env looks like this; sensitive values are omitted:

AZURE_SEARCH_SERVICE=byc-search
AZURE_SEARCH_INDEX=index name
AZURE_SEARCH_KEY=
AZURE_SEARCH_USE_SEMANTIC_SEARCH=False
AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG=default
AZURE_SEARCH_INDEX_IS_PRECHUNKED=False
AZURE_SEARCH_TOP_K=5
AZURE_SEARCH_ENABLE_IN_DOMAIN=False
AZURE_SEARCH_CONTENT_COLUMNS=content
AZURE_SEARCH_FILENAME_COLUMN=title
AZURE_SEARCH_TITLE_COLUMN=title
AZURE_SEARCH_URL_COLUMN=title
AZURE_SEARCH_VECTOR_COLUMNS=
AZURE_SEARCH_QUERY_TYPE=simple
AZURE_SEARCH_PERMITTED_GROUPS_COLUMN=
AZURE_SEARCH_STRICTNESS=3
AZURE_OPENAI_RESOURCE=
AZURE_OPENAI_MODEL=gpt-35-turbo-16k
AZURE_OPENAI_KEY=
AZURE_OPENAI_MODEL_NAME=gpt-35-turbo-16k
AZURE_OPENAI_TEMPERATURE=0
AZURE_OPENAI_TOP_P=1.0
AZURE_OPENAI_MAX_TOKENS=1000
AZURE_OPENAI_STOP_SEQUENCE=
AZURE_OPENAI_SYSTEM_MESSAGE=You are an AI assistant that helps people find information.
AZURE_OPENAI_PREVIEW_API_VERSION=2023-12-01-preview
AZURE_OPENAI_STREAM=True
AZURE_OPENAI_ENDPOINT=https://byc-aoai.openai.azure.com/
AZURE_OPENAI_EMBEDDING_NAME=
AZURE_COSMOSDB_ACCOUNT=
AZURE_COSMOSDB_DATABASE=db_conversation_history
AZURE_COSMOSDB_CONVERSATIONS_CONTAINER=conversations
AZURE_COSMOSDB_ACCOUNT_KEY=

andrewldesousa commented 8 months ago

@abhahn thanks for the timely response, please let me know if you have further questions.

abhahn commented 7 months ago

No problem! I made a few changes to my settings using the .env contents above and am still not able to repro the issue, but I have a few more follow-up questions to hopefully continue narrowing things down.

In the processing code you recently shared, I see a function called format_as_ndjson being used, but I can't see its definition, so I'm not sure whether additional response processing there might affect the output format. One thing I noticed from playing around with the ndjson package is that printing its contents doesn't appear to be recursive, so complex objects nested within a dict don't print for me in a loop. Again, I'm not sure what that function actually does or whether you are using ndjson, so maybe you can clarify that here.
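
For context, a typical NDJSON serializer (purely hypothetical here, since the actual definition isn't shown) might look like:

import json

def format_as_ndjson(obj: dict) -> str:
    # Hypothetical sketch: serialize one JSON document per line.
    return json.dumps(obj) + "\n"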

One thing I'm wondering if you can try: add the following lines at the top of the data-processing part to debug the assistant message directly and see what comes back. This is how I originally set up my repro script, and it lets me see the full assistant message. Could you let me know if you still see a truncated response printed here?

async for message in chat_completion.complete_chat_stream_async(chat_messages, settings):
    tokens = [assistant_message async for assistant_message in message]
    print("".join(tokens))

andrewldesousa commented 7 months ago

Thanks

By running

async for message in chat_completion.complete_chat_stream_async(chat_messages, settings):
    tokens = [assistant_message async for assistant_message in message]
    print("".join(tokens))

It prints "Some countries in Europe include ."

After experimenting more and re-sending the question two or three times, I do get a better response, but it's non-deterministic given the settings I provided: "Please note that this is not an exhaustive list, and there are more countries in Europe. Some countries in Europe include:

Please note that this is not an exhaustive list, and there are more countries in Europe ."

So at the very least, the system prompt does not seem to be taking effect. It's hard to tell why the answer is being cut short (maybe a bad LLM response), but I am still seeing that issue with the updated version of my code, similar to the snippet you provided above.

abhahn commented 7 months ago

Ok, I want to check whether this is related to another error we have observed recently with streaming requests, where the response payload does not complete properly.

Could you let me know the following additional pieces of info?

1) What region is your AOAI resource in? It may also help if you can share the full resource ID for your AOAI resource so I can see if I can find the requests in our logs.
2) Are you noticing any non-200 responses with debug logging enabled for Semantic Kernel? To see the logs, I think you can just import logging and set logging.basicConfig(level=logging.DEBUG) at the top of app.py (see the snippet below).
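
For reference, the debug-logging setup from (2) is just:

import logging

# DEBUG-level logging should surface Semantic Kernel's HTTP traffic,
# including any non-200 responses.
logging.basicConfig(level=logging.DEBUG)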

andrewldesousa commented 7 months ago

After trying for about five minutes, it seems hard to replicate. Maybe a new deployment of the Azure OpenAI service? I can try again tomorrow morning to see if I can recreate the early cut-off, which I was able to reproduce ~4 hours ago, but I am leaning toward thinking a new version was deployed for my Azure OpenAI service.

Region is East US. The instance is named byc-aoai, under the BYCRG resource group in Microsoft's "Microsoft Azure Sponsorship 2" subscription.

andrewldesousa commented 7 months ago

@abhahn ok, after trying again I am no longer able to replicate the cut-off issue I was able to produce yesterday. I suppose something changed on the backend.

If the cut-off issue is resolved, then the only outstanding problem would be the system prompt not taking effect.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.