microsoft / sample-app-aoai-chatGPT

Sample code for a simple web chat experience through Azure OpenAI, including Azure OpenAI On Your Data.
MIT License
1.66k stars 2.61k forks source link

Private deployment unsuccesful despite no blocked traffic. #442

Closed keisari-ch closed 8 months ago

keisari-ch commented 11 months ago

Hello,

I'm struggling with the solution, i'm able to send max 2 questions, and quickly then, i cant get past the following error :

Error
An error occurred. Answers can't be saved at this time. If the problem persists, please contact the site administrator.

I can observe a lot of python errors on the logs like : image

image

Like i mentioned, this is a private deployment. openai, search, cosmosdb and the app service are deployed in the same VNET app service has private endpoint and vnet integration. NSG FLow Logs show no traffic denied All outbound traffic to internet is routed via Azure Firewall and no traffic is denied as the following fqdn's were allowed prior to deployment of the app service :

        - sts.windows.net
        - dc.services.visualstudio.com
        - github.com
        - oryx-cdn.microsoft.io
        - pypi.org
        - files.pythonhosted.org
        - login.microsoftonline.com

This was tested against these two commits :

image

Any idea how i can debug further ?

And i forgot, when i run a question that stays a while in the generating state, then it shows the previous mentioned error : Error An error occurred. Answers can't be saved at this time. If the problem persists, please contact the site administrator.

From the container i observe that there are connections against both cosmosdb and search api. When this sessions drop, after 1 minute or two, the errror shows up in the UI.

I edited the data_preparation.py file in order to update the api version used (from 2023-07-01-Preview to 2023-10-01-Preview), but after the first 2 questions asked, it sends the usual error messages :

image

023-12-07T18:38:43.032086387Z: [ERROR]  ERROR:root:Exception in /history/update
2023-12-07T18:38:43.032104786Z: [ERROR]  Traceback (most recent call last):
2023-12-07T18:38:43.032108586Z: [ERROR]    File "/tmp/8dbf74ef72b825b/app.py", line 662, in update_conversation
2023-12-07T18:38:43.032111586Z: [ERROR]      if len(messages) > 0 and messages[-1]['role'] == "assistant":
2023-12-07T18:38:43.032114786Z: [ERROR]                               ~~~~~~~~~~~~^^^^^^^^
2023-12-07T18:38:43.032117486Z: [ERROR]  KeyError: 'role'

Any idea how i can troubleshoot this further @sarah-widder ?

Thanks

Originally posted by @keisari-ch in https://github.com/microsoft/sample-app-aoai-chatGPT/issues/209#issuecomment-1845547804

keisari-ch commented 11 months ago

Adding some logs showing up on DEBUG=true When a third question is asked just prior to get an error :

2023-12-08T16:58:09.802065935Z: [ERROR]  DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ai-outinstance-openai-dev.openai.azure.com:443
2023-12-08T17:00:09.907640528Z: [ERROR]  DEBUG:urllib3.connectionpool:https://ai-outinstance-openai-dev.openai.azure.com:443 "POST /openai/deployments/gpt-35-turbo-16k/extensions/chat/completions?api-version=2023-08-01-preview HTTP/1.1" 504 24
2023-12-08T17:00:09.920444120Z: [INFO]  169.254.129.10 - - [08/Dec/2023:17:00:09 +0000] "POST /history/generate HTTP/1.1" 200 0 "https://outinstance.azurewebsites.net/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
2023-12-08T17:00:09.999461648Z: [ERROR]  ERROR:root:Exception in /history/update
2023-12-08T17:00:09.999482648Z: [ERROR]  Traceback (most recent call last):
2023-12-08T17:00:09.999486548Z: [ERROR]    File "/tmp/8dbf75e7007b8cf/app.py", line 662, in update_conversation
2023-12-08T17:00:09.999490048Z: [ERROR]      if len(messages) > 0 and messages[-1]['role'] == "assistant":
2023-12-08T17:00:09.999536548Z: [ERROR]                               ~~~~~~~~~~~~^^^^^^^^
2023-12-08T17:00:09.999540548Z: [ERROR]  KeyError: 'role'
2023-12-08T17:00:10.010202657Z: [INFO]  169.254.129.10 - - [08/Dec/2023:17:00:10 +0000] "POST /history/update HTTP/1.1" 500 19 "https://outinstance.azurewebsites.net/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"

Trying to check this out : 2023-12-08T17:00:09.907640528Z: [ERROR] DEBUG:urllib3.connectionpool:https://ai-outinstance-openai-dev.openai.azure.com:443 "POST /openai/deployments/gpt-35-turbo-16k/extensions/chat/completions?api-version=2023-08-01-preview HTTP/1.1" 504 24

keisari-ch commented 11 months ago

It works when enabling public access to Search service. I guess openai is some sort of a proxy to the search service, i need now to find a way to restrict access to the search service from the openai scope.

dilipdodiya commented 9 months ago

@sarah-widder - In our case, we are getting "The operation was timeout" error if Cog Search with private endpoint and public access disabled. We have Azure OpenAI, Cog Search & Web App with private endpoints and public network access disabled. All these 3 components are sitting in their own vnets / private endpoint in the respective azure subs. We have also required firewall rules in place for communication between these three components. Do we need to do anything specific to make Azure OpenAI, Cog Search, web app work with private endpoint connection? Does OpenAI API make calls to Azure Cogs Search internally?

HXK8 commented 9 months ago

We ran into the same issues as @dilipdodiya. Details below.

Overview

We are deploying the sample-app-aoai-chatGPT application through Azure AI Studio. Everything works while over public networks, but when we try to create a private network between Azure App Services to Azure AI Search, the application fails with the following:

Invalid AzureCognitiveSearch configuration detected: Call to get ACS index failed. Check you are using correct index, instance and api_key.

Private networking details

Troubleshooting

Diagram

Diagram

Example error

image

App Services logs

2024-02-05T16:18:53.465043284Z: [ERROR]  [2024-02-05 16:18:53 +0000] [79] [ERROR] Error handling request
2024-02-05T16:18:53.465111184Z: [ERROR]  Traceback (most recent call last):
2024-02-05T16:18:53.465117884Z: [ERROR]    File "/tmp/8dc1e90f6f97d9f/app.py", line 398, in stream_with_data
2024-02-05T16:18:53.465122784Z: [ERROR]      response["model"] = lineJson["model"]
2024-02-05T16:18:53.465126884Z: [ERROR]                          ~~~~~~~~^^^^^^^^^
2024-02-05T16:18:53.465130584Z: [ERROR]  KeyError: 'model'
2024-02-05T16:18:53.465134384Z: [ERROR]  
2024-02-05T16:18:53.465152884Z: [ERROR]  During handling of the above exception, another exception occurred:
2024-02-05T16:18:53.465157184Z: [ERROR]  
2024-02-05T16:18:53.465160484Z: [ERROR]  Traceback (most recent call last):
2024-02-05T16:18:53.465164184Z: [ERROR]    File "/opt/python/3.11.4/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 184, in handle_request
2024-02-05T16:18:53.465167984Z: [ERROR]      for item in respiter:
2024-02-05T16:18:53.465171484Z: [ERROR]    File "/tmp/8dc1e90f6f97d9f/antenv/lib/python3.11/site-packages/werkzeug/wsgi.py", line 256, in __next__
2024-02-05T16:18:53.465175184Z: [ERROR]      return self._next()
2024-02-05T16:18:53.465178684Z: [ERROR]             ^^^^^^^^^^^^
2024-02-05T16:18:53.465182184Z: [ERROR]    File "/tmp/8dc1e90f6f97d9f/antenv/lib/python3.11/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
2024-02-05T16:18:53.465185984Z: [ERROR]      for item in iterable:
2024-02-05T16:18:53.465189484Z: [ERROR]    File "/tmp/8dc1e90f6f97d9f/app.py", line 425, in stream_with_data
2024-02-05T16:18:53.465193184Z: [ERROR]      yield format_as_ndjson({"error" + str(e)})
2024-02-05T16:18:53.465196884Z: [ERROR]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-05T16:18:53.465200384Z: [ERROR]    File "/tmp/8dc1e90f6f97d9f/app.py", line 170, in format_as_ndjson
2024-02-05T16:18:53.465203984Z: [ERROR]      return json.dumps(obj, ensure_ascii=False) + "\n"
2024-02-05T16:18:53.465207684Z: [ERROR]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-05T16:18:53.465211184Z: [ERROR]    File "/opt/python/3.11.4/lib/python3.11/json/__init__.py", line 238, in dumps
2024-02-05T16:18:53.465214984Z: [ERROR]      **kw).encode(obj)
2024-02-05T16:18:53.465218384Z: [ERROR]            ^^^^^^^^^^^
2024-02-05T16:18:53.465221884Z: [ERROR]    File "/opt/python/3.11.4/lib/python3.11/json/encoder.py", line 200, in encode
2024-02-05T16:18:53.465225484Z: [ERROR]      chunks = self.iterencode(o, _one_shot=True)
2024-02-05T16:18:53.465229084Z: [ERROR]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-05T16:18:53.465232484Z: [ERROR]    File "/opt/python/3.11.4/lib/python3.11/json/encoder.py", line 258, in iterencode
2024-02-05T16:18:53.465236184Z: [ERROR]      return _iterencode(o, 0)
2024-02-05T16:18:53.465239685Z: [ERROR]             ^^^^^^^^^^^^^^^^^
2024-02-05T16:18:53.465243085Z: [ERROR]    File "/opt/python/3.11.4/lib/python3.11/json/encoder.py", line 180, in default
2024-02-05T16:18:53.465246785Z: [ERROR]      raise TypeError(f'Object of type {o.__class__.__name__} '
2024-02-05T16:18:53.465250285Z: [ERROR]  TypeError: Object of type set is not JSON serializable
2024-02-05T16:18:53.496122015Z: [INFO]  fail: Middleware[0]
2024-02-05T16:18:53.496560416Z: [INFO]        Failed to forward request to http://169.254.129.3:8000. Encountered a System.IO.IOException exception after 32272.399ms with message: The response ended prematurely.. Check application logs to verify the application is properly handling HTTP traffic.
2024-02-05T16:18:53.542252109Z: [INFO]  fail: Microsoft.AspNetCore.Server.Kestrel[13]
2024-02-05T16:18:53.542295609Z: [INFO]        Connection id "0HN1554QM4OJH", Request id "0HN1554QM4OJH:00000002": An unhandled exception was thrown by the application.
2024-02-05T16:18:53.545071321Z: [INFO]        System.InvalidOperationException: StatusCode cannot be set because the response has already started.
2024-02-05T16:18:53.545093721Z: [INFO]           at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ThrowResponseAlreadyStartedException(String value)
2024-02-05T16:18:53.545213221Z: [INFO]           at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.Microsoft.AspNetCore.Http.Features.IHttpResponseFeature.set_StatusCode(Int32 value)
2024-02-05T16:18:53.545217821Z: [INFO]           at Microsoft.Azure.AppService.Middleware.Forwarding.RequestForwarder.OnRequest(HttpContext context) in /__w/1/s/src/EasyAuth/Middleware.Forwarding/RequestForwarder.cs:line 90
2024-02-05T16:18:53.545238621Z: [INFO]           at Microsoft.Azure.AppService.Middleware.NetCore.AppServiceMiddleware.InvokeAsync(HttpContext context) in /__w/1/s/src/EasyAuth/Microsoft.Azure.AppService.Middleware.NetCore/AppServiceMiddleware.cs:line 140
2024-02-05T16:18:53.545834924Z: [INFO]           at Microsoft.Azure.AppService.MiddlewareShim.AutoHealing.AutoHealingMiddleware.Invoke(HttpContext context) in /__w/1/s/src/EasyAuth/Middleware.Host/AutoHealing/AutoHealingMiddleware.cs:line 59
2024-02-05T16:18:53.545862024Z: [INFO]           at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)

Any help or insight is appreciated.

Thanks!

mjromper commented 9 months ago

Same problem here. We've opened a support ticket with Microsoft.

HXK8 commented 9 months ago

We opened a support ticket as well. Microsoft said we need to apply to enable private endpoint connectivity between Azure AI/OpenAI services to Azure AI Search. The relevant Microsoft Learn article/section is here:

We are waiting for the application to be approved to confirm the resolution step.

That said, we are uncertain how Azure AI/OpenAI communicates directly with Azure AI Search (after the initial *Add your data step is done). We always assumed the application (on App Services) orchestrated the traffic between Azure AI/OpenAI and Azure AI Search, hence the diagram without any connections between the two.

We will provide an update if it works.

Microsoft said wider (open) support is on their roadmap.

dilipdodiya commented 9 months ago

@sarah-widder - In our case, we are getting "The operation was timeout" error if Cog Search with private endpoint and public access disabled. We have Azure OpenAI, Cog Search & Web App with private endpoints and public network access disabled. All these 3 components are sitting in their own vnets / private endpoint in the respective azure subs. We have also required firewall rules in place for communication between these three components. Do we need to do anything specific to make Azure OpenAI, Cog Search, web app work with private endpoint connection? Does OpenAI API make calls to Azure Cogs Search internally?

Quick update - We submitted [Private Endpoint connection request] (https://forms.office.com/pages/responsepage.aspx?id=v4j5cvGGr0GRqy180BHbRw_T3EIZ1KNCuv_1duLJBgpUMUcwV1Y5QjI3UTVTMkhSVUo3R09NNVQxSyQlQCN0PWcu) to Microsoft. Microsoft reviewed / approved request and they created private endpoint connection for AI Search. I just went ahead and just approved Private endpoint in portal. Finally, web app started working as expected :)

sarah-widder commented 9 months ago

For reference, all of the steps required for private networking support are outlined here: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/use-your-data-securely including the required application to Microsoft for a private endpoint connection between the Azure OpenAI service and Azure AI search. If you are encountering issues using your web app with private endpoints - please read through the documentation carefully and ensure that all of the steps for the inferencing architecture have been followed, the application was submitted and approved, and all role assignments between the various resources have been assigned.

HXK8 commented 9 months ago

Glad to hear it, @dilipdodiya! I am still waiting for our application to be approved/actioned.

HXK8 commented 8 months ago

After our private endpoint for Azure AI Search was provisioned and approved, the sample app was able to run as expected.

Thanks everyone for their help!