marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0

[BUG] 5XX Error Codes from marqo instance #176

Closed VitusAcabado closed 1 year ago

VitusAcabado commented 1 year ago

Describe the bug: Marqo returns a 500 error when the error received from Marqo-OS is a 401 and should be propagated as such (observed when hitting the API gateway endpoint in cloud).

INFO: 172.31.4.94:1548 - "GET /indexes/jesse-raritypunks-big-test/documents/1e6f3e59-52bd-4e68-a3b3-3d68e5ab8a1d?expose_facets=True HTTP/1.1" 500 Internal Server Error
2022-11-17T03:35:06.390415845Z ERROR: Exception in ASGI application
2022-11-17T03:35:06.390427995Z Traceback (most recent call last):
2022-11-17T03:35:06.390435955Z File "/app/src/marqo/_httprequests.py", line 134, in __validate
2022-11-17T03:35:06.390442025Z request.raise_for_status()
2022-11-17T03:35:06.390448155Z File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 1021, in raise_for_status
2022-11-17T03:35:06.390453995Z raise HTTPError(http_error_msg, response=self)
2022-11-17T03:35:06.390459015Z requests.exceptions.HTTPError: 401 Client Error: UNAUTHORIZED for url: https://elastic:4KLIZDSCR@prod-marqo-env.us-east-1.elasticbeanstalk.com/nznbwck7gtjv/jesse-raritypunks-big-test/_doc/1e6f3e59-52bd-4e68-a3b3-3d68e5ab8a1d
2022-11-17T03:35:06.390465835Z
2022-11-17T03:35:06.390527546Z During handling of the above exception, another exception occurred:
2022-11-17T03:35:06.390534286Z
2022-11-17T03:35:06.390548256Z Traceback (most recent call last):
2022-11-17T03:35:06.390555076Z File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
2022-11-17T03:35:06.390560776Z result = await app( # type: ignore[func-returns-value]
2022-11-17T03:35:06.390566146Z File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
2022-11-17T03:35:06.390572357Z return await self.app(scope, receive, send)
2022-11-17T03:35:06.390577427Z File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 270, in __call__
2022-11-17T03:35:06.390597067Z await super().__call__(scope, receive, send)
2022-11-17T03:35:06.390603227Z File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 124, in __call__
2022-11-17T03:35:06.390609397Z await self.middleware_stack(scope, receive, send)
2022-11-17T03:35:06.390615087Z File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 184, in __call__
2022-11-17T03:35:06.390620437Z raise exc
2022-11-17T03:35:06.390627147Z File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 162, in __call__
2022-11-17T03:35:06.390632517Z await self.app(scope, receive, _send)
2022-11-17T03:35:06.390638077Z File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 75, in __call__
2022-11-17T03:35:06.390644177Z raise exc
2022-11-17T03:35:06.390648937Z File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 64, in __call__
2022-11-17T03:35:06.390655008Z await self.app(scope, receive, sender)
2022-11-17T03:35:06.390660048Z File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
2022-11-17T03:35:06.390666448Z raise e
2022-11-17T03:35:06.390671268Z File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
2022-11-17T03:35:06.390677588Z await self.app(scope, receive, send)
2022-11-17T03:35:06.390684038Z File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 680, in __call__
2022-11-17T03:35:06.390689318Z await route.handle(scope, receive, send)
2022-11-17T03:35:06.390697198Z File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 275, in handle
2022-11-17T03:35:06.390702648Z await self.app(scope, receive, send)
2022-11-17T03:35:06.390708888Z File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 65, in app
2022-11-17T03:35:06.390714278Z response = await func(request)
2022-11-17T03:35:06.390719168Z File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 235, in app
2022-11-17T03:35:06.390724288Z raw_response = await run_endpoint_function(
2022-11-17T03:35:06.390729578Z File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 163, in run_endpoint_function
2022-11-17T03:35:06.390735829Z return await run_in_threadpool(dependant.call, **values)
2022-11-17T03:35:06.390741409Z File "/usr/local/lib/python3.8/dist-packages/starlette/concurrency.py", line 41, in run_in_threadpool
2022-11-17T03:35:06.390746569Z return await anyio.to_thread.run_sync(func, *args)
2022-11-17T03:35:06.390752789Z File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 31, in run_sync
2022-11-17T03:35:06.390758139Z return await get_asynclib().run_sync_in_worker_thread(
2022-11-17T03:35:06.390769529Z File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
2022-11-17T03:35:06.390775009Z return await future
2022-11-17T03:35:06.390780689Z File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
2022-11-17T03:35:06.390786689Z result = context.run(func, *args)
2022-11-17T03:35:06.390802089Z File "/app/src/marqo/tensor_search/./api.py", line 159, in get_document_by_id
2022-11-17T03:35:06.390808429Z return tensor_search.get_document_by_id(
2022-11-17T03:35:06.390813419Z File "/app/src/marqo/tensor_search/tensor_search.py", line 599, in get_document_by_id
2022-11-17T03:35:06.390820140Z res = HttpRequests(config).get(
2022-11-17T03:35:06.390825040Z File "/app/src/marqo/_httprequests.py", line 93, in get
2022-11-17T03:35:06.390831610Z res = self.send_request(requests.get, path=path, body=body, content_type=content_type)
2022-11-17T03:35:06.390837050Z File "/app/src/marqo/_httprequests.py", line 80, in send_request
2022-11-17T03:35:06.390843030Z return self.__validate(response)
2022-11-17T03:35:06.390848630Z File "/app/src/marqo/_httprequests.py", line 137, in __validate
2022-11-17T03:35:06.390853680Z convert_to_marqo_web_error_and_raise(response=request, err=err)
2022-11-17T03:35:06.390859600Z File "/app/src/marqo/_httprequests.py", line 159, in convert_to_marqo_web_error_and_raise
2022-11-17T03:35:06.390866360Z open_search_error_type = response_dict["error"]["type"]
2022-11-17T03:35:06.390872420Z TypeError: string indices must be integers

To Reproduce Steps to reproduce the behavior:

  1. Run a large-scale expose_facets test on the cloud test instance
  2. The error appears intermittently

Expected behavior If Successful: Should return results of document with facets exposed

If Failure: Propagate Error code from Marqo-OS

Screenshots On Successful query request: image


VitusAcabado commented 1 year ago

After further investigation:

502 - The root cause is the connection between Marqo and the load balancer in front of the Marqo instance. The load balancer returns a 502 because Marqo closes its connection to the load balancer prematurely while a request is still outstanding. In this case, a 429 from s2search was processed but never reached the load balancer because the connection had already been closed (I checked the load balancer logs to verify this).

"The 502 Bad Gateway error is caused when the ALB sends a request to a service at the same time that the service closes the connection by sending the FIN segment to the ALB socket. The ALB socket receives FIN, acknowledges, and starts a new handshake procedure. Meanwhile, the socket on the service side has just received a data request referencing the previous (now closed) connection. Because it can’t handle it, it sends an RST segment back to the ALB, and then the ALB returns a 502 to the user." source (there's a great diagram in the article that explains this)

Recommendation - Configure the keep-alive settings on uvicorn so that the uvicorn keep-alive value is >= the ALB timeout. Source for recommendation: link
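To make the keep-alive recommendation concrete, here is a minimal sketch of how the uvicorn setting could be derived from the ALB idle timeout. The helper names, the 5-second margin, and the `api:app` application path are assumptions for illustration, not what the merged fix does. `--timeout-keep-alive` is uvicorn's CLI option (default 5 seconds), and 60 seconds is the ALB default idle timeout.

```python
import math
from typing import List

# AWS ALB default idle timeout in seconds; use the configured value if it differs.
ALB_DEFAULT_IDLE_TIMEOUT_S = 60


def keep_alive_for(alb_idle_timeout_s: float, margin_s: int = 5) -> int:
    """Return a keep-alive in whole seconds that exceeds the ALB idle timeout.

    The margin is a hypothetical safety buffer so that the ALB, not uvicorn,
    is always the side that closes an idle connection first.
    """
    if alb_idle_timeout_s <= 0:
        raise ValueError("alb_idle_timeout_s must be positive")
    if margin_s <= 0:
        raise ValueError("margin_s must be positive")
    return math.ceil(alb_idle_timeout_s) + margin_s


def uvicorn_command(
    app_path: str,
    alb_idle_timeout_s: float = ALB_DEFAULT_IDLE_TIMEOUT_S,
    margin_s: int = 5,
) -> List[str]:
    """Build a uvicorn command line with a keep-alive above the ALB timeout."""
    keep_alive = keep_alive_for(alb_idle_timeout_s, margin_s)
    return ["uvicorn", app_path, "--timeout-keep-alive", str(keep_alive)]


if __name__ == "__main__":
    # "api:app" is a placeholder module path, not necessarily Marqo's real one.
    print(" ".join(uvicorn_command("api:app")))
```

With the default ALB timeout this yields a keep-alive of 65 seconds. The same value can be passed as `timeout_keep_alive` when starting uvicorn programmatically.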

500 - This error comes from the Marqo instance when the response from s2search is not valid JSON. Recommendation - Improve logging in s2search so that Marqo can properly propagate the error. Also add error handling for the case where s2search/Marqo-OS returns an error.
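The traceback in the report fails on `response_dict["error"]["type"]` because `"error"` was a string rather than an object. Below is a rough sketch of defensive parsing that would keep the upstream status code instead of raising a `TypeError`. The function name `extract_backend_error` and the `"unknown"` fallback are hypothetical, and this is not the code from the merged PRs. It assumes the OpenSearch-style error object with `type` and `reason` fields when one is present.

```python
import json
from typing import Optional, Tuple


def extract_backend_error(status_code: int, body: Optional[str]) -> Tuple[int, str, str]:
    """Return (status_code, error_type, message) without assuming a body schema.

    Handles three shapes: a JSON object whose "error" is an object,
    a JSON object whose "error" is a plain string, and a non-JSON body.
    """
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        # Body is not valid JSON (or is missing): pass the raw text through.
        return status_code, "unknown", body or ""

    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict):
        return (
            status_code,
            str(error.get("type", "unknown")),
            str(error.get("reason", "")),
        )
    if isinstance(error, str):
        # The case from the traceback: "error" is a string, so indexing
        # it with ["type"] would raise TypeError.
        return status_code, "unknown", error
    return status_code, "unknown", body or ""


if __name__ == "__main__":
    print(extract_backend_error(401, '{"error": "Unauthorized"}'))
```

A caller could then raise its own web error with the returned status code, so a 401 or 429 from the backend reaches the client as such rather than as a 500.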

VitusAcabado commented 1 year ago

Closing this issue, as the PRs for the fixes have been merged to Marqo and s2search.