mlexchange / mlex_highres_segmentation

A Dash interface for ML-based segmentation of user-annotated large multi-dimensional image data

Tiled server timeout #96

Open hannahker opened 1 year ago

hannahker commented 1 year ago

I have noticed timeout errors in the Tiled HTTPS requests since updating the Tiled server version: httpx.ReadTimeout: The read operation timed out. They occur both locally and on the deployed app (deployed on Plotly's servers). More testing is needed to see whether this can be reproduced reliably.

hannahker commented 1 year ago

FYI @Wiebke

hannahker commented 1 year ago

Hit this again multiple times today while running the app locally, after panning around to view different slices of the lobster claw dataset.

Traceback (most recent call last):
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_backends/sync.py", line 28, in read
    return self._sock.recv(max_bytes)
  File "/Users/hannahker/miniconda3/lib/python3.9/ssl.py", line 1226, in recv
    return self.read(buflen)
  File "/Users/hannahker/miniconda3/lib/python3.9/ssl.py", line 1101, in read
    return self._sslobj.read(len)
socket.timeout: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpx/_transports/default.py", line 218, in handle_request
    resp = self._pool.handle_request(req)
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 262, in handle_request
    raise exc
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 245, in handle_request
    response = connection.handle_request(request)
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/connection.py", line 96, in handle_request
    return self._connection.handle_request(request)
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 121, in handle_request
    raise exc
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 99, in handle_request
    ) = self._receive_response_headers(**kwargs)
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 164, in _receive_response_headers
    event = self._receive_event(timeout=timeout)
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_sync/http11.py", line 200, in _receive_event
    data = self._network_stream.read(
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_backends/sync.py", line 28, in read
    return self._sock.recv(max_bytes)
  File "/Users/hannahker/miniconda3/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/Users/hannahker/Desktop/mlex/mlex_highres_segmentation/venv/lib/python3.9/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout: The read operation timed out

The above exception was the direct cause of the following exception:

httpx.ReadTimeout: The read operation timed out

cleaaum commented 1 year ago

I can confirm this has been happening as well on my end during local development.

cleaaum commented 1 year ago

And every once in a while I get this error:

Traceback (most recent call last):
  File "/Users/cleaaum/Documents/mlex_highres_segmentation/app.py", line 3, in <module>
    from components.control_bar import layout as control_bar_layout
  File "/Users/cleaaum/Documents/mlex_highres_segmentation/components/control_bar.py", line 4, in <module>
    from utils import data_utils
  File "/Users/cleaaum/Documents/mlex_highres_segmentation/utils/data_utils.py", line 106, in <module>
    client = from_uri(TILED_URI, api_key=API_KEY)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/constructors.py", line 61, in from_uri
    context, node_path_parts = Context.from_any_uri(
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/context.py", line 273, in from_any_uri
    context = cls(
              ^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/context.py", line 151, in __init__
    self.http_client.get(
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_client.py", line 1041, in get
    return self.request(
           ^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_client.py", line 814, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_client.py", line 901, in send
    response = self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_client.py", line 929, in _send_handling_auth
    response = self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_client.py", line 966, in _send_handling_redirects
    response = self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_client.py", line 1002, in _send_single_request
    response = transport.handle_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/transport.py", line 85, in handle_request
    response = self.transport.handle_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_transports/default.py", line 217, in handle_request
    with map_httpcore_exceptions():
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: [Errno 8] nodename nor servname provided, or not known

dylanmcreynolds commented 1 year ago

It sounds to me like the read timeout is hitting its default of 5 seconds for these requests, and that we're not seeing it because we're closer to the service, maybe?

Can you try changing the default timeout? The Tiled client uses the httpx package for its HTTP communication and exposes the ability to set the timeout, so you could set it to something large as a test:

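import httpx
from tiled.client import from_uri
# Raise the timeout well above httpx's 5-second default as a test.
client = from_uri("http://localhost:8000/", api_key="<key>", timeout=httpx.Timeout(60.0))

cleaaum commented 1 year ago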

Another (third) type of error I've been getting:

Traceback (most recent call last):
  File "/Users/cleaaum/Documents/mlex_highres_segmentation/app.py", line 4, in <module>
    from callbacks.control_bar import *
  File "/Users/cleaaum/Documents/mlex_highres_segmentation/callbacks/control_bar.py", line 28, in <module>
    from utils.data_utils import (
  File "/Users/cleaaum/Documents/mlex_highres_segmentation/utils/data_utils.py", line 107, in <module>
    client = from_uri(TILED_URI, api_key=API_KEY, timeout=httpx.Timeout(30.0))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/constructors.py", line 69, in from_uri
    return from_context(
           ^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/constructors.py", line 134, in from_context
    content = handle_error(
              ^^^^^^^^^^^^^
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/tiled/client/utils.py", line 18, in handle_error
    response.raise_for_status()
  File "/Users/cleaaum/opt/miniconda3/envs/lbl/lib/python3.11/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'https://mlex-segmentation.als.lbl.gov/api/v1/metadata/'
For more information check: https://httpstatuses.com/500

dylanmcreynolds commented 1 year ago

Sorry about that. It looks like an issue in Tiled. I think we have gotten around it by reducing the number of Tiled pods in our setup to just 1. (If you're curious, I reported the issue here.)

Can you try again?

hannahker commented 1 year ago

@dylanmcreynolds @Wiebke I'm also still getting that 500 Internal Server Error mentioned above. We're also still encountering some timeouts and quite long wait times to retrieve data from the server. I've bumped up the Timeout param in the Tiled client, which has helped, but we're still finding this to be a blocker for development. It still seems to be intermittent -- sometimes things are snappy, other times it seems almost every request times out.
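
Since the timeouts are intermittent, one client-side stopgap could be to retry a failed read a few times with exponential backoff. This is a minimal sketch, assuming a hypothetical helper `call_with_retries` (not part of Tiled or httpx); it defaults to the built-in `TimeoutError` so that it runs without extra dependencies:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call func, retrying with exponential backoff on the given exceptions.

    Hypothetical helper for illustration only.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In the app, the callable would wrap a Tiled read and `retry_on` would be `(httpx.ReadTimeout,)`, the exception shown in the tracebacks above. Retries only mask the slowness; they do not address the server-side cause.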

Are there any other strategies that we can look into to improve the consistency of performance on the Tiled server? Client-side caching might help a bit here.
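
A minimal sketch of what client-side caching could look like, using `functools.lru_cache` from the standard library. `fetch_slice_from_server` and `get_slice` are hypothetical names; the former stands in for whatever Tiled client call the app uses to read one slice:

```python
from functools import lru_cache


def fetch_slice_from_server(dataset: str, index: int) -> bytes:
    """Hypothetical stand-in for a slow Tiled read of one image slice."""
    fetch_slice_from_server.calls += 1  # count round trips for illustration
    return f"{dataset}:{index}".encode()


fetch_slice_from_server.calls = 0


@lru_cache(maxsize=64)
def get_slice(dataset: str, index: int) -> bytes:
    """Return a slice, contacting the server only on a cache miss."""
    return fetch_slice_from_server(dataset, index)
```

With this pattern, panning back to a slice that was already viewed does not trigger another request. `maxsize` bounds memory use and would need tuning for large slices.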

dylanmcreynolds commented 1 year ago

@hannahker, for the 500 errors, there is a clear path, but it will take some time, and I am unfortunately on the road next week.

For the timeouts and data rates, there are two strategies: one that we can tackle on the server, and one that you can probably tackle in Dash.

hannahker commented 1 year ago

We've addressed this for now by enhancing the data available when running a local Tiled server. This is a suitable workaround for our local development, but we're still hitting these issues frequently on the apps deployed to our servers, making it difficult to properly test our work in a deployed environment.

Wiebke commented 1 year ago

See additional comments on client-side caching in #133