imjoy-team / imjoy-rpc

The RPC library used in ImJoy.
MIT License
23 stars 5 forks source link

Irecoverable Timeout error in synchronous server connect #558

Open beniroquai opened 5 months ago

beniroquai commented 5 months ago

Hey @oeway I observed that - in case of lacking internet connection - the login function will timeout and block the application in sync mode. I have not found a way to get around this other than putting the entire connection into a background thread - which I would like to avoid. Is there any way to assign a callback or implement a timeout?

The following code produces the error below. It runs inside an asyncio-thread

    def start_service(self, service_id, server_url="https://chat.bioimage.io", workspace=None, token=None):
        '''
        This logs into the Hypha Server and starts the service.
        It also registers the extensions that will hook up the microsocpe to the chatbot

        service_id: str - the ID of the service that will be replied by the server when you log in
        server_url: str - the URL of the server
        workspace: str - the workspace of the server
        token: str - the token to log in to the server and will be replied by the server when you log in

        '''
        self.__logger.debug(f"Starting service...")
        client_id = service_id + "-client"

        def autoLogin(message):
            # automatically open default browser and return the login token 
            webbrowser.open(message['login_url']) # TODO: pass login token to qtwebview
            print(f"Please open your browser and login at: {message['login_url']}")

        try:
            token = login({"server_url": server_url, 
                       "login_callback": autoLogin, 
                       "timeout": 10})
        except Exception as e:
            # probably timeout error - not connected to the Internet?
            self.__logger.error(e)
            return "probably timeout error - not connected to the Internet?"
        server = connect_to_server(
            {
            "server_url": server_url,
            "token": token}
            )
        # initialize datastorer for image saving and data handling outside the chat prompts, resides on the hypha server
        self.datastore.setup(server, service_id="data-store")
        svc = server.register_service(self.getMicroscopeControlExtensionDefinition())
        self.hyphaURL = f"https://bioimage.io/chat?server={server_url}&extension={svc.id}"
        try:
            # open the chat window in the browser to interact with the herin created connection
            webbrowser.open(self.hyphaURL)
            self._widget.setChatURL(url=f"https://bioimage.io/chat?token={token}&assistant=Skyler&server={server_url}&extension={svc.id}")
            self._isConnected = True
        except:
            pass
        print(f"Extension service registered with id: {svc.id}, you can visit the chatbot at {self.hyphaURL}, and the service at: {server_url}/{server.config.workspace}/services/{svc.id.split(':')[1]}")
2024-06-16 15:50:08 DEBUG [HyphaController] Starting service...
DEBUG:imswitch:[HyphaController] Starting service...
ERROR:websocket-client:Failed to connect to wss://chat.bioimage.io/ws, retrying 1/10000
Traceback (most recent call last):
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/site-packages/websockets/legacy/client.py", line 647, in __await_impl_timeout__
    return await self.__await_impl__()
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/site-packages/websockets/legacy/client.py", line 651, in __await_impl__
    _transport, _protocol = await self._create_connection()
  File "uvloop/loop.pyx", line 1975, in create_connection
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/asyncio/tasks.py", line 413, in wait
    return await _wait(fs, timeout, return_when, loop)
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/asyncio/tasks.py", line 525, in _wait
    await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/site-packages/imjoy_rpc/hypha/websocket_client.py", line 81, in open
    self._websocket = await asyncio.wait_for(
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/asyncio/tasks.py", line 688, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/site-packages/websockets/legacy/client.py", line 647, in __await_impl_timeout__
    return await self.__await_impl__()
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/site-packages/websockets/legacy/async_timeout.py", line 169, in __aexit__
    self._do_exit(exc_type)
  File "/Users/bene/mambaforge/envs/imswitch/lib/python3.9/site-packages/websockets/legacy/async_timeout.py", line 252, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
oeway commented 5 months ago

This is unfortunately by design, it has to block the thread to obtain the token to continue.

Can you take out the login logic to be triggered by a button? Say, when you click a button, you run login directly or in a thread to obtain the token? after that with the token the server can start?

beniroquai commented 5 months ago

Thanks for the comment. The button-logic is the default UI interaction. I was wondering how I can do that through a REST API call from fastapi. I'll recheck what'S going on. The login works perfectly from PyQt side, but fails from fastapi.

beniroquai commented 5 months ago

Ok, I can force fastapi calls to run on the UIThread, then it works,..but I removed this feature in the headless mode :D I'll need to find a way I guess ;)

@APIExport(runOnUIThread=True)
def start_service(self, ...):
    ...

It connects a signal (wrappingSignal) to the function call (_apiCall) to ensure it is executed on the UI thread..

@jacopoabramo, did you ever touch this?

oeway commented 5 months ago

For fastapi, it has native async support, would it help to switch to async RPC connection?

beniroquai commented 5 months ago

I tried that already but there was a timeout happening. Let me try again.

On Sun, Jun 23, 2024, 12:22 Wei Ouyang @.***> wrote:

For fastapi, it has native async support, would it help to switch to async RPC connection?

— Reply to this email directly, view it on GitHub https://github.com/imjoy-team/imjoy-rpc/issues/558#issuecomment-2184934081, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBE5ODAASLIRO3UGTZKDULZI2OVZAVCNFSM6AAAAABJMT7PYGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBUHEZTIMBYGE . You are receiving this because you authored the thread.Message ID: @.***>

jacopoabramo commented 5 months ago

@jacopoabramo, did you ever touch this?

Nope, the API export thingy is one of the things that I find very mysterious and potentially dangerous and wanted to document it a little bit. The only thing I could understand (and for which I'm not sure) is that whenever you expose a function using that decorator it will be part of the main Qt event loop

beniroquai commented 5 months ago

Would make sense with the use of the signals. This behaviour is dropped in headless mode (for now). Need to understand this https://github.com/openUC2/ImSwitch/blob/NOQT/imswitch/imcommon/model/api.py#L41 I removed it since it didn't work with Psygnals..

beniroquai commented 4 months ago

@oeway I'm testing the current code in async mode now. It'S partially working, but I don't get a proper connection after I login. It seems like everything works, but there is no ID returned by the service.

I updated the login procedure to work asynchronously: https://github.com/openUC2/ImSwitch/blob/NOQT/imswitch/imcontrol/controller/controllers/HyphaController.py#L265

I wrapped the callback (i.e. open login URL in browser) in async, which partially solved the errors I got previously. But I think it crashes here (in websocket_client.py:

        return await svc.check(context["key"], timeout)
    except Exception as error:
        raise error
    finally:
        await server.disconnect()

it does not return the svc check but instead goes straight into the finally statement. Also the timeouts I provde are not working as expected. But this should not be the issue.

This is all in the NOQT version of ImSwitch. Not sure if the missing QT event loop cause any issues here?

I get the following error:

NFO:     127.0.0.1:63675 - "GET /HyphaController/start_hypha_service_async?service_id=UC2ImSwitch&server_url=https%3A%2F%2Fchat.bioimage.io HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/middleware/httpsredirect.py", line 19, in __call__
    await self.app(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/Users/bene/Dropbox/Dokumente/Promotion/PROJECTS/MicronController/ImSwitch/imswitch/imcontrol/controller/server/ImSwitchServer.py", line 262, in wrapper
    return await func(*args, **kwargs)
  File "/Users/bene/Dropbox/Dokumente/Promotion/PROJECTS/MicronController/ImSwitch/imswitch/imcontrol/controller/controllers/HyphaController.py", line 296, in start_hypha_service_async
    self.hyphaURL = f"https://bioimage.io/chat?server={server_url}&extension={svc.id}"
AttributeError: 'FuturePromise' object has no attribute 'id'
ERROR:websocket-client:Failed to connect to wss://chat.bioimage.io/ws, retrying 1/10000
Traceback (most recent call last):
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 963, in transfer_data
    message = await self.read_message()
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1033, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1121, in read_data_frame
    await self.write_close_frame(self.close_rcvd, frame.data)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1236, in write_close_frame
    await self.write_frame(True, OP_CLOSE, data, _state=State.CLOSING)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1209, in write_frame
    await self.drain()
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1190, in drain
    async with self._drain_lock:
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/asyncio/locks.py", line 14, in __aenter__
    await self.acquire()
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/asyncio/locks.py", line 114, in acquire
    await fut
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/imjoy_rpc/hypha/websocket_client.py", line 127, in _listen
    data = await ws.recv()
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 568, in recv
    await self.ensure_open()
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: received 1011 (internal error) keepalive ping timeout; then sent 1011 (internal error) keepalive ping timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/imjoy_rpc/hypha/websocket_client.py", line 81, in open
    self._websocket = await asyncio.wait_for(
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/client.py", line 646, in __await_impl_timeout__
    async with asyncio_timeout(self.open_timeout):
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/async_timeout.py", line 169, in __aexit__
    self._do_exit(exc_type)
  File "/Users/bene/miniconda3/envs/imswitch/lib/python3.10/site-packages/websockets/legacy/async_timeout.py", line 252, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
beniroquai commented 4 months ago

@oeway partially solved the error by wrapping

svc = await server.register_service(self.getMicroscopeControlExtensionDefinition())

with the await.

However, the extension is not registered in the online chatbot (yet). I'm getting timeouts. I'll check once I have a better internet connection. Maybe the problem is solved now.

@jacopoabramo I had to modify the generateApi function slgithly. It's always wrapped into an asnyc def, but there can never be an await inside any function. There is a new API key for async functions now.