ranking-agent / strider

A TRAPI-compliant component of ARAGORN that queries distributed KPs and assembles answers to user questions.
MIT License

Multiquery hangs #377

Closed: cbizon closed this issue 2 years ago

cbizon commented 2 years ago

When running a multiquery with 37 subqueries, I see this traceback in strider:

    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 55, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 146, in __call__
    await self.background()
  File "/usr/local/lib/python3.9/site-packages/starlette/background.py", line 35, in __call__
    await task()
  File "/usr/local/lib/python3.9/site-packages/starlette/background.py", line 18, in __call__
    await self.func(*self.args, **self.kwargs)
  File "/app/./strider/server.py", line 363, in multi_lookup
    await asyncio.gather(*map(single_lookup, query_keys))
  File "/app/./strider/server.py", line 361, in single_lookup
    await client.post(callback, json=query_result)
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1807, in post
    return await self.request(
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1481, in request
    response = await self.send(
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1568, in send
    response = await self._send_handling_auth(
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1604, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1640, in _send_handling_redirects
    response = await self._send_single_request(request, timeout)
  File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1681, in _send_single_request
    ) = await transport.handle_async_request(
  File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 278, in handle_async_request
    (
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 78, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadError

After that, the caller never receives all of the answers.
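
The traceback shows the `httpx.ReadError` from one callback POST propagating out of the `asyncio.gather` call in `multi_lookup`. The sketch below illustrates that general `asyncio.gather` behavior and one possible way to isolate per-subquery failures with `return_exceptions=True`. It is not strider's code: `post_callback` and `FakeReadError` are hypothetical stand-ins, and no real HTTP request is made.

```python
import asyncio


class FakeReadError(Exception):
    """Hypothetical stand-in for httpx.ReadError."""


async def post_callback(key: str) -> str:
    """Hypothetical stand-in for `client.post(callback, json=query_result)`."""
    await asyncio.sleep(0)
    if key == "q2":
        raise FakeReadError(f"read failed for {key}")
    return f"{key}: delivered"


async def gather_fail_fast(keys):
    # Default gather: the first exception propagates to the caller,
    # so the results of the other subqueries are never collected here.
    return await asyncio.gather(*map(post_callback, keys))


async def gather_isolated(keys):
    # return_exceptions=True: each failure is returned in place of a result,
    # so every other subquery's result is still collected.
    results = await asyncio.gather(
        *map(post_callback, keys), return_exceptions=True
    )
    return dict(zip(keys, results))


async def main():
    keys = ["q1", "q2", "q3"]
    try:
        await gather_fail_fast(keys)
    except FakeReadError as err:
        print("fail-fast gather raised:", err)
    for key, result in (await gather_isolated(keys)).items():
        print(key, "->", repr(result))


if __name__ == "__main__":
    asyncio.run(main())
```

With the isolated variant, a failed callback shows up as an exception object next to its key, which could then be logged or retried without affecting the remaining subqueries.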

cbizon commented 2 years ago

multistrider.json.txt

This is the treats query using about 30 of Kara's rules. I was able to get it to run by reducing the number of rules to 3-5 at a time, which worked most of the time.
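
The manual workaround of sending 3-5 rules at a time amounts to limiting concurrency. A minimal sketch of doing that in-process with `asyncio.Semaphore` is below. `run_subquery`, `run_bounded`, and `MAX_CONCURRENT` are hypothetical names for illustration only, and whether bounded concurrency would actually avoid the `ReadError` here is an assumption, not something the thread confirms.

```python
import asyncio

MAX_CONCURRENT = 4  # assumption: mirrors the "3-5 at a time" workaround


async def run_subquery(key: str, tracker: dict) -> str:
    """Hypothetical stand-in for one subquery lookup."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulated work
    tracker["active"] -= 1
    return key


async def run_bounded(keys, limit: int = MAX_CONCURRENT):
    # The semaphore lets at most `limit` subqueries run at the same time;
    # the rest wait their turn instead of all starting at once.
    semaphore = asyncio.Semaphore(limit)
    tracker = {"active": 0, "peak": 0}

    async def guarded(key):
        async with semaphore:
            return await run_subquery(key, tracker)

    results = await asyncio.gather(*(guarded(k) for k in keys))
    return results, tracker["peak"]


if __name__ == "__main__":
    rules = [f"rule{i}" for i in range(30)]
    results, peak = asyncio.run(run_bounded(rules))
    print(f"{len(results)} subqueries finished, peak concurrency {peak}")
```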

cbizon commented 2 years ago

This is mostly fixed, but it still happens reliably for certain inputs.

With MONDO:0005148 (T2D), strider frequently hangs at 74 of the 75 rules for over 1 hour.

I have also seen it with MONDO:0005016, though I don't know how many rules got stuck.
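
One way to find out which rules get stuck, and to keep a single stuck rule from blocking the rest, would be a per-rule timeout. The sketch below uses `asyncio.wait_for` for that. It does not explain why a rule hangs, and `run_rule`, `run_with_timeout`, `run_all`, and the timeout value are all hypothetical, not taken from strider.

```python
import asyncio


async def run_rule(name: str, delay: float) -> str:
    """Hypothetical stand-in for one rule's lookup; `delay` simulates its duration."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_with_timeout(name: str, delay: float, timeout: float) -> str:
    try:
        # wait_for cancels the inner coroutine once the timeout expires.
        return await asyncio.wait_for(run_rule(name, delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Record which rule was stuck instead of waiting indefinitely.
        return f"{name}: timed out"


async def run_all(rules, timeout: float = 0.05):
    return await asyncio.gather(
        *(run_with_timeout(name, delay, timeout) for name, delay in rules)
    )


if __name__ == "__main__":
    rules = [("fast_rule", 0.0), ("stuck_rule", 5.0)]
    for line in asyncio.run(run_all(rules)):
        print(line)
```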