michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip
https://michaelfeil.github.io/infinity/
MIT License

Hanging after first embedding generated on MPS #206

Closed: semoal closed this issue 4 months ago

semoal commented 5 months ago

System Info

MacOS running with torch or optimum on both happens, with small or big batch size. Model: jinaai/jina-embeddings-v2-base-es

```
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/fastapi/applications.py", line 292, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 167, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 273, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/infinity/server.py", line 169, in _embeddings
    embedding, usage = await models.embedding_model.embed(data.input)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/infinity_emb/engine.py", line 123, in embed
    embeddings, usage = await self._batch_handler.embed(sentences)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/infinity_emb/inference/batch_handler.py", line 116, in embed
    embeddings, usage = await self._schedule(input_sentences)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/infinity_emb/inference/batch_handler.py", line 201, in _schedule
    result = await asyncio.gather(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiomoreno/Projects/infinity-emb/.venv/lib/python3.11/site-packages/infinity_emb/inference/queue.py", line 92, in wait_for_response
    return await item.future
           ^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError
INFO:     ::1:56858 - "POST /embeddings HTTP/1.1" 500 Internal Server Error
```

Reproduction

  1. Run this model
  2. The process is probably close to running out of memory (not certain, but this could be the cause)
  3. Try to run another query

I can privately share a repro project where this is easily reproducible.

Expected behavior

The server should not crash.

michaelfeil commented 5 months ago

Looks like the inference crashes (e.g. a CTRL-C event). How do you run the model?

From a clean installation on your macOS machine, what are the precise steps to replicate this? Are you using the Docker container?

I assume you are stopping/starting the async event loop incorrectly? Are you using `astart`/`astop` similar to the server in `server.py`?
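(The `astart`/`astop` pairing referred to here can be sketched with a hypothetical engine class, not the real infinity_emb API, just to show the intended lifecycle: start before the first `embed`, stop on the same event loop when done.)

```python
import asyncio


class DummyEngine:
    """Stand-in for an engine with an astart/astop lifecycle (hypothetical)."""

    def __init__(self):
        self.running = False

    async def astart(self):
        self.running = True   # e.g. spawn batch-handler background tasks

    async def astop(self):
        self.running = False  # e.g. cancel background tasks, free the model

    async def embed(self, sentences):
        if not self.running:
            raise RuntimeError("call astart() before embed()")
        return [[0.0] * 3 for _ in sentences]  # fake embeddings


async def main():
    engine = DummyEngine()
    await engine.astart()
    try:
        vecs = await engine.embed(["hola", "mundo"])
    finally:
        await engine.astop()  # always stop, even if embed() raised
    return len(vecs)


print(asyncio.run(main()))  # 2
```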

semoal commented 4 months ago

Yes, stopping the event loop while it was generating caused the server to hang until I force-killed the terminal. Fixed by catching the cancellation in the lifespan context manager:

```python
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    instrumentator.expose(app)
    # Load the ML model
    await models.astart()
    logger.info(docs.startup_message(host="localhost", port="8080", prefix=""))
    try:
        yield
    except asyncio.exceptions.CancelledError:
        # swallow the cancellation so the teardown below still runs
        pass
    # Clean up the ML models and release the resources
    await models.ateardown()
```
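Why this works: when the server is interrupted mid-request, `asyncio.CancelledError` is raised at the `yield` inside the lifespan generator, and without the `try`/`except` the teardown line is never reached. A minimal stand-alone sketch of the same pattern (dummy names, no FastAPI) shows the cleanup running despite cancellation:

```python
import asyncio
from contextlib import asynccontextmanager

teardown_ran = []


@asynccontextmanager
async def lifespan():
    # startup work would go here (stands in for models.astart())
    try:
        yield
    except asyncio.CancelledError:
        # swallow the cancellation so the cleanup below still runs
        pass
    teardown_ran.append(True)  # stands in for models.ateardown()


async def serve():
    async with lifespan():
        await asyncio.sleep(3600)  # simulate a long-running server


async def main():
    task = asyncio.create_task(serve())
    await asyncio.sleep(0)  # let serve() start and suspend
    task.cancel()           # simulate CTRL-C / shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return bool(teardown_ran)


print(asyncio.run(main()))  # True: teardown ran despite the cancellation
```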
michaelfeil commented 4 months ago

Awesome