hafriedlander / stable-diffusion-grpcserver

An implementation of a server for the Stability AI Stable Diffusion API
Apache License 2.0
172 stars 23 forks source link

fix: improve error handling for streaming responses #21

Closed diffubik closed 1 year ago

diffubik commented 1 year ago

If an error like StopIteration is thrown in _do_streaming_response, the HTTP response does not contain any of the error details and the grpc status defaults to 2 - Unknown.

This change makes some of the header/trailer setup more reusable and passes the trailer data to the response in the error case.

hafriedlander commented 1 year ago

I spent a little time investigating what's supposed to happen here - the spec isn't super clear what should happen if an exception is raised by a GRPC servicer method (or I can't see one).

StopIteration in particular isn't really an error. It's just the way all python iterators say "there's no more items in this iterator".

Do you have a simple(ish) way to trigger this issue?

diffubik commented 1 year ago

For instance when passing an invalid engineId, the gRPC caller has little info about what went wrong

I actually realize now that the error gets logged on the server side, but I only read the part of the stack trace from WSGI application error onwards...

Traceback (most recent call last):
  File "/home/sdiff/src/stable-diffusion-grpcserver/sdgrpcserver/services/generate.py", line 412, in Generate
    with self._manager.with_engine(request.engine_id) as engine:
  File "/home/sdiff/.conda/envs/sd-grpc-server/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/sdiff/src/stable-diffusion-grpcserver/sdgrpcserver/manager.py", line 1379, in with_engine
    spec = self._find_spec(id=id)
  File "/home/sdiff/src/stable-diffusion-grpcserver/sdgrpcserver/manager.py", line 1274, in _find_spec
    res = self._find_specs(id=id, model_id=model_id)
  File "/home/sdiff/src/stable-diffusion-grpcserver/sdgrpcserver/manager.py", line 1255, in _find_specs
    raise ValueError("Must provide one of id or model_id")
ValueError: Must provide one of id or model_id
WSGI application error
Traceback (most recent call last):
  File "/home/sdiff/.conda/envs/sd-grpc-server/lib/python3.10/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/home/sdiff/.conda/envs/sd-grpc-server/lib/python3.10/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/sdiff/.conda/envs/sd-grpc-server/lib/python3.10/site-packages/twisted/python/context.py", line 84, in callWithContext
    self.contexts.pop()
  File "/home/sdiff/.conda/envs/sd-grpc-server/lib/python3.10/site-packages/twisted/web/wsgi.py", line 530, in run
    self.started = True
--- <exception caught here> ---
  File "/home/sdiff/.conda/envs/sd-grpc-server/lib/python3.10/site-packages/twisted/web/wsgi.py", line 500, in run
    for elem in appIterator:
  File "/home/sdiff/src/stable-diffusion-grpcserver/sdgrpcserver/sonora/wsgi.py", line 123, in _do_grpc_request
    yield from self._do_streaming_response(
builtins.RuntimeError: generator raised StopIteration

For this sort of invalid request, we do set some context data here: https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/dev/sdgrpcserver/services/generate.py#L529-L531 and don't propagate the exception. Without this change that context is not returned to the user and they only get a generic "unknown" gRPC error.

StopIteration in particular isn't really an error. It's just the way all python iterators say "there's no more items in this iterator".

If it bubbles up from here though, it is a real error: https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/dev/sdgrpcserver/sonora/wsgi.py#L136 - if there are zero answers, next() cannot work.

hafriedlander commented 1 year ago

Ah ha. Cool, thanks. I think that's incorrect behaviour by Sonora - it should be OK to just return nothing from a streaming rpc handler. Let me look it up...

From what I can tell, it's up to the individual Servicer instances to catch exceptions and call context.abort if there's an issue. Dropping an exception out of a Servicer isn't allowed.

hafriedlander commented 1 year ago

Hmm. Don't see anything in the spec, but the Google grpc adapter does seem to work with empty streams.

https://github.com/grpc/grpc/blob/896bfe373f30da654e39b5ccff4f0454a58c153e/src/python/grpcio/grpc/_server.py#L509

I think I'd prefer to fix the behaviour here in Sonora, and then wrap all the Servicer methods with something that catches all exceptions and turns it into a context.Abort call. (There are now three gateways that access the Servicer, so fixing this in a way that fixes it for all of them is better.)

diffubik commented 1 year ago

sgtm

hafriedlander commented 1 year ago

Done here https://github.com/hafriedlander/stable-diffusion-grpcserver/commit/f8aa091606587edd7326f64fe91c4ab418d53a6f