Closed jasonacox closed 9 months ago
INFO WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
None of the "easy" conversions worked due to the way the threading works to support model output streaming (token streams to browser via socketio).
The WSGI servers I tested (e.g. Gunicorn) were not compatible with the multi-threading required to handle asynchronous streaming from the LLM. For that reason, I switched to ASGI and asyncio.
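The core of the asyncio approach can be sketched as follows. This is a minimal illustration only: `fake_emit` stands in for socketio's `AsyncServer.emit`, and `stream_tokens` is a hypothetical stand-in for the LLM's token stream, not code from this project.

```python
import asyncio

async def stream_tokens(text):
    """Hypothetical LLM stand-in: yield one token at a time."""
    for token in text.split():
        await asyncio.sleep(0)  # yield control, as a real model call would
        yield token

async def forward_to_client(text, emit):
    """Relay each token to the client as it arrives (no blocking threads)."""
    async for token in stream_tokens(text):
        await emit(token)

async def main():
    received = []

    async def fake_emit(token):
        # Stand-in for socketio's AsyncServer.emit(event, data)
        received.append(token)

    await forward_to_client("hello streaming world", fake_emit)
    return received

tokens = asyncio.run(main())
print(tokens)  # ['hello', 'streaming', 'world']
```

The point is that the event loop interleaves token generation and client emits in a single thread, which is why an ASGI server (e.g. Uvicorn) fits this pattern where a threaded WSGI worker did not.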