Hi @psugihara ,
Thanks for sharing this interesting project.
In `LlamaServer`'s `complete` function, you have a precondition to ensure the function is called on a background thread; however, there is no such guarantee. Does that mean requests that end up being submitted to the function on the main thread are dropped?
Hm, perhaps. I'm not sure the assertion actually runs in release mode. Seems like a bug we should fix so the LLM can never hang the main thread. Thanks for noting!
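One possible direction, sketched below: instead of asserting about the caller's thread (Swift's `assert` is compiled out in release builds, while `precondition` traps even in release), the method could hop onto its own background queue so that callers on the main thread are safe by construction. This is a minimal hypothetical sketch, not FreeChat's actual code; the class name `LlamaServer` is from the discussion, but the `complete(prompt:handler:)` signature and queue setup are assumptions for illustration.

```swift
import Foundation

final class LlamaServer {
  // Serial background queue owned by the server; all inference work
  // runs here, regardless of which thread the caller is on.
  private let queue = DispatchQueue(label: "LlamaServer.complete",
                                    qos: .userInitiated)

  // Hypothetical completion API: rather than a precondition that the
  // caller is off the main thread, dispatch the work ourselves.
  func complete(prompt: String, handler: @escaping (String) -> Void) {
    queue.async {
      // Placeholder for the actual inference call.
      let result = "completion for: \(prompt)"
      handler(result)
    }
  }
}
```

With this shape, a call from the main thread returns immediately and the heavy work never blocks the UI; the trade-off is that the `handler` is invoked on the background queue, so callers updating UI would need to hop back to the main thread themselves.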