Open lakinduakash opened 1 month ago
@achraf-mer Just wondering if we could remove Stack mode; is it still in use? Ideally, on K8s, vLLM should run in a separate pod rather than in the same pod as h2oGPT, I think. WDYT?
Yes, we can run it separately and keep things straightforward; let's do that. I think we might have used the same pod for latency considerations, but since vLLM can be resource-intensive, it is best IMO to have it on a separate pod (more isolation, and we can scale it separately).
@lakinduakash Let's remove Stack mode from h2oGPT, along with its checks, similar to what was done with Agents.
Stack mode is removed.
Reference: https://github.com/h2oai/h2ogpt/issues/1871