JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
194 stars, 24 forks
del prefill_result & update dev image #116
Closed
morgandu closed 2 months ago