google / JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
Apache License 2.0
194 stars, 24 forks

Add http server to JetStream #115

Closed: JoeZijunZhou closed this 1 month ago

JoeZijunZhou commented 2 months ago