google jetstream-pytorch issues

google / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

Apache License 2.0

19 stars 12 forks

issues

Add server tests

#142 bvrockwell opened 3 days ago
0
Update benchmark command in README.md

#141 bhavya01 closed 4 days ago
0
add enable jax profiler to run_server

#140 bvrockwell closed 6 days ago
0
Update README.md to state the limitation of accessing GCS when conver…

#139 wang2yn84 closed 1 week ago
0
Minor fixes to README

#138 wang2yn84 closed 1 week ago
0
Empty response returned for prompt responses when using run_server_with_ray.py and batch_size > 1

#137 richardsliu opened 1 week ago
1
Add layer id in scope for each TransformerBlock layer

#136 FanhaiLu1 closed 1 week ago
0
Checkpoint conversion script breaks for meta-llama/llama-2-7b on HF

#135 vivianrwu opened 1 week ago
0
prototyping better UX

#134 qihqi opened 2 weeks ago
0
Add left aligned cache support.

#133 wang2yn84 closed 1 week ago
0
fix mixtral quantization scaler axis when dimension > 2

#132 sixiang-google closed 2 weeks ago
0
Add test for Mixtral model.

#131 wang2yn84 closed 2 weeks ago
0
make sure GPU works

#130 qihqi closed 2 weeks ago
0
Update README.md

#129 bhavya01 closed 2 weeks ago
0
Update README.md

#128 qihqi closed 3 weeks ago
0
Update submodules, prepare for releasing v0.2.4

#127 qihqi closed 3 weeks ago
1
Add lock in prefill and generate to prevent starvation

#126 FanhaiLu1 closed 3 weeks ago
1
Update summary.md

#125 qihqi closed 2 weeks ago
1
Remove JSON config mangling for Gemma ckpt

#124 lsy323 closed 3 weeks ago
1
Add different token sampling algorithms to decoder.

#123 bvrockwell closed 3 weeks ago
1
add script to install for GPU

#122 qihqi closed 3 weeks ago
2
Fix convert_checkpoint.py for hf and gemma

#121 qihqi closed 3 weeks ago
0
Mixtral enablement.

#120 wang2yn84 closed 3 weeks ago
1
Add guide on adding HF ckpt conversion support

#119 lsy323 closed 1 month ago
0
Support HF LLaMA ckpt conversion

#118 lsy323 closed 1 month ago
0
Integrate disaggregated serving with JetStream

#117 FanhaiLu1 closed 1 month ago
0
Fix conversion bug

#116 yeandy closed 1 month ago
0
Bug in model conversion script

#115 yeandy closed 1 month ago
2
Add for readme interleave multiple host with ray

#114 FanhaiLu1 closed 1 month ago
1
Metrics bug: server_lib should be config_lib

#113 Bslabe123 closed 1 month ago
0
Enable jax profiler server in run with ray

#112 FanhaiLu1 closed 1 month ago
0
Jetstream: 8128c8a -> v0.2.2

#111 Bslabe123 closed 1 month ago
0
Release JetStream v0.2.2

#110 JoeZijunZhou closed 1 month ago
0
Add run_server with ray for interleave serving

#109 FanhaiLu1 closed 1 month ago
0
Update Jetstream commit id

#108 FanhaiLu1 closed 1 month ago
0
Return Tuple(interleaveEngList, prefillEngineList, decodeEngineList) in create ray engine

#107 FanhaiLu1 opened 1 month ago
0
Ray Disaggregated Serving MVP

#106 FanhaiLu1 closed 1 month ago
2
Add activation quantization support to per-channel quantized linear layers

#105 lsy323 closed 3 weeks ago
0
Fix convert script cannot generate bf16 weights

#104 lsy323 closed 1 month ago
0
Update run_interactive.py with finer control of profiler.

#103 wang2yn84 closed 1 month ago
0
Update run_server.py. metrics_server_config is not supported in JetStream[8128c8a] yet

#102 wang2yn84 closed 1 month ago
2
Add support for Llama3-70b

#101 bhavya01 closed 3 weeks ago
3
Fix ray conflict changes

#100 FanhaiLu1 closed 1 month ago
2
Pass metrics client config through to Jetstream

#99 Bslabe123 closed 1 month ago
1
Fix gemma model, enable_weight_quantization is available through quant_config.

#98 wang2yn84 closed 1 month ago
1
Update README.md, the quantize flag is no longer available, quantize_type assumes the role of the original flag.

#97 wang2yn84 closed 1 month ago
1
Fix flax and ray dependencies

#96 FanhaiLu1 closed 1 month ago
0
Fixes tests. Can now run on CPU by default.

#95 wang2yn84 closed 1 month ago
4
Add regression test to detect service broken and performance degradation

#94 FanhaiLu1 opened 1 month ago
0
Integrates ragged attention to JetStream Pytorch

#93 wang2yn84 closed 1 month ago
0