google/jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0 · 21 stars · 12 forks
Issues

| # | Title | Author | State | Last updated | Comments |
|------|-------|--------|-------|--------------|----------|
| #102 | Update run_server.py. metrics_server_config is not supported in JetStream[8128c8a] yet | wang2yn84 | closed | 1 month ago | 2 |
| #101 | Add support for Llama3-70b | bhavya01 | closed | 1 month ago | 3 |
| #100 | Fix ray conflict changes | FanhaiLu1 | closed | 1 month ago | 2 |
| #99 | Pass metrics client config through to Jetstream | Bslabe123 | closed | 1 month ago | 1 |
| #98 | Fix gemma model, enable_weight_quantization is available through quant_config. | wang2yn84 | closed | 1 month ago | 1 |
| #97 | Update README.md, the quantize flag is no longer available, quantize_type assumes the role of the original flag. | wang2yn84 | closed | 1 month ago | 1 |
| #96 | Fix flax and ray dependencies | FanhaiLu1 | closed | 1 month ago | 0 |
| #95 | Fixes tests. Can now run on CPU by default. | wang2yn84 | closed | 1 month ago | 4 |
| #94 | Add regression test to detect service broken and performance degradation | FanhaiLu1 | open | 1 month ago | 0 |
| #93 | Integrates ragged attention to JetStream Pytorch | wang2yn84 | closed | 1 month ago | 0 |
| #92 | Move flags in scripts to a common function | lsy323 | closed | 2 months ago | 0 |
| #91 | Update README.md | qihqi | closed | 2 months ago | 0 |
| #90 | Leverage tokens_utils to process result tokens | FanhaiLu1 | closed | 2 months ago | 0 |
| #89 | Move deps to git submodule | qihqi | closed | 2 months ago | 0 |
| #88 | Update version of jetstream; misc fixes | qihqi | closed | 2 months ago | 0 |
| #87 | Update README.md | JackCaoG | closed | 2 months ago | 3 |
| #86 | Fix sharding config file name bug | FanhaiLu1 | closed | 2 months ago | 0 |
| #85 | Add Gemma 2b benchmark; fix a typo. | qihqi | closed | 2 months ago | 0 |
| #84 | Enable Blockwise Int4 quantized linear layer | lsy323 | closed | 1 month ago | 1 |
| #83 | Clean up flags | qihqi | closed | 2 months ago | 0 |
| #82 | How to run benchmark on CloudTPU v4-8 | JackCaoG | closed | 2 months ago | 7 |
| #81 | Add Gemma 2b benchmark; fix a typo. | qihqi | closed | 2 months ago | 0 |
| #80 | Add shard on batch mode. Als update version of torchxla2 | qihqi | closed | 2 months ago | 0 |
| #79 | Add llama-3 instructions to readme | bhavya01 | closed | 2 months ago | 0 |
| #78 | Fix attention kernel of GQA use case | lsy323 | closed | 2 months ago | 0 |
| #77 | Enable quantization for Gemma 7b | qihqi | closed | 2 months ago | 0 |
| #76 | Add benchmark results | qihqi | closed | 2 months ago | 0 |
| #75 | Enable Gemma 2B | qihqi | closed | 2 months ago | 0 |
| #74 | Add gemma and update recent changes to multiple host | FanhaiLu1 | closed | 2 months ago | 0 |
| #73 | Fix sharding config for quant | lsy323 | closed | 2 months ago | 0 |
| #72 | Use GemmaAttention for Gemma | qihqi | closed | 2 months ago | 0 |
| #71 | Support converting hf gemma weights | lsy323 | closed | 2 months ago | 6 |
| #70 | Gemma sharding and test | FanhaiLu1 | closed | 2 months ago | 0 |
| #69 | Add gemma support | qihqi | closed | 2 months ago | 0 |
| #68 | Add prefill only benchmark for different token length | FanhaiLu1 | closed | 2 months ago | 0 |
| #67 | Pick a slot from 0 to batch_size-1 during run_interactive.py | bhavya01 | closed | 2 months ago | 1 |
| #66 | Add vscode to gitignore | FanhaiLu1 | closed | 2 months ago | 0 |
| #65 | Refactor so that environment and engine | qihqi | closed | 2 months ago | 1 |
| #64 | Support llama3 | bhavya01 | closed | 2 months ago | 6 |
| #63 | Add ray multiple host support | FanhaiLu1 | closed | 2 months ago | 4 |
| #62 | Ignore duplicated lines check now | FanhaiLu1 | closed | 2 months ago | 0 |
| #61 | Output all token text | FanhaiLu1 | closed | 2 months ago | 0 |
| #60 | Comment broken function to unblock interactive run after JetStream api change | FanhaiLu1 | closed | 2 months ago | 0 |
| #59 | Use property instead of function | FanhaiLu1 | closed | 2 months ago | 0 |
| #58 | Enable pylint check | FanhaiLu1 | closed | 2 months ago | 0 |
| #57 | Fix run_server lint check | FanhaiLu1 | closed | 2 months ago | 0 |
| #56 | Fix run_interactive lint check | FanhaiLu1 | closed | 2 months ago | 0 |
| #55 | Fix convert checkpoint lint check | FanhaiLu1 | closed | 2 months ago | 0 |
| #54 | Add pylintrc and init in root dir | FanhaiLu1 | closed | 2 months ago | 0 |
| #53 | Fix quantizaiton, jax lint and llama e2e | FanhaiLu1 | closed | 2 months ago | 0 |