vectorch-ai / ScaleLLM
A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0 · 315 stars · 23 forks

Issues (sorted by newest)
#260 · build: fix multiple definition issue · liutongxuan · closed 6 hours ago · 1 comment
#259 · bugfix: fix invalid max_cache_size when device is cpu · liutongxuan · closed 16 hours ago · 0 comments
#258 · pytest core dump in workflow · guocuimi · opened 1 day ago · 0 comments
#257 · fix: check against num_tokens instead of num_prompt_tokens for shared blocks · guocuimi · closed 1 day ago · 0 comments
#256 · build: fix multiple definition issue · guocuimi · closed 2 days ago · 0 comments
#255 · dev: added cuda 12.4 build support · guocuimi · closed 3 days ago · 0 comments
#254 · dev: fix issues in run_in_docker script · guocuimi · closed 4 days ago · 0 comments
#253 · allow deploying docs when triggered on demand · guocuimi · closed 1 week ago · 0 comments
#252 · fix: pass in secrets for workflow calls · guocuimi · closed 1 week ago · 0 comments
#251 · fix workflow · guocuimi · closed 1 week ago · 0 comments
#250 · ci: added release workflow · guocuimi · closed 1 week ago · 0 comments
#249 · revert torch.cuda.empty_cache change · guocuimi · closed 1 week ago · 0 comments
#248 · fix multiple devices cuda graph capture issue · guocuimi · closed 1 week ago · 0 comments
#247 · refactor: only do sampling in driver worker (rank=0) · guocuimi · closed 1 week ago · 0 comments
#246 · [wip] feat: add embeddings support · guocuimi · opened 1 week ago · 0 comments
#245 · [minor] use available memory to calculate cache_size by default · liutongxuan · closed 1 week ago · 1 comment
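Issue #245 changes the default cache sizing to derive it from the memory actually available on the device. A minimal sketch of that idea; the function name, parameters, and the utilization fraction are assumptions for illustration, not ScaleLLM's actual implementation:

```python
def default_cache_size(available_bytes: int,
                       block_size_tokens: int,
                       bytes_per_token: int,
                       utilization: float = 0.8) -> int:
    """Derive a KV-cache size (in bytes) from available device memory.

    Illustrative only: reserves a fraction of free memory, then rounds
    down to a whole number of cache blocks.
    """
    usable = int(available_bytes * utilization)
    block_bytes = block_size_tokens * bytes_per_token
    num_blocks = usable // block_bytes   # whole blocks only
    return num_blocks * block_bytes

# e.g. 8 GiB free, 16-token blocks, 512 KiB of KV state per token
size = default_cache_size(8 << 30, 16, 512 << 10)
```

Sizing from measured free memory (rather than a fixed max_cache_size) avoids both over-allocation on small devices and wasted capacity on large ones, which is presumably also why #259 treats the CPU device as a special case.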
#244 · feat: added unittests for openai server · guocuimi · closed 2 weeks ago · 0 comments
#243 · feat: added include_usage into stream options for stream scenarios · guocuimi · closed 2 weeks ago · 0 comments
#242 · feat: added '__repr__' function for scalellm package · guocuimi · closed 2 weeks ago · 0 comments
#241 · feat: added synchronization for batch inference · guocuimi · closed 2 weeks ago · 0 comments
#240 · feat: added logprobs support for speculative decoding · guocuimi · closed 2 weeks ago · 0 comments
#239 · refactor: split pybind11 binding definitions into separate files · guocuimi · closed 3 weeks ago · 0 comments
#238 · feat: added id_to_token for tokenizer to handle unfinished byte sequences ending with "�" · guocuimi · closed 3 weeks ago · 0 comments
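Issues #238 and #229 both concern streaming detokenization: a multi-byte UTF-8 character can be split across token boundaries, and decoding the partial bytes yields the replacement character "�" (U+FFFD). The usual fix is to hold those bytes back until the character completes. A pure-Python sketch of the technique, with a made-up token-to-bytes table standing in for a real byte-level tokenizer:

```python
# Hypothetical byte-level vocabulary: one character split across two tokens.
TOKEN_BYTES = {
    1: "é".encode()[:1],   # first byte of the 2-byte character "é"
    2: "é".encode()[1:],   # second byte
    3: b"!",
}

def stream_decode(token_ids):
    """Emit text incrementally, holding back bytes that end in an
    unfinished UTF-8 sequence (which decodes to U+FFFD)."""
    pending = b""
    out = []
    for tid in token_ids:
        pending += TOKEN_BYTES[tid]
        text = pending.decode("utf-8", errors="replace")
        if text.endswith("\ufffd"):   # unfinished sequence: wait for more bytes
            continue
        out.append(text)
        pending = b""
    return "".join(out)

print(stream_decode([1, 2, 3]))   # -> "é!"
```

A naive per-token decode would emit "�" mid-stream for token 1; the hold-back version waits one token and emits the complete "é". (A production decoder also has to cope with text that legitimately ends in U+FFFD, which this sketch ignores.)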
#237 · feat: added token_ids into sequence output for better debuggability · guocuimi · closed 3 weeks ago · 0 comments
#236 · feat: added best_of functionality for completion apis · guocuimi · closed 3 weeks ago · 0 comments
#235 · [wip] feat: added logprobs support for speculative decoding · guocuimi · closed 2 weeks ago · 0 comments
#234 · feat: added logprobs for grpc server · guocuimi · closed 3 weeks ago · 0 comments
#233 · feat: added logprobs support for legacy completion api · guocuimi · closed 3 weeks ago · 0 comments
#232 · feat: added openai compatible logprobs support · guocuimi · closed 3 weeks ago · 0 comments
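Several entries above (#240, #234, #233, #232) add logprobs support across the servers. The quantity itself is standard: the log of the softmax probability the model assigns to each token. A minimal stdlib-only sketch of computing per-position top logprobs, shaped loosely like the entries an OpenAI-compatible API returns (function names are illustrative, not ScaleLLM's API):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def top_logprobs(logits, k=2):
    """(token_index, logprob) pairs for the k most likely tokens."""
    lp = log_softmax(logits)
    return sorted(enumerate(lp), key=lambda p: -p[1])[:k]

pairs = top_logprobs([2.0, 1.0, 0.1])
```

Subtracting the log-sum-exp (rather than exponentiating first) keeps the computation stable for the large logit magnitudes real models produce.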
#231 · feat: added with statement support to release memory and exposed help function for tokenizer · guocuimi · closed 3 weeks ago · 0 comments
#230 · fix: load vocab_size first then use it to decide model type for model sharing between llama3, llama2 and Yi · guocuimi · closed 3 weeks ago · 0 comments
#229 · fix: decode ending tokens one by one to handle unfinished tokens · guocuimi · closed 3 weeks ago · 0 comments
#228 · fix: avoid tensor conversion for already-converted tensors · guocuimi · closed 3 weeks ago · 0 comments
#227 · feat: added time_to_first_token and inter_token metrics for both stream and non-stream requests · guocuimi · closed 3 weeks ago · 0 comments
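The two metrics named in #227 are the standard serving-latency pair: time to first token (request arrival to first emitted token) and inter-token latency (gap between consecutive tokens). A small sketch of how they fall out of timestamps; the function and its shape are illustrative, not ScaleLLM's metrics code:

```python
def latency_metrics(request_start, token_times):
    """Compute time_to_first_token and mean inter-token latency
    from a request start time and per-token arrival timestamps."""
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    inter_token = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, inter_token

# First token after 250 ms, then one token every 50 ms.
ttft, itl = latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
```

Tracking both matters because they stress different stages: TTFT is dominated by queueing plus prompt prefill, while inter-token latency reflects steady-state decode throughput.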
#226 · fix: use error instead of CHECK when prompt input is empty · guocuimi · closed 3 weeks ago · 0 comments
#225 · docs: add livehtml for docs development · guocuimi · closed 3 weeks ago · 0 comments
#224 · feat: convert pickle to safetensors for fast loading · guocuimi · closed 3 weeks ago · 0 comments
#223 · fix: set correct default value of rope_theta for llama2 · guocuimi · closed 4 weeks ago · 1 comment
#222 · [Correctness] Output incorrect on the baichuan2 model using scalellm · liutongxuan · closed 3 weeks ago · 1 comment
#221 · [Core] core dump on the chatglm3 model using scalellm · liutongxuan · closed 3 weeks ago · 1 comment
#220 · [Correctness] Using llama-2-7b-hf, scalellm's output differs from vllm's output · liutongxuan · closed 3 weeks ago · 0 comments
#219 · added missing changes for carrying over prompt · guocuimi · closed 4 weeks ago · 0 comments
#218 · feat: carry over prompt to output for feature parity · guocuimi · closed 4 weeks ago · 0 comments
#217 · refactor: move setup.py to top level · guocuimi · closed 4 weeks ago · 0 comments
#216 · [feat] add prompt in RequestOutput · liutongxuan · closed 4 weeks ago · 1 comment
#215 · install cpython shared lib in manylinux docker image · guocuimi · closed 3 weeks ago · 0 comments
#214 · fix: use a consistent version for whl · guocuimi · closed 4 weeks ago · 0 comments
#213 · fix: fix weight load issue for fused qkv and added more unittests for weight loading · guocuimi · closed 4 weeks ago · 0 comments
#212 · pip install scalellm failure · liutongxuan · closed 3 weeks ago · 1 comment
#211 · feat: added token related latency metrics · guocuimi · closed 1 month ago · 0 comments