vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0
377 stars
28 forks
issues
refactor: split pybind11 binding definitions into separate files
#239
guocuimi
closed
3 months ago
0
feat: added id_to_token for tokenizer to handle unfinished byte sequence, ending with "�"
#238
guocuimi
closed
3 months ago
0
feat: added token_ids into sequence output for better debuggability.
#237
guocuimi
closed
3 months ago
0
feat: added best_of functionality for completion apis
#236
guocuimi
closed
3 months ago
0
[wip] feat: added logprobs support for speculative decoding
#235
guocuimi
closed
3 months ago
0
feat: added logprobs for grpc server
#234
guocuimi
closed
3 months ago
0
feat: added logprobs support for legacy completion api
#233
guocuimi
closed
3 months ago
0
feat: added openai compatible logprobs support
#232
guocuimi
closed
3 months ago
0
feat: added with statement support to release memory and exposed help function for tokenizer
#231
guocuimi
closed
3 months ago
0
fix: load vocab_size first then use it to decide model type for model sharing between llama3, llama2 and Yi.
#230
guocuimi
closed
3 months ago
0
fix: decode ending tokens one by one to handle unfinished tokens
#229
guocuimi
closed
4 months ago
0
fix: avoid tensor conversion for converted ones.
#228
guocuimi
closed
4 months ago
0
feat: added time_to_first_token and inter_token metrics for both stream and non-stream requests
#227
guocuimi
closed
4 months ago
0
fix: use error instead of CHECK when prompt input is empty
#226
guocuimi
closed
4 months ago
0
docs: add livehtml for docs development
#225
guocuimi
closed
4 months ago
0
feat: convert pickle to safetensors for fast loading
#224
guocuimi
closed
4 months ago
0
fix: set correct default value of rope_theta for llama2
#223
guocuimi
closed
4 months ago
1
[Correctness] Output incorrect on the baichuan2 model using scalellm.
#222
liutongxuan
closed
4 months ago
1
[Core] core on the chatglm3 model using scalellm.
#221
liutongxuan
closed
4 months ago
1
[Correctness] Using llama-2-7b-hf, scalellm's output is different with vllm's output.
#220
liutongxuan
closed
4 months ago
0
added missing changes for carrying over prompt
#219
guocuimi
closed
4 months ago
0
feat: carry over prompt to output for feature parity
#218
guocuimi
closed
4 months ago
0
refactor: move setup.py to top level
#217
guocuimi
closed
4 months ago
0
[feat] add prompt in RequestOutput.
#216
liutongxuan
closed
4 months ago
1
install cpython shared lib in manylinux docker image
#215
guocuimi
closed
3 months ago
0
fix: use a consistent version for whl
#214
guocuimi
closed
4 months ago
0
fix: fix weight load issue for fused qkv and added more unittests for weight loading
#213
guocuimi
closed
4 months ago
0
pip install scalellm failure.
#212
liutongxuan
closed
3 months ago
1
feat: added token related latency metrics
#211
guocuimi
closed
4 months ago
0
feat: Added prometheus metrics
#210
guocuimi
closed
4 months ago
0
feat: added monitoring docker compose for prometheus and grafana
#209
guocuimi
closed
4 months ago
0
docs: fixed source directory and added announcement
#208
guocuimi
closed
4 months ago
0
docs: added docs skeleton
#207
guocuimi
closed
4 months ago
0
ci: added workflow to publish docs to GitHub Pages
#206
guocuimi
closed
4 months ago
0
ci: publish wheels to whl index repo
#205
guocuimi
closed
4 months ago
0
feat: added batch support for llm handler
#204
guocuimi
closed
4 months ago
0
[wip] feat: added benchmarks for scalellm package
#203
guocuimi
closed
3 months ago
0
fix: use a proper epsilon to avoid division by zero error for rejection sampler
#202
guocuimi
closed
4 months ago
0
feat: added multiple threads support for LLMHandler
#201
guocuimi
closed
4 months ago
0
feat: moved scheduler wait logic from python into scheduler run_until_complete function
#200
guocuimi
closed
4 months ago
0
[python] added more examples and fix requirements version
#199
guocuimi
closed
4 months ago
0
ci: bump version and build with new manylinux image (gcc-9)
#198
guocuimi
closed
4 months ago
0
fix: make build pass with gcc-9
#197
guocuimi
closed
4 months ago
0
[CI] fix docker run options
#196
guocuimi
closed
4 months ago
0
[fix] fix workflow format
#195
guocuimi
closed
4 months ago
0
[feat] added cuda 11.8 devel image to build cpp release image
#194
guocuimi
closed
4 months ago
0
[Release] added workflow to publish whls to PyPI
#193
guocuimi
closed
4 months ago
0
[Release] prepare 0.1.0 release
#192
guocuimi
closed
4 months ago
0
[python] added requirements into package
#191
guocuimi
closed
4 months ago
0
[python] added LLM for offline inference and stream examples for chat and complete
#190
guocuimi
closed
4 months ago
0