vectorch-ai ScaleLLM issues

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

https://docs.vectorch.com/

Apache License 2.0

377 stars 28 forks

issues

refactor: split pybind11 binding definitions into separate files

#239 guocuimi closed 3 months ago
0
feat: added id_to_token for tokenizer to handle unfinished byte sequence, ending with "�"

#238 guocuimi closed 3 months ago
0
feat: added token_ids into sequence output for better debuggability.

#237 guocuimi closed 3 months ago
0
feat: added best_of functionality for completion apis

#236 guocuimi closed 3 months ago
0
[wip] feat: added logprobs support for speculative decoding

#235 guocuimi closed 3 months ago
0
feat: added logprobs for grpc server

#234 guocuimi closed 3 months ago
0
feat: added logprobs support for legacy completion api

#233 guocuimi closed 3 months ago
0
feat: added openai compatible logprobs support

#232 guocuimi closed 3 months ago
0
feat: added with statement support to release memory and exposed help function for tokenizer

#231 guocuimi closed 3 months ago
0
fix: load vocab_size first then use it to decide model type for model sharing between llama3, llama2 and Yi.

#230 guocuimi closed 3 months ago
0
fix: decode ending tokens one by one to handle unfinished tokens

#229 guocuimi closed 4 months ago
0
fix: avoid tensor conversion for already-converted ones.

#228 guocuimi closed 4 months ago
0
feat: added time_to_first_token and inter_token metrics for both stream and non-stream requests

#227 guocuimi closed 4 months ago
0
fix: use error instead of CHECK when prompt input is empty

#226 guocuimi closed 4 months ago
0
docs: add livehtml for docs development

#225 guocuimi closed 4 months ago
0
feat: convert pickle to safetensors for fast loading

#224 guocuimi closed 4 months ago
0
fix: set correct default value of rope_theta for llama2

#223 guocuimi closed 4 months ago
1
[Correctness] Output incorrect on the baichuan2 model using scalellm.

#222 liutongxuan closed 4 months ago
1
[Core] core on the chatglm3 model using scalellm.

#221 liutongxuan closed 4 months ago
1
[Correctness] Using llama-2-7b-hf, scalellm's output is different from vllm's output.

#220 liutongxuan closed 4 months ago
0
added missing changes for carrying over prompt

#219 guocuimi closed 4 months ago
0
feat: carry over prompt to output for feature parity

#218 guocuimi closed 4 months ago
0
refactor: move setup.py to top level

#217 guocuimi closed 4 months ago
0
[feat] add prompt in RequestOutput.

#216 liutongxuan closed 4 months ago
1
install cpython shared lib in manylinux docker image

#215 guocuimi closed 3 months ago
0
fix: use a consistent version for whl

#214 guocuimi closed 4 months ago
0
fix: fix weight load issue for fused qkv and added more unittests for weight loading

#213 guocuimi closed 4 months ago
0
pip install scalellm failure.

#212 liutongxuan closed 3 months ago
1
feat: added token related latency metrics

#211 guocuimi closed 4 months ago
0
feat: Added prometheus metrics

#210 guocuimi closed 4 months ago
0
feat: added monitoring docker compose for prometheus and grafana

#209 guocuimi closed 4 months ago
0
docs: fixed source directory and added announcement

#208 guocuimi closed 4 months ago
0
docs: added docs skeleton

#207 guocuimi closed 4 months ago
0
ci: added workflow to publish docs to GitHub Pages

#206 guocuimi closed 4 months ago
0
ci: publish wheels to whl index repo

#205 guocuimi closed 4 months ago
0
feat: added batch support for llm handler

#204 guocuimi closed 4 months ago
0
[wip] feat: added benchmarks for scalellm package

#203 guocuimi closed 3 months ago
0
fix: use a proper epsilon to avoid division by zero error for rejection sampler

#202 guocuimi closed 4 months ago
0
feat: added multiple threads support for LLMHandler

#201 guocuimi closed 4 months ago
0
feat: moved scheduler wait logic from python into scheduler run_until_complete function

#200 guocuimi closed 4 months ago
0
[python] added more examples and fix requirements version

#199 guocuimi closed 4 months ago
0
ci: bump version and build with new manylinux image (gcc-9)

#198 guocuimi closed 4 months ago
0
fix: make build pass with gcc-9

#197 guocuimi closed 4 months ago
0
[CI] fix docker run options

#196 guocuimi closed 4 months ago
0
[fix] fix workflow format

#195 guocuimi closed 4 months ago
0
[feat] added cuda 11.8 devel image to build cpp release image

#194 guocuimi closed 4 months ago
0
[Release] added workflow to publish whls to PyPI

#193 guocuimi closed 4 months ago
0
[Release] prepare 0.1.0 release

#192 guocuimi closed 4 months ago
0
[python] added requirements into package

#191 guocuimi closed 4 months ago
0
[python] added LLM for offline inference and stream examples for chat and complete

#190 guocuimi closed 4 months ago
0
