vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0
316 stars, 23 forks
issues
fix: fix weight load issue for fused qkv and added more unittests for weight loading
#213
guocuimi
closed
1 month ago
0
pip install scalellm failure.
#212
liutongxuan
closed
3 weeks ago
1
feat: added token related latency metrics
#211
guocuimi
closed
1 month ago
0
feat: Added prometheus metrics
#210
guocuimi
closed
1 month ago
0
feat: added monitoring docker compose for prometheus and grafana
#209
guocuimi
closed
1 month ago
0
docs: fixed source directory and added announcement
#208
guocuimi
closed
1 month ago
0
docs: added docs skeleton
#207
guocuimi
closed
1 month ago
0
ci: added workflow to publish docs to GitHub Pages
#206
guocuimi
closed
1 month ago
0
ci: publish wheels to whl index repo
#205
guocuimi
closed
1 month ago
0
feat: added batch support for llm handler
#204
guocuimi
closed
1 month ago
0
[wip] feat: added benchmarks for scalellm package
#203
guocuimi
closed
3 weeks ago
0
fix: use a proper epsilon to avoid division by zero error for rejection sampler
#202
guocuimi
closed
1 month ago
0
feat: added multiple threads support for LLMHandler
#201
guocuimi
closed
1 month ago
0
feat: moved scheduler wait logic from python into scheduler run_until_complete function
#200
guocuimi
closed
1 month ago
0
[python] added more examples and fix requirements version
#199
guocuimi
closed
1 month ago
0
ci: bump version and build with new manylinux image (gcc-9)
#198
guocuimi
closed
1 month ago
0
fix: make build pass with gcc-9
#197
guocuimi
closed
1 month ago
0
[CI] fix docker run options
#196
guocuimi
closed
1 month ago
0
[fix] fix workflow format
#195
guocuimi
closed
1 month ago
0
[feat] added cuda 11.8 devel image to build cpp release image
#194
guocuimi
closed
1 month ago
0
[Release] added workflow to publish whls to PyPI
#193
guocuimi
closed
1 month ago
0
[Release] prepare 0.1.0 release
#192
guocuimi
closed
1 month ago
0
[python] added requirements into package
#191
guocuimi
closed
1 month ago
0
[python] added LLM for offline inference and stream examples for chat and complete
#190
guocuimi
closed
1 month ago
0
[fix] fix extension typo for wheel publish workflow
#189
guocuimi
closed
1 month ago
0
[CI] Upload wheels to release as assets
#188
guocuimi
closed
1 month ago
0
[feat] added version suffix to include cuda and torch version
#187
guocuimi
closed
1 month ago
0
[fix] added cuda 11.8 support for manylinux
#186
guocuimi
closed
1 month ago
0
[fix] added manylinux support
#185
guocuimi
closed
1 month ago
0
[CI] fix docker image issues and build wheel for different python, pytorch versions
#184
guocuimi
closed
1 month ago
0
[ci] build python wheels
#183
guocuimi
closed
1 month ago
0
[CI] added base docker image for python wheel build
#182
guocuimi
closed
1 month ago
0
[kernel] change head_dim list to reduce binary size
#181
guocuimi
closed
1 month ago
0
[misc] some changes to cmake file
#180
guocuimi
closed
1 month ago
0
[python] reduce whl size
#179
guocuimi
closed
1 month ago
0
[model] support vision language model llava.
#178
liutongxuan
closed
1 week ago
1
[feat] added status handling for grpc server
#177
guocuimi
closed
1 month ago
0
[python] added model check for rest api
#176
guocuimi
closed
1 month ago
0
[python] move request handling logic into separate file from api server
#175
guocuimi
closed
1 month ago
0
[refactor] consolidate handlers to share llm_handler between python rest api server and grpc server
#174
guocuimi
closed
1 month ago
0
[refactor] move proto definitions into proto namespace
#173
guocuimi
closed
1 month ago
0
[feat] implement async llm engine for python wrapper
#172
guocuimi
closed
1 month ago
0
[feat] added python LLMEngine skeleton
#171
guocuimi
closed
2 months ago
0
[refactor] combine sequence and request outputs
#170
guocuimi
closed
2 months ago
0
[feat] added python rest api server skeleton
#169
guocuimi
closed
2 months ago
0
[misc] upgrade torch to 2.3 and use gcc-12
#168
guocuimi
closed
2 months ago
0
[fix] use the pybind11 from libtorch and fix model download issue.
#167
guocuimi
closed
2 months ago
0
LoRA: QLoRA/S-LoRA: Serving thousands of LoRA adapters
#166
guocuimi
opened
2 months ago
0
Introducing the Mamba model
#165
guocuimi
opened
2 months ago
0
Introducing a ring attention mechanism for handling long contexts
#164
guocuimi
opened
2 months ago
0