hltcoe/sandle
Run a large language modeling SANDbox in your Local Environment
Other
7 stars
1 fork
issues
WebSockets
#107
ccmaymay
opened
1 year ago
0
NeMo
#106
ccmaymay
closed
1 year ago
1
TorchInductor
#105
ccmaymay
closed
1 year ago
1
HF inference endpoints
#104
ccmaymay
closed
1 year ago
1
langchain
#103
ccmaymay
opened
1 year ago
0
vLLM
#102
ccmaymay
opened
1 year ago
0
Bright cluster
#101
ccmaymay
closed
1 year ago
1
Run.ai
#100
ccmaymay
closed
1 year ago
1
Vicuna
#99
ccmaymay
opened
1 year ago
0
fast transformers
#98
ccmaymay
closed
1 year ago
1
Organize & share software stack notes
#97
ccmaymay
opened
1 year ago
0
FastChat
#96
ccmaymay
opened
1 year ago
0
petals chat UI
#95
ccmaymay
opened
1 year ago
0
HF chat UI
#94
ccmaymay
opened
1 year ago
0
Comparison to Petals
#93
danyaljj
opened
1 year ago
0
Adding, removing backends/backend nodes at runtime
#92
ccmaymay
opened
1 year ago
0
Prompt tuning endpoint
#91
ccmaymay
opened
1 year ago
0
Embeddings endpoint
#90
ccmaymay
opened
1 year ago
0
XGLM
#89
ccmaymay
opened
1 year ago
0
DeepSpeed
#88
ccmaymay
opened
1 year ago
3
LLaMA backend timeout
#87
ccmaymay
closed
1 year ago
0
LLaMA backend OOM after idling for a while, then trying to allocate >1 EB
#86
ccmaymay
closed
1 year ago
2
Out of memory in some settings even when there should be plenty
#85
ccmaymay
opened
1 year ago
0
Click
#84
ccmaymay
closed
1 year ago
1
Torch Serve
#83
ccmaymay
closed
1 year ago
2
HF Text Generation Inference
#82
ccmaymay
opened
1 year ago
0
Ray Serve
#81
ccmaymay
closed
1 year ago
1
Help Aleem get started
#80
ccmaymay
opened
1 year ago
0
Triton Inference Server / FasterTransformer
#79
ccmaymay
closed
1 year ago
1
Ease configuration
#78
ccmaymay
closed
1 year ago
1
Add llama support
#77
ccmaymay
closed
1 year ago
1
CUDA error: peer mapping resources exhausted
#76
ccmaymay
opened
1 year ago
0
Stop generation when stop sequence is generated.
#75
ccmaymay
closed
2 years ago
0
Fix tokenized contraction input-output mismatch.
#74
ccmaymay
closed
2 years ago
0
Stop generation at stop sequence instead of truncating after the fact
#73
ccmaymay
closed
2 years ago
0
Debug sporadic CI failures
#72
ccmaymay
closed
2 years ago
1
Remove github banner.
#71
ccmaymay
closed
2 years ago
0
Optionally use bnb-int8 algorithm.
#70
ccmaymay
closed
2 years ago
0
Add GPT-NeoX
#69
ccmaymay
opened
2 years ago
0
nginx returns html on gateway timeout
#68
ccmaymay
closed
2 years ago
0
Add GPT-J
#67
ccmaymay
closed
2 years ago
0
FastAPI
#66
ccmaymay
opened
2 years ago
0
Energon AI OPT server
#65
ccmaymay
closed
1 year ago
1
Alpa OPT service
#64
ccmaymay
closed
1 year ago
1
Defer to HF Hub for model list.
#63
ccmaymay
closed
2 years ago
0
Defer to HF Hub for checking if model is supported
#62
ccmaymay
closed
2 years ago
0
Handle prompt-too-long error
#61
ccmaymay
opened
2 years ago
0
Explain authorized-users.txt in README.
#60
ccmaymay
closed
2 years ago
0
Input prompt does not match output prompt when input contractions are already tokenized (ex. `mother 's`)
#59
ccmaymay
closed
2 years ago
2
Investigate poor performance on large prompts
#58
ccmaymay
opened
2 years ago
0