huggingface/transformers-bloom-inference
Fast Inference Solutions for BLOOM
Apache License 2.0 · 560 stars · 114 forks
Issues
Update Makefile (#102, tsingmao, closed 1 year ago, 0 comments)
ValueError: Couldn't instantiate the backend tokenizer from one of: (#101, SeekPoint, opened 1 year ago, 0 comments)
It does not work correctly with Falcon-40B (#100, AGrosserHH, opened 1 year ago, 0 comments)
How to understand this note: "note: Since Deepspeed-ZeRO can process multiple generate streams in parallel its throughput can be further divided by 8 or 16 ..." (#99, HuipengXu, opened 1 year ago, 1 comment)
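
The note quoted in #99 is arithmetic about data parallelism. On one reading (our illustration, not confirmed in the thread), ZeRO lets each of the N GPUs run its own generate stream on a different batch, so a latency measured for a single stream overstates the system-wide cost per token by a factor of N. A minimal sketch of that arithmetic, with made-up numbers:

    # Hypothetical numbers illustrating the README note quoted in #99:
    # with DeepSpeed-ZeRO, each of the n GPUs can serve its own generate
    # stream, so a single-stream latency divides by n for the whole system.
    measured_ms_per_token = 80.0   # single-stream measurement (assumed)
    n_parallel_streams = 8         # e.g. one stream per GPU on an 8-GPU node

    effective_ms_per_token = measured_ms_per_token / n_parallel_streams
    print(f"effective system-wide cost: {effective_ms_per_token:.1f} ms/token")  # 10.0
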
Does this work for LLaMA 65B? (#98, GradientGuru, closed 1 year ago, 1 comment)
When deploying the BLOOM model, I noticed that the POST method is used for the generation task. Is it possible to modify it to perform question-answering instead? (#97, dizhenx, opened 1 year ago, 0 comments)
The Makefile execution was successful, but there is no response when entering text. (#96, dizhenx, opened 1 year ago, 0 comments)
AttributeError: 'BloomForCausalLM' object has no attribute 'module' (#95, detectiveJoshua, opened 1 year ago, 0 comments)
Are there fine-tuning and inference scripts available for int4 quantization in bloom-7b? Is it possible to limit the GPU memory usage to within 10GB? (#94, dizhenx, opened 1 year ago, 1 comment)
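
On the memory cap asked about in #94: outside this repo's scripts, the Hugging Face stack can load BLOOM variants with bitsandbytes 4-bit weights under an explicit per-device memory budget. A minimal sketch, assuming transformers>=4.30 plus accelerate and bitsandbytes are installed; the memory figures are assumptions, and whether the repo's own fine-tuning scripts support int4 is not answered in this listing:

    # Sketch: load bloom-7b1 with 4-bit weights and cap GPU 0 at ~10 GB,
    # spilling whatever does not fit to CPU RAM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "bigscience/bloom-7b1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_4bit=True,                         # int4 weights via bitsandbytes
        device_map="auto",                         # let accelerate place the layers
        max_memory={0: "10GiB", "cpu": "30GiB"},   # per-device budget (assumed)
    )
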
Can I combine FasterTransformer to make it faster? (#93, xsj4cs, closed 1 year ago, 1 comment)
ds_inference succeeds but OOMs when using tp_presharded_mode=True (#92, LiuShixing, closed 1 year ago, 2 comments)
`accelerate` in `bloom-inference-scripts`? (#91, jeromeku, closed 1 year ago, 1 comment)
Inference (chatbot) does not work as expected on 2 GPUs with the bigscience/bloom-7b1 model (#90, dantalyon, opened 1 year ago, 2 comments)
Bloom176B RuntimeError: expected scalar type Half but found BFloat16 (#89, wohenniubi, closed 1 year ago, 3 comments)
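
#89's error message is PyTorch's generic dtype-mismatch complaint: some tensor in the graph stayed in float16 (Half) while another was bfloat16. A minimal repro sketch of that error class (our illustration, not the issue's actual traceback):

    import torch

    a = torch.randn(4, 4, dtype=torch.float16)
    b = torch.randn(4, 4, dtype=torch.bfloat16)
    try:
        a @ b  # PyTorch refuses to mix Half and BFloat16 operands in one op
    except RuntimeError as e:
        print(e)  # message resembles "expected scalar type Half but found BFloat16"

    # The usual fix is one consistent dtype end to end, e.g.:
    out = a.to(torch.bfloat16) @ b
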
pip install command does not work as expected (#88, Billijk, opened 1 year ago, 2 comments)
Question regarding float16 and bfloat16 (#87, allanj, closed 1 year ago, 1 comment)
Why not use deepspeed.init_inference in the ZeRO benchmark? (#86, tingshua-yts, closed 1 year ago, 2 comments)
Unable to reload a quantized model (#85, moonlightian, closed 1 year ago, 4 comments)
Large batch size causes OOM in bloom-ds-inference.py; how to adjust the max_split_size_mb value? (#84, tohneecao, opened 1 year ago, 1 comment)
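
For #84: max_split_size_mb is an option of PyTorch's caching CUDA allocator, read from the PYTORCH_CUDA_ALLOC_CONF environment variable; capping the split size can reduce fragmentation-driven OOMs at large batch sizes. A minimal sketch (a common mitigation, not an official fix from this repo); the 128 MB value is an assumption to tune:

    import os

    # Must be set before torch initializes CUDA, so set it before importing
    # torch, or export it in the shell that launches bloom-ds-inference.py.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after setting the env var, deliberately
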
The details of hf-accelerate pp. (#83, tohneecao, closed 1 year ago, 2 comments)
root_dir in TemporaryCheckpointsJSON is redundant (#82, dc3671, opened 1 year ago, 0 comments)
Build error with nvcc (#81, tohneecao, closed 1 year ago, 3 comments)
Cannot generate text correctly after loading an int8 model (#80, moonlightian, closed 1 year ago, 4 comments)
Why does ds-inference int8 run slower than ds-inference fp16? (#79, DominickZhang, closed 1 year ago, 3 comments)
How to parse tokenizer.json when it shows up as garbled text (#78, hongshengxin, closed 1 year ago, 1 comment)
[Bug] Int8 quantized inference fails using bloom-inference-scripts/bloom-ds-inference.py with deepspeed==0.9.0 on multiple GPUs (#77, hanrui1sensetime, opened 1 year ago, 1 comment)
cuBLAS error with NVIDIA H100 HGX, CUDA v12.1, and cuDNN 8.8.1 (#76, BenFauber, opened 1 year ago, 0 comments)
Concurrent requests (#75, ustclan, closed 1 year ago, 2 comments)
Is there a way to initialize a random weight for (#74, PannenetsF, closed 1 year ago, 3 comments)
Beam search (#73, syp1997, closed 1 year ago, 2 comments)
Incorrect benchmarking (#72, JoeyTPChou, opened 1 year ago, 0 comments)
Fix checkpoints file list to align with DeepSpeed (#71, dc3671, opened 1 year ago, 1 comment)
Short responses when running BLOOM inference (#70, raihan0824, closed 1 year ago, 3 comments)
Should I use bf16 or fp16? (#69, richarddwang, closed 1 year ago, 3 comments)
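
On #69 (and the related #87): as general PyTorch facts rather than the maintainers' answer, bf16 keeps float32's exponent range with fewer mantissa bits, so it rarely overflows where fp16 does but is coarser per value. The trade-off is visible directly:

    import torch

    for dtype in (torch.float16, torch.bfloat16):
        info = torch.finfo(dtype)
        print(dtype, "max:", info.max, "eps:", info.eps)
    # float16  max ~6.6e4   eps ~9.8e-4  (overflows easily, finer precision)
    # bfloat16 max ~3.4e38  eps ~7.8e-3  (fp32-like range, coarser precision)
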
"bloom-ds-zero-inference.py" works but "inference_server.cli --deployment_framework ds_zero" fails
#68
richarddwang
closed
5 months ago
4
♻️ cleaning some stuff
#67
mayank31398
closed
3 months ago
0
Cannot explain recurring OOM error
#66
Remorax
opened
1 year ago
6
The generated results are different when using greedy search during generation
#65
FrostML
opened
1 year ago
4
how to run the server?
#64
raihan0824
closed
1 year ago
2
stuck when inferring
#63
raihan0824
opened
1 year ago
1
RuntimeError: This event loop is already running
#62
syp1997
opened
1 year ago
2
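
#62's error is the classic symptom of calling a blocking asyncio entry point from code that is already inside a running event loop, e.g. a Jupyter notebook. A minimal sketch of the situation (our illustration, not the issue's resolution):

    import asyncio

    async def main():
        return "ok"

    # Fine in a plain script; inside an already-running loop (Jupyter, some
    # servers) blocking entry points raise "RuntimeError: This event loop is
    # already running". A common workaround in notebooks is:
    #     import nest_asyncio; nest_asyncio.apply()
    print(asyncio.run(main()))
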
Distributed training using the same loading method (#61, ananda1996ai, closed 1 year ago, 2 comments)
CUDA OOM when using one GPU (#60, xiongjun19, closed 1 year ago, 4 comments)
Why is the throughput of DS-inference doubled when using 4 A100 GPUs compared to 8 A100 GPUs? (#59, DominickZhang, closed 1 year ago, 3 comments)
Add color for generated text (#58, mayank31398, closed 1 year ago, 0 comments)
Adding minor changes to improve server deployment experience (#57, joe32140, closed 1 year ago, 2 comments)
Drop duplicate HTML files (#56, mayank31398, closed 1 year ago, 0 comments)
Max tokens generated remains constant regardless of the input token size (#55, vamsikrishnav, closed 1 year ago, 2 comments)
Fix model path for int8 (#54, mayank31398, closed 1 year ago, 1 comment)
Fix runtime error with one GPU (#53, StoyanStAtanasov, closed 1 year ago, 7 comments)