huggingface/transformers-bloom-inference
Fast Inference Solutions for BLOOM
Apache License 2.0 · 560 stars · 114 forks
Issues
Update Makefile (#102, tsingmao, closed 1 year ago, 0 comments)
ValueError: Couldn't instantiate the backend tokenizer from one of: (#101, SeekPoint, opened 1 year ago, 0 comments)
It does not work correctly with Falcon-40B (#100, AGrosserHH, opened 1 year ago, 0 comments)
How to understand this note: "note: Since Deepspeed-ZeRO can process multiple generate streams in parallel its throughput can be further divided by 8 or 16 ..." (#99, HuipengXu, opened 1 year ago, 1 comment)
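
The note quoted in #99 is arithmetic about data parallelism. On one reading (our illustration, not confirmed in the thread), ZeRO lets each of the N GPUs run its own generate stream on a different batch, so a latency measured for a single stream overstates the system-wide cost per token by a factor of N. A minimal sketch of that arithmetic, with made-up numbers:

    # Hypothetical numbers illustrating the README note quoted in #99:
    # with DeepSpeed-ZeRO, each of the n GPUs can serve its own generate
    # stream, so a single-stream latency divides by n for the whole system.
    measured_ms_per_token = 80.0   # single-stream measurement (assumed)
    n_parallel_streams = 8         # e.g. one stream per GPU on an 8-GPU node

    effective_ms_per_token = measured_ms_per_token / n_parallel_streams
    print(f"effective system-wide cost: {effective_ms_per_token:.1f} ms/token")  # 10.0
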
Does this work for LLaMA 65B? (#98, GradientGuru, closed 1 year ago, 1 comment)
When deploying the BLOOM model, I noticed that the POST method is used for the generation task. Is it possible to modify it to perform question-answering instead? (#97, dizhenx, opened 1 year ago, 0 comments)
The Makefile execution was successful, but there is no response when entering text. (#96, dizhenx, opened 1 year ago, 0 comments)
AttributeError: 'BloomForCausalLM' object has no attribute 'module' (#95, detectiveJoshua, opened 1 year ago, 0 comments)
Are there fine-tuning and inference scripts available for int4 quantization in bloom-7b? Is it possible to limit the GPU memory usage to within 10GB? (#94, dizhenx, opened 1 year ago, 1 comment)
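
On the memory cap asked about in #94: outside this repo's scripts, the Hugging Face stack can load BLOOM variants with bitsandbytes 4-bit weights under an explicit per-device memory budget. A minimal sketch, assuming transformers>=4.30 plus accelerate and bitsandbytes are installed; the memory figures are assumptions, and whether the repo's own fine-tuning scripts support int4 is not answered in this listing:

    # Sketch: load bloom-7b1 with 4-bit weights and cap GPU 0 at ~10 GB,
    # spilling whatever does not fit to CPU RAM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "bigscience/bloom-7b1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_4bit=True,                         # int4 weights via bitsandbytes
        device_map="auto",                         # let accelerate place the layers
        max_memory={0: "10GiB", "cpu": "30GiB"},   # per-device budget (assumed)
    )
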
Can I combine FasterTransformer to make it faster? (#93, xsj4cs, closed 1 year ago, 1 comment)
ds_inference succeeds but OOMs when using tp_presharded_mode=True (#92, LiuShixing, closed 1 year ago, 2 comments)
`accelerate` in `bloom-inference-scripts`? (#91, jeromeku, closed 1 year ago, 1 comment)
Inference (chatbot) does not work as expected on 2 GPUs with the bigscience/bloom-7b1 model (#90, dantalyon, opened 1 year ago, 2 comments)
Bloom176B RuntimeError: expected scalar type Half but found BFloat16 (#89, wohenniubi, closed 1 year ago, 3 comments)
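
#89's error message is PyTorch's generic dtype-mismatch complaint: some tensor in the graph stayed in float16 (Half) while another was bfloat16. A minimal repro sketch of that error class (our illustration, not the issue's actual traceback):

    import torch

    a = torch.randn(4, 4, dtype=torch.float16)
    b = torch.randn(4, 4, dtype=torch.bfloat16)
    try:
        a @ b  # PyTorch refuses to mix Half and BFloat16 operands in one op
    except RuntimeError as e:
        print(e)  # message resembles "expected scalar type Half but found BFloat16"

    # The usual fix is one consistent dtype end to end, e.g.:
    out = a.to(torch.bfloat16) @ b
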
pip install command does not work as expected (#88, Billijk, opened 1 year ago, 2 comments)
Question regarding float16 and bfloat16 (#87, allanj, closed 1 year ago, 1 comment)
Why not use deepspeed.init_inference in the ZeRO benchmark? (#86, tingshua-yts, closed 1 year ago, 2 comments)
Unable to reload a quantized model (#85, moonlightian, closed 1 year ago, 4 comments)
Large batch size causes OOM in bloom-ds-inference.py; how to adjust the max_split_size_mb value? (#84, tohneecao, opened 1 year ago, 1 comment)
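
For #84: max_split_size_mb is an option of PyTorch's caching CUDA allocator, read from the PYTORCH_CUDA_ALLOC_CONF environment variable; capping the split size can reduce fragmentation-driven OOMs at large batch sizes. A minimal sketch (a common mitigation, not an official fix from this repo); the 128 MB value is an assumption to tune:

    import os

    # Must be set before torch initializes CUDA, so set it before importing
    # torch, or export it in the shell that launches bloom-ds-inference.py.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after setting the env var, deliberately
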
The details of hf-accelerate pp. (#83, tohneecao, closed 1 year ago, 2 comments)
root_dir in TemporaryCheckpointsJSON is redundant (#82, dc3671, opened 1 year ago, 0 comments)
Build error with nvcc (#81, tohneecao, closed 1 year ago, 3 comments)
Cannot generate text correctly after loading an int8 model (#80, moonlightian, closed 1 year ago, 4 comments)
Why does ds-inference int8 run slower than ds-inference fp16? (#79, DominickZhang, closed 1 year ago, 3 comments)
How to parse tokenizer.json when it shows up as garbled text (#78, hongshengxin, closed 1 year ago, 1 comment)
[Bug] Int8 quantized inference fails using bloom-inference-scripts/bloom-ds-inference.py with deepspeed==0.9.0 on multiple GPUs (#77, hanrui1sensetime, opened 1 year ago, 1 comment)
cuBLAS error with NVIDIA H100 HGX, CUDA v12.1, and cuDNN 8.8.1 (#76, BenFauber, opened 1 year ago, 0 comments)
Concurrent requests (#75, ustclan, closed 1 year ago, 2 comments)
Is there a way to initialize a random weight for (#74, PannenetsF, closed 1 year ago, 3 comments)
Beam search (#73, syp1997, closed 1 year ago, 2 comments)
Incorrect benchmarking (#72, JoeyTPChou, opened 1 year ago, 0 comments)
Fix checkpoints file list to align with DeepSpeed (#71, dc3671, opened 1 year ago, 1 comment)
Short responses when running BLOOM inference (#70, raihan0824, closed 1 year ago, 3 comments)
Should I use bf16 or fp16? (#69, richarddwang, closed 1 year ago, 3 comments)
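
On #69 (and the related #87): as general PyTorch facts rather than the maintainers' answer, bf16 keeps float32's exponent range with fewer mantissa bits, so it rarely overflows where fp16 does but is coarser per value. The trade-off is visible directly:

    import torch

    for dtype in (torch.float16, torch.bfloat16):
        info = torch.finfo(dtype)
        print(dtype, "max:", info.max, "eps:", info.eps)
    # float16  max ~6.6e4   eps ~9.8e-4  (overflows easily, finer precision)
    # bfloat16 max ~3.4e38  eps ~7.8e-3  (fp32-like range, coarser precision)
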
"bloom-ds-zero-inference.py" works but "inference_server.cli --deployment_framework ds_zero" fails
#68
richarddwang
closed
5 months ago
4
♻️ cleaning some stuff
#67
mayank31398
closed
3 months ago
0
Cannot explain recurring OOM error
#66
Remorax
opened
1 year ago
6
The generated results are different when using greedy search during generation
#65
FrostML
opened
1 year ago
4
how to run the server?
#64
raihan0824
closed
1 year ago
2
stuck when inferring
#63
raihan0824
opened
1 year ago
1
RuntimeError: This event loop is already running
#62
syp1997
opened
1 year ago
2
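
#62's error is the classic symptom of calling a blocking asyncio entry point from code that is already inside a running event loop, e.g. a Jupyter notebook. A minimal sketch of the situation (our illustration, not the issue's resolution):

    import asyncio

    async def main():
        return "ok"

    # Fine in a plain script; inside an already-running loop (Jupyter, some
    # servers) blocking entry points raise "RuntimeError: This event loop is
    # already running". A common workaround in notebooks is:
    #     import nest_asyncio; nest_asyncio.apply()
    print(asyncio.run(main()))
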
Distributed training using the same loading method (#61, ananda1996ai, closed 1 year ago, 2 comments)
CUDA OOM when using one GPU (#60, xiongjun19, closed 1 year ago, 4 comments)
Why is the throughput of DS-inference doubled when using 4 A100 GPUs compared to 8 A100 GPUs? (#59, DominickZhang, closed 1 year ago, 3 comments)
Add color for generated text (#58, mayank31398, closed 1 year ago, 0 comments)
Adding minor changes to improve server deployment experience (#57, joe32140, closed 1 year ago, 2 comments)
Drop duplicate HTML files (#56, mayank31398, closed 1 year ago, 0 comments)
Max tokens generated remains constant regardless of the input token size (#55, vamsikrishnav, closed 1 year ago, 2 comments)
Fix model path for int8 (#54, mayank31398, closed 1 year ago, 1 comment)
Fix runtime error with one GPU (#53, StoyanStAtanasov, closed 1 year ago, 7 comments)