oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Only 50% of the CPU being utilized #899

Closed. Freestyle00 closed this issue 1 year ago.

Freestyle00 commented 1 year ago

Describe the bug

When I use the AI, it only utilizes about 50% of my CPU instead of 100%.

Is there an existing issue for this?

Reproduction

I used the Windows one-click installer, but I modified the server.py launch line to the following:

call python server.py --auto-devices --chat --listen --listen-port 8008

Later I downloaded the facebook/opt-30b model with download-model.bat to play around with, loaded it using start-webui.bat, wrote something, and saw that, sadly, it only uses around 50% of my CPU.
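
One thing worth checking (an assumption on my side, not something I have confirmed): on a hyper-threaded Xeon, roughly 50% overall CPU in Task Manager can simply mean one busy thread per physical core while the logical (SMT) siblings stay idle. A quick Python sketch to compare the two counts and watch per-core usage during a generation (requires pip install psutil):

import os
import psutil

print("Logical cores :", os.cpu_count())
print("Physical cores:", psutil.cpu_count(logical=False))

# Per-core utilization sampled over one second while a generation is running;
# about half the entries sitting near 0% would match the ~50% overall figure.
print(psutil.cpu_percent(interval=1, percpu=True))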

Screenshot

Screenshots (attached): starting the request, letting it run for some time, and after the request finished.

Logs

PS C:\Users\Administrator\Downloads\oobabooga-windows\oobabooga-windows> .\start-webui.bat
Starting the web UI...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary C:\Users\Administrator\Downloads\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.dll...
C:\Users\Administrator\Downloads\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The following models are available:

1. facebook_opt-30b
2. facebook_opt-6.7b

Which one do you want to load? 1-2

1

Loading facebook_opt-30b...
Warning: torch.cuda.is_available() returned False.
This means that no GPU has been detected.
Falling back to CPU mode.

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [03:01<00:00, 25.90s/it]
Loaded the model in 184.88 seconds.
Loading the extension "gallery"... Ok.
C:\Users\Administrator\Downloads\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\gradio\deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://0.0.0.0:8008

To create a public link, set `share=True` in `launch()`.
Output generated in 72.40 seconds (0.29 tokens/s, 21 tokens, context 82)
Output generated in 86.07 seconds (0.22 tokens/s, 19 tokens, context 347)
Output generated in 134.18 seconds (0.27 tokens/s, 36 tokens, context 380)
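
For reference, a minimal sketch (again just an assumption on my side, not verified) to check how many threads PyTorch's CPU backend is actually using for the CPU fallback, and to raise it to the logical core count; whether that improves tokens/s is a separate question, since CPU inference of a model this size tends to be memory-bandwidth bound:

import os
import torch

print("torch threads before:", torch.get_num_threads())  # often defaults to the physical core count
torch.set_num_threads(os.cpu_count())                     # allow one thread per logical core
print("torch threads after :", torch.get_num_threads())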

System Info

System: Dell PowerEdge R730
GPU: none (I have no idea how to install a GPU in this server because I can't find a power supply connector for the GPU's power pins; if someone knows how this works, please get in touch, I would really appreciate it)
CPU: Intel(R) Xeon(R) E5-2687W v3 @ 3.10GHz (virtualization and Boost are enabled through the iDRAC)
RAM: 128 GB ECC DDR4 (8 × 16 GB modules)
Storage: two 300 GB Dell 16K SAS HDDs in RAID-1 on a Dell PERC H630 Mini controller
OS: Windows Server 2022 Datacenter
github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.