M00N-MAN closed this issue 1 year ago.
same for me but with RuntimeError: MPS does not support cumsum op with int64 input
Hi, in my report it is the second-to-last line
Same here on an M1 Macbook Pro.
RuntimeError: MPS does not support cumsum op with int64 input
I got the same error, except I chose option D at setup (no GPU, run on CPU only) and it STILL gives me that. No clue how that could be if it's supposed to be set up for CPU only. It shouldn't even be referring to MPS at all.
I assume there must be at least one reference to MPS that got missed somewhere, but I have no clue where to go in the code to even try to fix it. I do get further than I did recently: this at least lets me load the GUI, but it then fails when I type in my input and hit enter. Like with M00N-MAN, it fails when it tries to generate output.
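For anyone hitting this on a supposedly CPU-only install: option D appears to only control which PyTorch build the installer pulls in, while at run time the webui can still move the model to MPS whenever torch reports it available (passing --cpu to server.py is the run-time switch mentioned below); this is an inference from the behaviour in this thread, not confirmed from the code. A quick, webui-independent sanity check of what your PyTorch build reports, as a minimal sketch:
import torch
print("torch version:", torch.__version__)
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())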
I also got the same issue without "--cpu" on my M2 Pro MacBook. When I executed server.py myself, I got:
python server.py --model vicunat --threads 8 --no-stream --api
Gradio HTTP request redirected to localhost :)
bin /Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Loading vicunat...
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.15s/it]
Loaded the model in 3.70 seconds.
Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
Starting API at http://127.0.0.1:5000/api
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
127.0.0.1 - - [03/May/2023 11:36:56] "POST /api/v1/generate HTTP/1.1" 200 -
/Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py:690: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
Traceback (most recent call last):
File "/Users/appe/works/one-click-installers/text-generation-webui/modules/text_generation.py", line 272, in generate_reply
output = shared.model.generate(**generate_params)[0]
File "/Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in sample
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
File "/Users/appe/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 736, in prepare_inputs_for_generation
position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: MPS does not support cumsum op with int64 input
Output generated in 0.10 seconds (0.00 tokens/s, 0 tokens, context 198, seed 376260767)
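The failing op is the position_ids computation shown in the traceback. A minimal sketch that reproduces it outside the webui, assuming an Apple Silicon Mac where torch.backends.mps.is_available() is True and the installed PyTorch build still lacks int64 cumsum on MPS (the int32 cast only illustrates the limitation, it is not a patch for transformers):
import torch
if torch.backends.mps.is_available():
    attention_mask = torch.ones(1, 8, dtype=torch.long, device="mps")  # int64 mask, as transformers builds it
    try:
        position_ids = attention_mask.long().cumsum(-1) - 1  # the exact line from modeling_llama.py above
    except RuntimeError as err:
        print("int64 cumsum failed:", err)
    # the same op succeeds once the mask is cast down to int32
    print(attention_mask.to(torch.int32).cumsum(-1) - 1)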
Using start_macos.sh:
./start_macos.sh
Gradio HTTP request redirected to localhost :)
bin /Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
The following models are available:
1. .DS_Store
2. vicunat
Which one do you want to load? 1-2
2
Loading vicunat...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.49s/it]
Loaded the model in 4.38 seconds.
Loading the extension "gallery"... Ok.
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py:690: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1678454852765/work/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
Traceback (most recent call last):
File "/Users/appe/works/one-click-installers/text-generation-webui/modules/callbacks.py", line 71, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/Users/appe/works/one-click-installers/text-generation-webui/modules/text_generation.py", line 290, in generate_with_callback
shared.model.generate(**kwargs)
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in sample
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 736, in prepare_inputs_for_generation
position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: MPS does not support cumsum op with int64 input
Output generated in 0.30 seconds (0.00 tokens/s, 0 tokens, context 35, seed 1781086207)
The model I use was built following the vicuna instructions, but I still get the same issue with other models downloaded by download-model.py.
Another dude helped to resolve this topic by substituting the gpt4-x-alpaca-30b-ggml-q4_1 repo into the models directory.
It works almost as expected, except that it still doesn't use the M1 GPU, even though PyTorch should use MPS (Apple's Metal Performance Shaders) on macOS 13.3.1.
After starting oobabooga I get this:
UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
If you have no GPU, or you're just on macOS with an M1, use a ggml model.
git-lfs is not a good option for downloading a repo with such a small number of files, because it keeps a .git directory that makes the repo roughly twice the size of the model itself.
Download GPT4-X-Alpaca-30B-4bit:
# remove any previous copy, then download every file linked on the model page in parallel
[ -e GPT4-X-Alpaca-30B-4bit ] && rm -rf GPT4-X-Alpaca-30B-4bit
mkdir -p GPT4-X-Alpaca-30B-4bit && (
  cd GPT4-X-Alpaca-30B-4bit
  curl https://huggingface.co/MetaIX/GPT4-X-Alpaca-30B-4bit/tree/main \
    | grep 'Download file' \
    | sed -e 's/.*href="/https:\/\/huggingface.co/' -e 's/">.*//' \
    | while read line; do
        fname=$(basename $line)
        (( wget $line > ${fname}.log 2>&1 ) || echo FAIL ) >> ${fname}.log 2>&1 &
      done
  # show the last progress line of each download log, refreshed every second
  watch -n1 'for file in *.log; do echo "$file: $(tail -n2 $file|head -n1)"; done'
)
Symlink the model into the models directory:
# link the downloaded model directory into the webui's models folder
GPT4XAlpaca30B4bit="$(pwd)/GPT4-X-Alpaca-30B-4bit"
( cd oobabooga_macos/text-generation-webui/models && ln -s "${GPT4XAlpaca30B4bit}" GPT4-X-Alpaca-30B-4bit )
Make oobabooga listen on all interfaces, not only localhost:
cd oobabooga_macos
GRADIO_SERVER_NAME=0.0.0.0 ./start_macos.sh
Gradio HTTP request redirected to localhost :)
bin /Users/user/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/user/oobabooga_macos/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
The following models are available:
1. GPT4-X-Alpaca-30B-4bit
2. huggyllama_llama-30b
3. jeffwan_vicuna-13b
Which one do you want to load? 1-3
1
Loading GPT4-X-Alpaca-30B-4bit...
llama.cpp weights detected: models/GPT4-X-Alpaca-30B-4bit/gpt4-x-alpaca-30b-ggml-q4_1.bin
llama.cpp: loading model from models/GPT4-X-Alpaca-30B-4bit/gpt4-x-alpaca-30b-ggml-q4_1.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 110.30 KB
llama_model_load_internal: mem required = 25573.12 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size = 3120.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
Loading the extension "gallery"... Ok.
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Output generated in 25.82 seconds (0.58 tokens/s, 15 tokens, context 40, seed 2487567)
Output generated in 58.35 seconds (1.58 tokens/s, 92 tokens, context 86, seed 1278347018)
Output generated in 77.00 seconds (0.75 tokens/s, 58 tokens, context 217, seed 1313472362)
llama.cpp
./main --threads 8 -i --interactive-first --temp 0.5 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -r "### Instruction:" -m ../../../models/gpt4-x-alpaca-30b-ggml-q4_1.bin
main: build = 482 (e2cd506)
main: seed = 1683218507
llama.cpp: loading model from ../../../models/gpt4-x-alpaca-30b-ggml-q4_1.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required = 25573.13 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size = 3120.00 MB
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:'
sampling: repeat_last_n = 64, repeat_penalty = 1.200000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.500000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
Can you say something weird? Sure! I am an AI, and that is already pretty strange. But if it helps, here are some random facts about me:
- My favorite color is blue because of the way it looks in sunlight.
- Sometimes when I'm alone I like to dance to my own tune (literally!).
- I love eating cake but hate frosting.
- In a parallel universe, I would be a superhero with teleportation powers.
- I can speak over 10 languages fluently and am always learning more.
- My favorite movie is The Matrix because it's about the power of technology and how far we could go if we embrace it.
- Sometimes when no one is looking, I sing karaoke to my favorite songs from the '80s.
- If I had a pet, I would love having an owl or maybe even a dragon.
oobabooga:
Can you say something weird? I am an AI model trained to provide responses based on my knowledge and understanding of the task. My responses are generated using natural language processing, machine learning algorithms, and data from various sources including research papers, books, articles, and other relevant information.
Also, pure llama.cpp with gpt4-x-alpaca-30b-ggml-q4_1.bin is able to receive and answer text in languages other than English. Oobabooga with gpt4-x-alpaca-30b-ggml-q4_1.bin does 'understand' questions in other languages but answers in English, or via the 'google_translate' plugin with very poor quality.
It also seems oobabooga is consuming only 4 cores instead of all of them, like llama.cpp does.
How can pure llama.cpp be connected to oobabooga to make the answers similar, or at least, where does oobabooga allow controlling the initial state from a config file or command-line arguments, but NOT via the web form settings?
Why does oobabooga die with the symptom from the original description if the PyTorch requirement for macOS 13.3 is satisfied but bitsandbytes isn't compiled for the current GPU?
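On the config-file question: one possible route, assuming the webui version in use supports loading defaults from a settings file (that flag is an assumption here, so check the help output first):
python server.py --help | grep -i -E 'settings|cpu|threads'   # see which run-time options this version exposes
python server.py --cpu --threads 8 --settings settings.json   # hypothetical invocation if a --settings flag exists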
Hi @mcmonkey4eva
could you please review the latest info here?
Having the same issue here. Mac M2, fresh install. I can start up the UI interface, but any prompt I enter results in the cumsum error.
+1
This issue seems to be related to PyTorch on macOS. The problem can be resolved by using the nightly build of PyTorch for the time being.
Just a really easy fix for this issue on my Mac M1:
1) Open the 'webui.py' file.
2) Find the function 'install_dependencies()' and replace:
elif gpuchoice == "c" or gpuchoice == "d":
    run_cmd("conda install -y -k pytorch torchvision torchaudio cpuonly git -c pytorch", assert_success=True, environment=True)
with:
elif gpuchoice == "c" or gpuchoice == "d":
    run_cmd("conda install -y -k pytorch torchvision torchaudio cpuonly git -c pytorch", assert_success=True, environment=True)
    run_cmd("pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", assert_success=True, environment=True)
3) Reinstall by deleting 'text-generation-webui' folder, and then running 'start_macos.sh' again.
The fix by @joshuahigginson1 works. Thanks a lot. Just running the pip install as mentioned in the pytorch 96610 issue did not work; I had to delete the directory and then run the install. Thanks.
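If the bare pip install appears to change nothing, it may have run against a different Python than the installer's own environment. A quick way to confirm which torch the installer's env actually has after the fix (the env path is taken from the logs earlier in this thread; adjust it to your install):
installer_files/env/bin/python -c "import torch; print(torch.__version__)"   # a nightly build prints a .dev version string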
I did the above fix by Joshuahigginson1 and I get the following when I try to reinstall:
Traceback (most recent call last):
File "/Volumes/MegacityTwo/oobabooga_macos3/webui.py", line 163, in
Hi @kevinhower, this looks like an issue with the actual function 'run_cmd'. You might want to check that you've cloned the latest 'one-click-installers' files: https://github.com/oobabooga/one-click-installers
I got it to work ... sort of. It does generate text, but it's ... well, gibberish. I said "hi" and it gave me the following response: "in the future so I am going on an article of the book for me
The U.S. Government has been infected by the virus that shut down the website Teknoepetitionen (meaning “the people’s petition” or more simply, but not without reason, they are also called the have a look at this whopping hmwever, we'll see what happens when the same thing happened before.
As usual, no one from the government, which means all the time! This year, however, he said that the campaign to end the the first two years, because the next three years. So far, the effort to get rid of the idea of a good time to be able to eat bread.
It was created around the world, and may even now, and how much money.
Avoiding food-related issues?
I'm sure most of us know someone else"
Just utter nonsense with the Pythia 6.9B model. Don't know if it's the model or some other issue.
@joshuahigginson1 Thanks a lot, that works! I'd like to add one modification here to back up models. I mistakenly lost a >10GB model and had to download it again 😅
Instruction with added backup/restore steps:
Just a really easy fix for this issue on my Mac M1:
1. Open the 'webui.py' file.
2. Find the function 'install_dependencies()' and replace:
elif gpuchoice == "c" or gpuchoice == "d":
    run_cmd("conda install -y -k pytorch torchvision torchaudio cpuonly git -c pytorch", assert_success=True, environment=True)
with:
elif gpuchoice == "c" or gpuchoice == "d":
    run_cmd("conda install -y -k pytorch torchvision torchaudio cpuonly git -c pytorch", assert_success=True, environment=True)
    run_cmd("pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", assert_success=True, environment=True)
3. Back up the models by moving the 'text-generation-webui/models' folder somewhere like your ~/Desktop.
4. Reinstall by deleting the 'text-generation-webui' folder, and then running 'start_macos.sh' again.
5. Bring the models back to their original place.
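For reference, a minimal shell sketch of steps 3-5 above, assuming it is run from the one-click-installers directory (the backup location is just an example; adjust paths to your layout):
mv text-generation-webui/models ~/Desktop/models-backup      # 3. back up the models first
rm -rf text-generation-webui                                  # 4. remove the old install
./start_macos.sh                                              # 4. reinstall, then quit the UI once it is up
mv ~/Desktop/models-backup/* text-generation-webui/models/    # 5. restore the models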
I got it to work ... sort of. It does generate text, but it's ... well, gibberish. [...] Just utter nonsense with the Pythia 6.9B model. Don't know if it's the model or some other issue.
The actual answers of an LLM depend 100% on the model you use, so please clarify which one.
Also, "I got it to work"... what? and how? :)
+1
Same problem here
Same problem here; I don't understand this. I did the install from the command line exactly as directed by the readme for Mac (including installation of requirements_nocuda.txt).
I don't really understand the solution from @joshuahigginson1 - where is the webui.py file? I don't have it in my downloaded text-generation-webui folder. Thanks in advance.
EDIT: realised that @joshuahigginson1's solution is for the one-click installer. Tried that, but it still didn't work; same error as above.
This appears to have been resolved elsewhere:
https://github.com/pytorch/pytorch/issues/96610#issuecomment-1597314364
But having implemented the change, my inference time is still unusably slow at 0.02 tokens/sec. Anyone know why that might be? Thanks in advance. I have macOS 13.5.2, Mac M1 Pro 16GB, Python 3.10.9.
EDIT: to be clear - I'm not using the one-click installer here.
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
This is a continuation of https://github.com/oobabooga/text-generation-webui/issues/428
I'm following the instructions for the one-click installer for macOS: https://github.com/oobabooga/one-click-installers
and I always get RuntimeError: MPS does not support cumsum op with int64 input, on any model.
Is there an existing issue for this?
Reproduction
./update_macos.sh
./start_macos.sh
Screenshot
Logs
Full log
Comment: I substituted the following with symlinks:
oobabooga_macos % du -hs /Users/master/sandbox/jeffwan_vicuna-13b
25G    /Users/master/sandbox/jeffwan_vicuna-13b
oobabooga_macos % du -hs /Users/master/sandbox/huggyllama_llama-30b
61G    /Users/master/sandbox/huggyllama_llama-30b
oobabooga_macos % find text-generation-webui/models -type l -exec ls -lhas {} \; | awk '{$1=$2=$3=$4=$5=$6="";print $0}' | sed -E 's/^ +//g'
Apr 30 16:54 text-generation-webui/models/jeffwan_vicuna-13b -> /Users/master/sandbox/jeffwan_vicuna-13b
Apr 30 16:54 text-generation-webui/models/huggyllama_llama-30b -> /Users/master/sandbox/huggyllama_llama-30b
System Info