marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

Segmentation fault on m1 mac #8

Closed: s-kostyaev closed this issue 1 year ago

s-kostyaev commented 1 year ago

Trying the simple example on an M1 Mac:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/starcoderbase-GGML/starcoderbase-ggml-q4_0.bin",
    model_type="starcoder",
    lib="basic",
)

print(llm("Hi"))

leads to a segmentation fault. The model works fine with the ggml example code.
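When a native extension crashes like this, the stdlib `faulthandler` module can at least show which Python call triggered the segfault. A minimal sketch (the model path and the `ctransformers` call are placeholders mirroring the snippet above, not executed here):

```python
import faulthandler

# Enable before loading the model: on a fatal signal (SIGSEGV etc.)
# Python dumps a traceback instead of dying silently.
faulthandler.enable()
assert faulthandler.is_enabled()

# Hypothetical usage, mirroring the report above:
# from ctransformers import AutoModelForCausalLM
# llm = AutoModelForCausalLM.from_pretrained(
#     "/path/to/starcoderbase-GGML/starcoderbase-ggml-q4_0.bin",
#     model_type="starcoder",
# )
# print(llm("Hi"))
```

The traceback points at the last Python frame before the crash, which helps narrow down whether the fault is in model loading or inference.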

s-kostyaev commented 1 year ago

I see the code has been updated, so here is the output of the commands.

marella commented 1 year ago

Thanks @s-kostyaev, I was actually asking bgonzalezfractal to run it so that I can check and compare the output on their system as well :)

Since you already built it, can you also run ./build/lib/main on a starcoder model? Yesterday it was giving an empty response.

s-kostyaev commented 1 year ago

Sure.

%  ./build/lib/main starcoder ../LocalAI/models/starchat-alpha-ggml-q4_0.bin

model type : 'starcoder'
model path : '../LocalAI/models/starchat-alpha-ggml-q4_0.bin'
prompt     : 'Hi'

load ... ✔
tokenize ... ✔
> [ 12575 ]
eval ... ✔
sample ... ✔
> 399
detokenize ... ✔
> ' A'
delete ... ✔
marella commented 1 year ago

Thanks. So the C++ code works fine natively and doesn't have any issue. I will have to debug why it is failing from Python.

marella commented 1 year ago

@s-kostyaev I found another issue https://github.com/LibRaw/LibRaw/issues/437#issue-1065648301 which looks similar to the error you posted previously: https://github.com/marella/ctransformers/issues/8#issuecomment-1557635980 They mention it being a stack size limit issue, which gets worse with multiple threads. So can you please try using threads=1 after building from source (I added some print statements):

git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
git checkout debug
./scripts/build.sh
llm = AutoModelForCausalLM.from_pretrained(..., lib='/path/to/ctransformers/build/lib/libctransformers.dylib')

print(llm('Hi', max_new_tokens=1, threads=1))

Also please run with threads=4 and share both the outputs.

In the above thread, they also suggested increasing the stack size limit, but I'm not sure what an ideal limit would be.
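For reference, the current stack size limit can be inspected (and per-thread stacks raised) from Python's stdlib. A minimal sketch; the 16 MiB value is an arbitrary example, not a recommendation:

```python
import resource
import threading

# Inspect the process stack size limit in bytes;
# resource.RLIM_INFINITY means "unlimited".
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack soft limit:", soft, "hard limit:", hard)

# Threads created from Python afterwards can be given a larger
# stack explicitly (must be at least 32 KiB on most platforms).
threading.stack_size(16 * 1024 * 1024)
print("thread stack size:", threading.stack_size())
```

This only affects threads spawned by Python itself; threads created inside a native library (like GGML's thread pool) are governed by the library and the OS defaults, which is why the shell-level `ulimit -s` route was suggested in the linked thread.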

s-kostyaev commented 1 year ago

Sure. Will test it.

s-kostyaev commented 1 year ago

With single thread:

 %  python3 test.py 

ggml_graph_compute: n_threads = 0
ggml_graph_compute: create thread pool
ggml_graph_compute: initialize tasks + work buffer
ggml_graph_compute: allocating work buffer for graph (26048 bytes)
ggml_graph_compute: compute nodes

And it got stuck.

marella commented 1 year ago

Are you using threads=1? Because it is printing n_threads = 0! Can you also please check with threads=4.

s-kostyaev commented 1 year ago

Sure.

%  python3 test.py 

ggml_graph_compute: n_threads = 0
ggml_graph_compute: create thread pool
ggml_graph_compute: initialize tasks + work buffer
ggml_graph_compute: allocating work buffer for graph (26048 bytes)
ggml_graph_compute: compute nodes
s-kostyaev commented 1 year ago

This is with 4 threads set, and even set in two places: the config and the llm eval call.

marella commented 1 year ago

Thanks. I think I found the issue. I will make a new release and will let you know in some time.

bgonzalezfractal commented 1 year ago

@marella sorry, I've been working like crazy. I see @s-kostyaev executed the necessary commands; if you need anything else from my hardware, just let me know. Glad you guys found it.

marella commented 1 year ago

> @marella sorry I've been working like crazy, I see @s-kostyaev executed the necessary commands, if you need anything else from my hardware just let me know, glad you guys found it.

No worries @bgonzalezfractal


@s-kostyaev I released a fix in the latest version 0.2.1. Please update:

pip install --upgrade ctransformers

and let me know if it works. Please don't set the lib=... option.

Also please try running with different threads (1, 4, 8) and let me know if you see any change in performance.
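A simple way to compare thread counts is to time one generation per setting. A minimal harness sketch; the `threads` keyword on the callable is an assumption standing in for whatever API the model exposes (with ctransformers you might wrap it as `lambda threads: llm("Hi", threads=threads)`):

```python
import time

def time_generation(generate, thread_counts=(1, 4, 8)):
    """Time one call of `generate` per thread count.

    `generate` is any callable accepting a `threads` keyword
    (hypothetical; adapt to the actual model API).
    """
    results = {}
    for n in thread_counts:
        start = time.perf_counter()
        generate(threads=n)
        results[n] = time.perf_counter() - start
    return results

# Stand-in CPU workload for illustration only:
timings = time_generation(lambda threads: sum(range(100_000)))
print(timings)
```

For real comparisons you would want several runs per setting and a warm-up call, since the first generation often includes one-time model setup cost.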

s-kostyaev commented 1 year ago

It finally works, and the threads parameter works too. It even works with conda now. Thank you!

marella commented 1 year ago

Thanks a lot @s-kostyaev for helping in debugging the issue.