mudler / LocalAI

:robot: The free, open source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Features: text, audio, video and image generation, voice cloning, distributed inference
https://localai.io
MIT License

Example Chat-UI (ChatGPT OSS Alternative) causing crash of API with preloaded model #574

Closed · typoworx-de closed this 4 months ago

typoworx-de commented 1 year ago

LocalAI version: quay.io/go-skynet/local-ai:latest

Environment, CPU architecture, OS, and Version: IBM x3400 server running VMware ESXi with an Intel Xeon E5620 CPU and an Ubuntu/Docker VM (details in the follow-up comments below)

Describe the bug: I'm new to LocalAI and was trying to set up the "ChatGPT OSS Alternative" example presented on the LocalAI homepage. Link to the example: https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui

At first the LocalAI API appears to run fine, but sending any prompt from the chat UI to the API causes it to crash (see logs attached).

To Reproduce: try this example: https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui

This is my resulting docker-compose.yaml, adapted from the example:

version: '3.8'

services:
  api:
    # https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes
    #image: quay.io/go-skynet/local-ai:v1.18.0
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: .
      dockerfile: Dockerfile
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      #- DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - "./models:/models:cached"
    command: ["/usr/bin/local-ai" ]

  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'
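
The crash can also be reproduced without the chatbot UI by calling the OpenAI-compatible endpoint directly. A minimal sketch (assuming the gpt-3.5-turbo alias preloaded above; not part of the original report):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'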

Expected behavior: I expected a working example with at least some output for the ChatGPT-like prompt. Instead, only an "internal error" response pops up.

Logs: log file from the Docker container (attached in the comment below)

Additional context

typoworx-de commented 1 year ago

local-ai_api_1_logs.txt

typoworx-de commented 1 year ago

Possibly related to these issues as well:

#195, #192

typoworx-de commented 1 year ago

Just leaving this here in case others have similar problems... obviously my Docker machine did not have enough RAM assigned, which caused the crash when loading the models into memory. I'm trying again with more memory assigned to the VM and will report back if that works.

typoworx-de commented 1 year ago

Tried with 16 GB RAM attached; the Docker container for the LocalAI API still crashes, with no useful exception pointing out what's going wrong.

typoworx-de commented 1 year ago

I've cross-checked now and deployed the same docker-compose setup on my notebook workstation (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz) with Ubuntu OS/Docker. There it works!

The previous deployment that caused problems was on my IBM server, which runs VMware ESXi on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz with an Ubuntu OS/Docker VM.

So either the local-ai stack has some kind of problem with VMware virtualization, or with this Xeon E5620 CPU?!
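
A quick way to narrow this down (a suggested check, not from the original comment): the i7-9750H supports AVX and AVX2, while the Xeon E5620 is a Westmere-era CPU that predates AVX entirely, and VMware can additionally mask CPU flags from the guest. Checking the flags the VM actually sees:

# Check whether the CPU, as seen by the guest, advertises AVX at all.
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
# No output means no AVX, so prebuilt AVX-enabled binaries will die with SIGILL.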

kroshira commented 1 year ago

I have a Xeon E5649 CPU and have the same issue with the API crashing. I suspect it is an incompatible CPU.

Server specs:
Dell R710
96 GB RAM
2x Xeon E5649, 12 cores @ 2.53GHz
28 TB storage
Ubuntu 20.04 LTS, 5.4.0-86-generic kernel

My docker-compose file:

version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
      - 8050:8080
    environment:
      - DEBUG=true
      - REBUILD=true
      - BUILD_TYPE=generic
      - MODELS_PATH=/models
      - THREADS=14
      - CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
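      # Note (an untested assumption, not from the original post): with the list
      # syntax above, the double quotes become part of CMAKE_ARGS' literal value.
      # If the rebuild seems to ignore these flags, try the unquoted form:
      # - CMAKE_ARGS=-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF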
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/mpt-7b-chat.yaml", "name": "mpt-7b-chat"},{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}, { "url": "github:go-skynet/model-gallery/bert-embeddings.yaml", "name": "text-embedding-ada-002"},{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai" ]
  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3500:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'
    volumes:
      - ./models:/models:cached

Failure message (there is additional output I can provide, but I will truncate it here since this seems the most relevant):

5:53PM DBG Loading model llama from WizardLM-7B-uncensored.ggmlv3.q5_1
5:53PM DBG Loading model in memory from file: /models/WizardLM-7B-uncensored.ggmlv3.q5_1
SIGILL: illegal instruction
PC=0xa1ab80 m=9 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc5 0xf9 0x6f 0x5 0x98 0xbe 0x8c 0x0 0xc7 0x47 0x10 0x0 0x0 0x0 0x0 0x48

Note: I have tried multiple models. Best case, they return no response; worst case, they crash like this. I would love to get this working on my server just for fun, but I'm pretty sure the CPU is the limiting factor here. I know for a fact it does not have AVX, so... that's a bad sign from the get-go.
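
The instruction bytes in the SIGILL report support that: 0xc5 is the two-byte VEX prefix, i.e. an AVX encoding. As a hedged aside (not from the original comment), the first faulting instruction can be decoded from the reported bytes:

# Dump the first eight reported instruction bytes and disassemble them.
printf '\xc5\xf9\x6f\x05\x98\xbe\x8c\x00' > sigill.bin
objdump -D -b binary -m i386:x86-64 sigill.bin
# Decodes to: vmovdqa 0x8cbe98(%rip),%xmm0 -- an AVX instruction this CPU lacks.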

bnusunny commented 1 year ago

This is most likely caused by missing AVX support. You can compile local-ai on this machine to get a version optimized for it.

cstuart1310 commented 1 year ago

For anyone else scouring through the issues for a solution: build it locally as bnusunny mentions above, like this (but it does make inference incredibly slow):

CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build

https://github.com/go-skynet/LocalAI/issues/288#issuecomment-1580305158

bnusunny commented 1 year ago

Yeah, without AVX, ggml will be slow on CPU.

stereotypy commented 1 year ago

> Anyone else scouring through the issues for a solution, build it locally like bnusunny mentions above like this, but it does make it incredibly slow CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build
>
> #288 (comment)

Does this work in the Docker container? I got it to run locally with just CMAKE_ARGS="-DLLAMA_AVX2=OFF", but it was still crashing in the Docker container.
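
For the in-container case, the REBUILD mechanism from kroshira's compose file above is the relevant knob: it recompiles the backends inside the container at startup. A minimal sketch of the api service (an untested example under those assumptions, not a confirmed fix; the first start will be slow while it rebuilds):

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    ports:
      - 8080:8080
    environment:
      - REBUILD=true
      - BUILD_TYPE=generic
      - MODELS_PATH=/models
      # Unquoted so the flags reach the build verbatim (see the note above).
      - CMAKE_ARGS=-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]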

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat: and I can provide generally good tips, or places to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

I see that the issue is related to the LocalAI API image version. The latest version of LocalAI is 1.20.0, and the image used in the example is 1.18.0. This could be causing the issue. Please try updating the image tag in the api service in your docker-compose.yaml file to use the latest version of the LocalAI API image:

image: quay.io/go-skynet/local-ai:latest

Also, make sure that you have installed all the required dependencies and packages for LocalAI on your system. You can do this by following the installation instructions provided in the LocalAI documentation.
