yukiarimo / yuna-ai

Your Private Companion. The future AGI takeover starts here!
https://www.yuna-ai.com/
GNU Affero General Public License v3.0

GPU doesn't work when configured #98

Open tenpai-git opened 1 month ago

tenpai-git commented 1 month ago

When launching Yuna, I get the following stderr output: Hardware accelerator e.g. GPU is available in the environment, but no device argument is passed to the Pipeline object. Model will be on CPU.

This appears to be related to the diffusers Python library generating the error.

Despite running python index.py --device cuda or setting `"device": "cuda"` in config.json under server, the error still appears. Does it need to be set somewhere else?

yukiarimo commented 1 month ago

  1. python index.py --device cuda doesn't exist
  2. Check the config
  3. Update to the new version (will be pushed right now)

tenpai-git commented 1 month ago

Hi Yuki, thanks for getting back to me. I tried a lot of things but I wasn't successful in making Yuna-AI use the GPU.

I saw you made an update to the dev branch, so I went ahead and tried that branch on its own. I no longer get the earlier diffusers error. However, when I talk to Yuna over the in-software chat, htop clearly shows CPU load and nvidia-smi shows no VRAM in use, so it's still running on the CPU.

I tried some things. First, I edited static/config.json: I changed the name, set the "device" key under server to cuda, and tried adding a "device": "cuda" key-value pair to the ai section. I also tried copying static/config.json to ./config.json before running python index.py. None of these worked.

{
    "ai": {
        "names": [
            "Senpai",
            "Yuna"
        ],
        "hinmitsu": false,
        "agi": false,
        "device": "cuda",
        "emotions": false,
        "art": false,
        "vision": false,
        "max_new_tokens": 512,
        "context_length": 2048,
        "temperature": 0.7,
        "repetition_penalty": 1.1,
        "last_n_tokens": 128,
        "seed": -1,
        "top_k": 100,
        "top_p": 0.92,
        "stop": [
            "Yuki:",
            "\nYuki:",
            "\nYuna:",
            "Yuna:",
            "Yuuki:",
            "<|user|>",
            "<|system|>",
            "<|model|>",
            "###",
            "Yuki;"
        ],
        "batch_size": 512,
        "threads": 8,
        "gpu_layers": -1,
        "flash_attn": true,
        "use_mlock": true
    },
    "server": {
        "port": "",
        "url": "",
        "default_history_file": "history_template:general.json",
        "images": "images/",
        "yuna_model_dir": "lib/models/yuna/",
        "yuna_default_model": "yuna-ai-v3-q5_k_m.gguf",
        "agi_model_dir": "lib/models/agi/",
        "art_default_model": "yuna_ai_anime.safetensors",
        "device": "mps",
        "yuna_text_mode": "native",
        "yuna_audio_mode": "siri",
        "yuna_audio_name": "1.wav",
        "yuna_reference_audio": "audio.mp3",
        "output_audio_format": "audio.aiff"
    },
    [Security Key Omitted]
}

I was wondering how that would happen, so I checked the configuration in the application. To my surprise it hadn't loaded config.json and had all defaults. I'm not sure where it's loading the config from.

So I tried changing device to cuda in the in-app config but it also didn't work. The cache of the in-software configuration seems to not reflect config.json. Perhaps there is simply a cache carrying over somewhere. I tried making a new user but this was still the case.

Where is config.json or the config loading from for index.py? Maybe something isn't reading the config properly and if it's set to cuda at startup it will work? Just a guess.

Hope we can figure this out.

yukiarimo commented 1 month ago

Hey, the configuration is all in static/config.json, which is where the program expects it. Also, there are two “devices” (lol just noticed), so set both to cuda. You can also try reading the documentation for llama-cpp-python and check out the generate.py file, because I see that it works on both Apple Silicon (the best) and Linux systems with CUDA and AMD.
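
For anyone following along, here is a minimal sketch for checking which accelerator values are actually on disk. It assumes the layout of the config pasted earlier in this thread (a "device" key under both "ai" and "server", plus "gpu_layers" under "ai"). The function names are hypothetical and are not part of yuna-ai.

```python
import json
from pathlib import Path


def find_device_settings(config_path="static/config.json"):
    """Return the accelerator-related values from a Yuna-style config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    ai = config.get("ai", {})
    server = config.get("server", {})
    return {
        "ai.device": ai.get("device"),
        "ai.gpu_layers": ai.get("gpu_layers"),
        "server.device": server.get("device"),
    }


def mismatched_devices(settings, wanted="cuda"):
    """List the device keys whose value differs from the wanted backend."""
    return [
        key for key in ("ai.device", "server.device")
        if settings.get(key) != wanted
    ]


if __name__ == "__main__":
    path = Path("static/config.json")
    if path.exists():
        settings = find_device_settings(path)
        print(settings)
        print("not set to cuda:", mismatched_devices(settings))
    else:
        print(f"{path} not found; run this from the repository root")
```

Run from the repository root, this only reports what the file contains. It does not show which values the running app has loaded.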

tenpai-git commented 1 month ago

Hi Yuki, sorry about the above log, that was from an earlier test. I had both set to cuda in a separate test and was just trying out the added key. Just to show the dialog here, I also tried other config settings, such as setting gpu_layers to 20. Everything runs fine on koboldcpp, so I'm not sure why it's not selecting a CUDA backend for Yuna. I am using Arch to run this.

I tried looking into generator.py and I tried adding n_gpu_layers = -1 to the instantiation of llm = LlamaCpp but this didn't work either.
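
As an illustration of how the config values line up with the loader arguments, here is a hedged sketch that maps the "ai" and "server" sections onto the keyword names used by llama_cpp.Llama (n_ctx, n_batch, n_gpu_layers and so on). The helper name is hypothetical, and this is not how the project's own generator code is written; it only makes the mapping from "gpu_layers" to n_gpu_layers explicit.

```python
def llama_kwargs_from_config(config):
    """Translate a Yuna-style config dict into llama_cpp.Llama keyword arguments."""
    ai = config["ai"]
    server = config["server"]
    return {
        "model_path": server["yuna_model_dir"] + server["yuna_default_model"],
        "n_ctx": ai["context_length"],
        "seed": ai["seed"],
        "n_batch": ai["batch_size"],
        # -1 asks llama.cpp to offload every layer; it only has an effect
        # when the installed build was compiled with a GPU backend.
        "n_gpu_layers": ai["gpu_layers"],
        "n_threads": ai["threads"],
        "use_mlock": ai["use_mlock"],
        "flash_attn": ai["flash_attn"],
    }
```

With llama-cpp-python installed, the result could be passed as `Llama(**llama_kwargs_from_config(config))`, and printing the dict first shows exactly what the loader would receive.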

I also tried reinstalling llama_cpp_python with various flags as described here: https://github.com/abetlen/llama-cpp-python/issues/250 - but had no success getting it to select the GPU.
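
One way to check whether a reinstall produced a GPU-capable build is to ask the library itself. The sketch below assumes a reasonably recent llama-cpp-python that exposes the low-level binding `llama_supports_gpu_offload`. Older releases may not have it, so the lookup is guarded and the helper (a hypothetical name) returns None when it cannot tell.

```python
def gpu_offload_supported():
    """Return True/False if llama-cpp-python reports GPU offload support, else None."""
    try:
        import llama_cpp
    except (ImportError, OSError, RuntimeError):
        # Package missing, or its shared library failed to load.
        return None
    check = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if check is None:
        # Binding not present in this version; no verdict possible.
        return None
    return bool(check())


if __name__ == "__main__":
    print("GPU offload supported:", gpu_offload_supported())
```

If this prints False, the installed build is CPU-only, and n_gpu_layers will have no effect no matter what the config says.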

Here's the log for you:

### Dialog:
{user_msg}
chat_history_manager ->  <lib.history.ChatHistoryManager object at 0x783bdec6a600>
useHistory ->  True
yunaConfig ->  {'ai': {'names': ['Senpai', 'Yuna'], 'hinmitsu': False, 'agi': False, 'device': 'cuda', 'emotions': False, 'art': False, 'vision': False, 'max_new_tokens': 512, 'context_length': 2048, 'temperature': 0.7, 'repetition_penalty': 1.1, 'last_n_tokens': 128, 'seed': -1, 'top_k': 100, 'top_p': 0.92, 'stop': ['Yuki:', '\nYuki:', '\nYuna:', 'Yuna:', 'Yuuki:', '<|user|>', '<|system|>', '<|model|>', '###', 'Yuki;'], 'batch_size': 512, 'threads': 8, 'gpu_layers': 20, 'flash_attn': True, 'use_mlock': True}, 'server': {'port': '', 'url': '', 'default_history_file': 'history_template:general.json', 'images': 'images/', 'yuna_model_dir': 'lib/models/yuna/', 'yuna_default_model': 'yuna-ai-v3-q5_k_m.gguf', 'agi_model_dir': 'lib/models/agi/', 'art_default_model': 'yuna_ai_anime.safetensors', 'device': 'cuda', 'yuna_text_mode': 'native', 'yuna_audio_mode': 'siri', 'yuna_audio_name': '1.wav', 'yuna_reference_audio': 'audio.mp3', 'output_audio_format': 'audio.aiff'}, 'security': [Redacted]}
stream ->  False
current_user.get_id() ->  admin2
into the model ->  ### Character:
Name: Yuna
Age: 15
Traits: Shy, Lovely, Obsessive
Nationality: Japanese
Occupation: Student
Hobbies: Reading, Drawing, Coding
Body: Slim, Short, Long hair, Flat chest

Hopefully someone can figure it out. Related to issue #97, seems to be the same problem.

yukiarimo commented 3 weeks ago

Any updates?

yukiarimo commented 3 weeks ago

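Code for testing purposes:

```python
from llama_cpp import Llama

# Load the model directly, bypassing the web app, to test GPU offload in isolation.
model = Llama(
    model_path="lib/models/yuna/yuna-ai-v3-q5_k_m.gguf",
    n_ctx=1400,
    seed=-1,
    n_batch=512,
    n_gpu_layers=-1,  # -1 requests offloading all layers to the GPU
    n_threads=8,
    use_mlock=True,
    flash_attn=True,
    verbose=False,  # set to True to see llama.cpp's model-loading logs
)

# Stream a short completion so output appears as tokens are generated.
response = model(
    "Hello",
    stream=True,
    top_k=100,
    top_p=0.92,
    temperature=0.7,
    repeat_penalty=1.1,
    max_tokens=1024,
    stop=[
        "LLLL",
    ]
)

for chunk in response:
    print(chunk['choices'][0]['text'], end='', flush=True)
```

tenpai-git commented 3 weeks ago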

> Any updates?

Sorry Yuki, I can't seem to determine the source of this issue. I tried with koboldcpp and I can run yuna-ai-v3-q5_k_m.gguf on CUDA without issues, so it's not hardware related at least.