Open tenpai-git opened 1 month ago
`python index.py --device cuda` doesn't exist
Hi Yuki, thanks for getting back to me. I tried a lot of things, but I wasn't able to make Yuna-AI use the GPU.
I saw you made an update to the dev branch, so I went ahead and tried that branch. I tried it on its own and I no longer get the earlier diffuser error. However, when I talk to Yuna over the in-software chat I can clearly see in `htop` that it uses my CPU, and `nvidia-smi` shows no VRAM in use, so it's still running on the CPU.
I tried some things. First, I changed `static/config.json`: I changed the name, set the "device" key under server to cuda, and tried adding a "device": "cuda" key-value pair to the AI section of config.json. I also tried copying `static/config.json` to `./config.json` before running `python index.py`. None of these worked.
{
  "ai": {
    "names": [
      "Senpai",
      "Yuna"
    ],
    "hinmitsu": false,
    "agi": false,
    "device": "cuda",
    "emotions": false,
    "art": false,
    "vision": false,
    "max_new_tokens": 512,
    "context_length": 2048,
    "temperature": 0.7,
    "repetition_penalty": 1.1,
    "last_n_tokens": 128,
    "seed": -1,
    "top_k": 100,
    "top_p": 0.92,
    "stop": [
      "Yuki:",
      "\nYuki:",
      "\nYuna:",
      "Yuna:",
      "Yuuki:",
      "<|user|>",
      "<|system|>",
      "<|model|>",
      "###",
      "Yuki;"
    ],
    "batch_size": 512,
    "threads": 8,
    "gpu_layers": -1,
    "flash_attn": true,
    "use_mlock": true
  },
  "server": {
    "port": "",
    "url": "",
    "default_history_file": "history_template:general.json",
    "images": "images/",
    "yuna_model_dir": "lib/models/yuna/",
    "yuna_default_model": "yuna-ai-v3-q5_k_m.gguf",
    "agi_model_dir": "lib/models/agi/",
    "art_default_model": "yuna_ai_anime.safetensors",
    "device": "mps",
    "yuna_text_mode": "native",
    "yuna_audio_mode": "siri",
    "yuna_audio_name": "1.wav",
    "yuna_reference_audio": "audio.mp3",
    "output_audio_format": "audio.aiff"
  },
  [Security Key Omitted]
}
I was wondering how that could happen, so I checked the configuration in the application. To my surprise, it hadn't loaded `config.json` and had all defaults. I'm not sure where it's loading the config from.
So I tried changing device to `cuda` in the in-app config, but that didn't work either. The in-software configuration doesn't seem to reflect config.json. Perhaps a cache is carrying over somewhere. I tried making a new user, but this was still the case.
Where does `index.py` load `config.json` (or its config in general) from? Maybe something isn't reading the config properly, and if it's set to cuda at startup it will work? Just a guess.
Hope we can figure this out.
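For anyone following along, one quick way to confirm what is actually on disk is to read the file directly and print the device-related keys. This is only a minimal sketch: the path `static/config.json` and the key names come from the config posted above, while the helper name `read_device_settings` is made up for illustration and is not part of Yuna. It says nothing about which file `index.py` really reads.

```python
import json
from pathlib import Path

# Path taken from the thread; adjust if your checkout differs.
CONFIG_PATH = Path("static/config.json")


def read_device_settings(path):
    """Return the device-related settings found in a Yuna-style config file.

    Hypothetical helper for debugging; not part of the Yuna codebase.
    """
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    ai = config.get("ai", {})
    server = config.get("server", {})
    return {
        "ai.device": ai.get("device"),
        "ai.gpu_layers": ai.get("gpu_layers"),
        "server.device": server.get("device"),
    }


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        for key, value in read_device_settings(CONFIG_PATH).items():
            print(f"{key} = {value!r}")
    else:
        print(f"{CONFIG_PATH} not found; run this from the repository root")
```

If the values printed here differ from what the in-app config page shows, the app is probably reading its settings from somewhere else.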
Hey, the configuration is all in `static/config.json`, which is where the program expects it. Also, there are two "device" keys (lol, just noticed), so set both to CUDA. You can also try reading the documentation for llama-cpp-python and checking out the generate.py file, because I see that it works on both Apple Silicon (the best) and Linux systems with CUDA and AMD.
Hi Yuki, sorry about the above log, that was from an earlier test. I had both set to `cuda` in a separate test and was just trying the key add. Just to show the dialog here, I tried other config settings, like setting the GPU layers to 20, among other things. Everything runs fine on koboldcpp, so I'm not sure why it's not selecting a CUDA backend for Yuna. I am running this on Arch.
I tried looking into `generator.py` and adding `n_gpu_layers = -1` to the instantiation of `llm = LlamaCpp`, but this didn't work either.
I also tried reinstalling `llama_cpp_python` with various flags as described here: https://github.com/abetlen/llama-cpp-python/issues/250 - but had no success getting it to select the GPU.
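Since koboldcpp runs the same model on CUDA, one thing worth ruling out is a CPU-only build of llama-cpp-python, in which case `n_gpu_layers` would have no effect whatever the config says. The following is a rough sketch, assuming the installed version exposes the low-level `llama_supports_gpu_offload` binding (older releases may not, hence the `getattr`). The function name `gpu_offload_supported` is made up for illustration.

```python
def gpu_offload_supported():
    """Report whether the installed llama-cpp-python build can offload to a GPU.

    Returns True or False when the library answers, or None when the package
    is missing, fails to load, or does not expose the check.
    """
    try:
        import llama_cpp
    except Exception:
        # Covers a missing package as well as a shared library that fails to load.
        return None
    check = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if check is None:
        return None
    return bool(check())


if __name__ == "__main__":
    result = gpu_offload_supported()
    if result is None:
        print("Could not query llama-cpp-python for GPU offload support")
    else:
        print(f"GPU offload supported by this build: {result}")
```

A result of `False` would point at the install (a wheel built without CUDA) rather than at Yuna's config handling.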
Here's the log for you:
### Dialog:
{user_msg}
chat_history_manager -> <lib.history.ChatHistoryManager object at 0x783bdec6a600>
useHistory -> True
yunaConfig -> {'ai': {'names': ['Senpai', 'Yuna'], 'hinmitsu': False, 'agi': False, 'device': 'cuda', 'emotions': False, 'art': False, 'vision': False, 'max_new_tokens': 512, 'context_length': 2048, 'temperature': 0.7, 'repetition_penalty': 1.1, 'last_n_tokens': 128, 'seed': -1, 'top_k': 100, 'top_p': 0.92, 'stop': ['Yuki:', '\nYuki:', '\nYuna:', 'Yuna:', 'Yuuki:', '<|user|>', '<|system|>', '<|model|>', '###', 'Yuki;'], 'batch_size': 512, 'threads': 8, 'gpu_layers': 20, 'flash_attn': True, 'use_mlock': True}, 'server': {'port': '', 'url': '', 'default_history_file': 'history_template:general.json', 'images': 'images/', 'yuna_model_dir': 'lib/models/yuna/', 'yuna_default_model': 'yuna-ai-v3-q5_k_m.gguf', 'agi_model_dir': 'lib/models/agi/', 'art_default_model': 'yuna_ai_anime.safetensors', 'device': 'cuda', 'yuna_text_mode': 'native', 'yuna_audio_mode': 'siri', 'yuna_audio_name': '1.wav', 'yuna_reference_audio': 'audio.mp3', 'output_audio_format': 'audio.aiff'}, 'security': [Redacted]}
stream -> False
current_user.get_id() -> admin2
into the model -> ### Character:
Name: Yuna
Age: 15
Traits: Shy, Lovely, Obsessive
Nationality: Japanese
Occupation: Student
Hobbies: Reading, Drawing, Coding
Body: Slim, Short, Long hair, Flat chest
Hopefully someone can figure it out. Related to issue #97, seems to be the same problem.
Any updates?
Code for testing purposes:
```python
from llama_cpp import Llama

# Load the model and request that all layers be offloaded to the GPU.
model = Llama(
    model_path="lib/models/yuna/yuna-ai-v3-q5_k_m.gguf",
    n_ctx=1400,
    seed=-1,
    n_batch=512,
    n_gpu_layers=-1,
    n_threads=8,
    use_mlock=True,
    flash_attn=True,
    verbose=False,
)

# Stream a short completion to check that generation works.
response = model(
    "Hello",
    stream=True,
    top_k=100,
    top_p=0.92,
    temperature=0.7,
    repeat_penalty=1.1,
    max_tokens=1024,
    stop=[
        "LLLL",
    ],
)

for chunk in response:
    print(chunk['choices'][0]['text'], end='', flush=True)
```
Any updates?
Sorry Yuki, I can't seem to determine the source of this issue. I tried with koboldcpp and I can run yuna-ai-v3-q5_k_m.gguf on CUDA without issues, so it's not hardware related at least.
When launching Yuna, I get the following stderr output: Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
This appears to be related to the `diffusers` Python library generating the error. Despite running `python index.py --device cuda` or setting `"device": "cuda"` in config.json under server, the error still appears. Does it need to be set somewhere else?
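The warning text refers to a `device` argument on a `Pipeline` object, so the setting probably has to reach whatever code constructs that pipeline, not only the config file. Where Yuna builds its pipeline is not shown in this thread, so the following is only a sketch of the general idea: pick a usable device string with PyTorch and pass it along explicitly. The helper name `pick_device` is hypothetical.

```python
def pick_device(preferred=None):
    """Choose a device string, falling back when the preferred one is unusable.

    Hypothetical helper; returns "cuda", "mps", or "cpu".
    """
    try:
        import torch
    except Exception:
        # Without a working PyTorch install, only the CPU can be assumed.
        return "cpu"

    cuda_ok = torch.cuda.is_available()
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend and mps_backend.is_available())
    usable = {"cuda": cuda_ok, "mps": mps_ok, "cpu": True}

    # Honor the requested device only if it is actually usable.
    if preferred is not None and usable.get(preferred, False):
        return preferred
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device('cuda')}")
```

With Hugging Face `transformers`, the resulting string could then be passed as `device=pick_device("cuda")` when calling `pipeline(...)`. Whether that is the call behind this particular warning in Yuna is an assumption.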