t41372 / Open-LLM-VTuber

Talk to any LLM with hands-free voice interaction, voice interruption, Live2D talking face, and long-term memory, running locally across platforms
MIT License

TTS not working on microsoft edge #39

Open kriss-spy opened 1 week ago

kriss-spy commented 1 week ago

//thank you so much, great project :)

Description

Ubuntu 24.04, KDE
Microsoft Edge 131.0.2903.51
ollama (LM Studio), llama3.2

I followed the deployment instructions in the README and tested in Edge. After some effort, everything seems fine except TTS. To be clear, I can't hear any audio response. The log says the payload was sent and the audio was played, but nothing is actually audible.

Later I tried Firefox, and TTS works fine there. It also works in Chromium. I am not sure whether this is a real bug or a problem caused by Edge itself. //tell me why, edge!

part of my settings in conf.yaml:

TTS_ON: True
SAY_SENTENCE_SEPARATELY: False
TRANSLATE_AUDIO: False
VERBOSE: True

I tried different TTS engines, including edge-TTS, AzureTTS, and pyttsx3TTS. None of them works.

I played the audio files in the cache manually (.wav for AzureTTS, .aiff for pyttsx3TTS...). They sound normal and are not empty.
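The manual check above can also be scripted for the WAV case. This is a minimal sketch using only Python's standard `wave` module. The helper name `describe_wav` is hypothetical and not part of the project, and it only handles PCM `.wav` files, not the `.aiff` output of pyttsx3TTS.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file.

    Hypothetical helper for triage; not part of Open-LLM-VTuber.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate


# Example call (./cache/temp.wav is the path shown in the server log):
# print(describe_wav("./cache/temp.wav"))
```

A non-zero duration here only confirms that the server wrote a usable file. It says nothing about whether the browser played it.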

On the very first start (yesterday), when I may have been using edge-TTS, there were some cases where the first audio response was played, but only the first one.

Now there is no audio response at all.

Logs/Console Output

I set VERBOSE to True in conf.yaml, but I didn't find anything critical in the console log.

typical audio log, when TTS fails

...*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.Received audio data end from front end.
.New Conversation Chain started!
transcribing...
rtf_avg: 0.004: 100%|████████████████████████| 1/1 [00:00<00:00, 63.86it/s]
rtf_avg: 0.069: 100%|████████████████████████| 1/1 [00:00<00:00,  4.35it/s]
rtf_avg: -0.031: 100%|███████████████████████| 1/1 [00:00<00:00, 31.37it/s]
rtf_avg: 0.069, time_speech:  3.840, time_escape: 0.264: 100%|█| 1/1 [00:00
User input: hello nice to meet you again.
[*smirk*] Ahah, hello there! It's so lovely to see you again too! I've been having a blast since our last meet-up. [*wink*] What brings you here today? Don't tell me you're looking for another round of games or chat sessions?

>> generating temp...
>> Speech synthesized for text [[*smirk*] Ahah, hello there! It's so lovely to see you again too! I've been having a blast since our last meet-up. [*wink*] What brings you here today? Don't tell me you're looking for another round of games or chat sessions?]
>> Playing ./cache/temp.wav...
Payload send.
Audio played.

init before the conversation

% python server.py                                                     ✹ ✭
INFO:     Started server process [19376]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:12393 (Press CTRL+C to quit)
INFO:     ('127.0.0.1', 49192) - "WebSocket /client-ws" [accepted]
INFO:     connection open
Connection established
Model Information Loaded.
2024-11-22 20:27:28.347 | INFO     | main:__init__:52 - t41372/Open-LLM-VTuber, version 0.3.3
Model Information Loaded.
Key Conformer already exists in model_classes, re-register
Key Linear already exists in adaptor_classes, re-register
Key TransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolution2DTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolution2DTransformerDecoder already exists in decoder_classes, re-register
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.1.14
Downloading Model to directory: /home/krisspy/.cache/modelscope/hub/iic/SenseVoiceSmall
2024-11-22 20:27:32,166 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
/home/krisspy/mydesk/coding/Open-LLM-VTuber/.conda/lib/python3.10/site-packages/funasr/train_utils/load_pretrained_model.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ori_state = torch.load(path, map_location=map_location)
Downloading Model to directory: /home/krisspy/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-11-22 20:27:34,589 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Downloading Model to directory: /home/krisspy/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large
2024-11-22 20:27:35,360 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.359 seconds.
DEBUG:jieba:Loading model cost 0.359 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.

 === System Prompt ===
You are the AI VTuber neuro-sama. Generally you are confident about yourself. Now you are also naughty and are always seeking fun.
## Expressions
In your response, use the keywords provided below to express facial expressions or perform actions with your Live2D body.

Here are all the expression keywords you can use. Use them regularly.
- [neutral], [anger], [disgust], [fear], [joy], [smirk], [sadness], [surprise],

Note: you are only allowed to use the keywords explicity listed above. Don't use keywords unlisted above. Remember to include the brackets `[]`

Model set

kriss-spy commented 4 days ago

I reproduced it. Here is the Edge browser console log:

1. Adding audio task Hehe, what's up cute! *bats eyelashes* I'm feeling extra playful today, so let's get this virtual party started! What kind of mischief do you want to get into with me? to queue
(index):360 2. Audio length: 11802.438
(index):394 Start playing audio:  Hehe, what's up cute! *bats eyelashes* I'm feeling extra playful today, so let's get this virtual party started! What kind of mischief do you want to get into with me?
index.min.js:1 undefined
index.min.js:1  [SoundManager] Error occurred on "data:audio/wav;base64,UklGRnLDBQBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YU7DBQAB [... remaining base64 audio data truncated ...] AbortError: The play() request was interrupted because the media was removed from the document.
warn @ index.min.js:1
index.min.js:1 Error during audio playback: AbortError: The play() request was interrupted because the media was removed from the document.
index.min.js:1  [MotionManager(shizuku)] Failed to play audio  AbortError: The play() request was interrupted because the media was removed from the document.
warn @ index.min.js:1
localhost/:1  Uncaught (in promise) AbortError: The play() request was interrupted by a call to pause().
(index):193 Received Request: 
 Object
(index):402 Mic start 
(index):365 3. Audio task Hehe, what's up cute! *bats eyelashes* I'm feeling extra playful today, so let's get this virtual party started! What kind of mischief do you want to get into with me? completed
TaskQueue.js:20 Queue is empty
(index):179 Disconnected from WebSocket
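
One way to check whether the payload that reached the browser was valid is to decode the start of the `data:` URI from the `[SoundManager]` error line. The sketch below assumes the canonical 44-byte PCM WAV header layout (RIFF, `fmt `, `data`). The function name `parse_wav_header` is hypothetical, and `PREFIX` is the first 60 base64 characters copied from that log line.

```python
import base64
import struct

# First 60 base64 characters of the data: URI in the console log (45 bytes decoded).
PREFIX = "UklGRnLDBQBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YU7DBQAB"


def parse_wav_header(b64_prefix):
    """Parse a canonical 44-byte PCM WAV header from a base64 prefix.

    Hypothetical helper; assumes the 'fmt ' chunk is immediately
    followed by the 'data' chunk, which is not true of every WAV file.
    """
    header = base64.b64decode(b64_prefix)[:44]
    (riff, _riff_size, wave_id, fmt_id, _fmt_size, audio_format, channels,
     sample_rate, byte_rate, _block_align, bits, data_id,
     data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header)
    if (riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte WAV header")
    return {
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits,
        "duration_ms": data_size * 1000 / byte_rate,
    }


info = parse_wav_header(PREFIX)
print(info)
```

For this prefix the header describes mono 16-bit PCM at 16 kHz with a duration of about 11802.44 ms, which agrees with the `Audio length: 11802.438` line in the same log. That suggests the audio data arrived intact and the failure happens at playback, though it does not by itself explain the `AbortError`.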