uezo / aiavatarkit

🥰 Building AI-based conversational avatars lightning fast ⚡️💬
Apache License 2.0

Improve speech recognition accuracy #44

Closed uezo closed 1 month ago

uezo commented 1 month ago

Overview

Improved speech recognition accuracy by automatically setting the volume threshold based on the measured ambient noise level.

[INFO] 2024-05-12 11:15:54,704 : Input device: [1] MacBook Air Microphone
[INFO] 2024-05-12 11:15:54,704 : Output device: [2] MacBook Air Speakers
[INFO] 2024-05-12 11:15:54,816 : Measuring noise levels...
Noise level: -61.68dB
[INFO] 2024-05-12 11:15:57,964 : Set volume threshold: -41.0dB

Additionally, volume is now measured at a fixed interval of 0.05 seconds, so that all audio data is analyzed consistently.
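
As an illustration of fixed-interval volume measurement, here is a minimal sketch that splits audio into 0.05-second chunks and computes each chunk's level in dB. It is not the aiavatarkit implementation: the function names rms_db and chunk_levels, the 16 kHz sample rate, and the use of float samples in the range -1.0 to 1.0 are all assumptions made for this example.

```python
import math


def rms_db(samples):
    """Return the RMS level of float samples (-1.0..1.0) in dB relative to full scale."""
    if not samples:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return -math.inf  # digital silence has no finite dB level
    return 10 * math.log10(mean_square)


def chunk_levels(samples, sample_rate=16000, interval_sec=0.05):
    """Split samples into fixed-length chunks and return the dB level of each chunk.

    sample_rate is an assumed value; the last chunk may be shorter than the others.
    """
    chunk_size = round(sample_rate * interval_sec)
    return [
        rms_db(samples[i:i + chunk_size])
        for i in range(0, len(samples), chunk_size)
    ]
```

Because every sample falls into exactly one chunk, no audio is skipped between measurements, which is the property the fixed interval is meant to provide.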

Threshold adjustment

Introduced a new parameter, noise_margin, that sets how far above the measured noise level the voice detection threshold is placed. Use it to tune the sensitivity of voice detection to the ambient noise conditions.

from aiavatar import AIAvatar

app = AIAvatar(
    openai_api_key=OPENAI_API_KEY,
    google_api_key=GOOGLE_API_KEY,
    noise_margin=10.0
)
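
To make the effect of noise_margin concrete, here is a minimal sketch that assumes the threshold is simply the measured noise level plus the margin, both in dB. The function name compute_volume_threshold is hypothetical and not part of aiavatarkit, and the library may round the result differently (the sample log shows a threshold about 20.7 dB above the measured noise level).

```python
def compute_volume_threshold(noise_level_db, noise_margin):
    """Return a voice detection threshold placed noise_margin dB above the noise level.

    Assumption: plain addition, with no rounding or clamping.
    """
    return noise_level_db + noise_margin


# A smaller margin puts the threshold closer to the noise floor (more sensitive);
# a larger margin requires louder speech before it counts as voice.
sensitive = compute_volume_threshold(-61.68, noise_margin=10.0)
strict = compute_volume_threshold(-61.68, noise_margin=20.0)
```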

Set noise filter level manually

To set the noise filter level for voice detection manually, set auto_noise_filter_threshold to False and specify volume_threshold_db in decibels (dB).

from aiavatar import AIAvatar

app = AIAvatar(
    openai_api_key=OPENAI_API_KEY,
    google_api_key=GOOGLE_API_KEY,
    auto_noise_filter_threshold=False,
    volume_threshold_db=-40   # Set the voice detection threshold to -40 dB
)
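
For intuition about what a fixed threshold does, the following sketch marks which consecutive volume measurements would count as voice when compared against a manually chosen threshold. It is an illustration under assumptions, not the library's detection logic: the function name voiced_runs and the list-of-dB-levels input are hypothetical.

```python
def voiced_runs(levels_db, threshold_db=-40.0):
    """Return (start, end) index pairs of consecutive measurements above the threshold.

    end is exclusive. levels_db is a list of per-interval volume levels in dB.
    """
    runs = []
    start = None
    for i, level in enumerate(levels_db):
        if level > threshold_db:
            if start is None:
                start = i  # a new run of voice begins here
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous measurement
            start = None
    if start is not None:
        runs.append((start, len(levels_db)))  # the run continues to the end of input
    return runs
```

With a threshold of -40 dB, measurements at -60 dB or -55 dB are treated as background noise, while anything louder than -40 dB is treated as voice.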