Closed: liangzhenbin1992 closed this issue 2 months ago
Based on your logs, it seems the analysis was run only twice for your speech segments in total. Two things caught my attention in your logs:

1. Where [BLANK_AUDIO] is recognized in the text segment, the processed audio data size is quite small. In the first case it's 79360 samples, which is less than a second at a sample rate of 44100 Hz with 2 channels (79360 / (44100 * 2) ≈ 0.9 s). In the second case it's 17600 samples, which is even smaller: only about 0.2 seconds under the same sample rate and channel count. These durations are too brief to assume there's any speech present, particularly in the second case. I suggest confirming that your voice is actually being captured in these recognitions by playing back the capturable sound wave (you can use the PlaySound2D function, for example).

2. "Can't capture sound wave" appears in the log, likely because of attempts to initiate capture for multiple capturable sound waves simultaneously, which isn't supported by the hardware or by UE's per-platform microphone handling implementation. Please ensure that you don't capture audio data simultaneously, and start a new capture only after the previous one has been stopped (i.e. after StopCapture is called).

Also, the Blueprint implementation you created appears to trigger recognition on clicks, and you derived these nodes from the basic streaming example. That example isn't intended for a continuous stop/start workflow; it's purely for demonstration purposes. I suggest exploring the demo project, which handles microphone recognition with continuous recognition and interruption workflows in mind, as it generally offers a better design for more complex solutions: Demo Project.
And by the way, if you prefer to suppress [BLANK_AUDIO] messages, you can do so by setting bSuppressBlank to true (via the SetSuppressBlank function).
@liangzhenbin1992 I had the same problem just today (which is how I found this issue). For me, the problem was solved after updating both the RuntimeSpeechRecognizer and the RuntimeAudioImporter plugins.
Would be interesting to know what had caused it, though.
It could be due to issues in earlier versions of the underlying whisper.cpp library, which have since been fixed :)
Please reopen if it's still relevant.
When I set up the BP like this and compile it, everything seems fine, but when I start it, it shows [BLANK_AUDIO], and there's no issue with the microphone. So I'm confused and don't know what's happening. Please help me, thanks a lot. Here are the BP and the output log: output_LOG.txt
[part of output log]:
……
LogRuntimeSpeechRecognizer: Pending audio data instead of enqueuing it since it is not enough to fill the step size (pending: 14879, num of samples per step: 80000)
LogRuntimeAudioImporter: No need to resample or mix audio data
LogRuntimeAudioImporter: Reallocating buffer to append data (new capacity: 566400)
LogRuntimeSpeechRecognizer: Pending audio data instead of enqueuing it since it is not enough to fill the step size (pending: 15039, num of samples per step: 80000)
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] 完成声音捕获 (= "sound capture finished")
LogRuntimeSpeechRecognizer: Enqueued audio data from the pending audio to the queue of the speech recognizer as the last data (num of samples: 15039)
LogRuntimeSpeechRecognizer: Processed audio data with the size of 79360 samples to the whisper recognizer
LogRuntimeSpeechRecognizer: Recognized text segment: " [BLANK_AUDIO]"
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] [BLANK_AUDIO]
LogRuntimeSpeechRecognizer: Speech recognition progress: 100
LogRuntimeSpeechRecognizer: Speech recognition progress: 0
LogRuntimeSpeechRecognizer: Processed audio data with the size of 17600 samples to the whisper recognizer
LogRuntimeSpeechRecognizer: Recognized text segment: " [BLANK_AUDIO]"
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] [BLANK_AUDIO]
LogRuntimeSpeechRecognizer: Speech recognition progress: 100
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] Can't capture sound wave
LogRuntimeSpeechRecognizer: Speech recognition finished
LogRuntimeSpeechRecognizer: Stopping the speech recognizer thread
LogCore: Display: Tracing Screenshot "ScreenShot00007" taken with size: 2578 x 1408
LogCore: Display: Tracing Screenshot "ScreenShot00008" taken with size: 2578 x 1408
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogWorld: BeginTearingDown for /Game/ThirdPerson/Maps/UEDPIE_0_ThirdPersonMap
LogWorld: UWorld::CleanupWorld for ThirdPersonMap, bSessionEnded=true, bCleanupResources=true
LogSlate: InvalidateAllWidgets triggered. All widgets were invalidated
LogWorldPartition: UWorldPartition::Uninitialize : World = /Game/ThirdPerson/Maps/UEDPIE_0_ThirdPersonMap.ThirdPersonMap
LogContentBundle: [ThirdPersonMap(Standalone)] Deleting container.
LogWorldMetrics: [UWorldMetricsSubsystem::Deinitialize]
LogWorldMetrics: [UWorldMetricsSubsystem::Clear]
LogPlayLevel: Display: Shutting down PIE online subsystems
LogSlate: InvalidateAllWidgets triggered. All widgets were invalidated
LogRuntimeAudioImporter: Warning: Imported sound wave ('CapturableSoundWave_1') data will be cleared because it is being unloaded
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogAudioMixer: Deinitializing Audio Bus Subsystem for audio device with ID 3
LogAudioMixer: FMixerPlatformXAudio2::StopAudioStream() called. InstanceID=3
LogAudioMixer: FMixerPlatformXAudio2::StopAudioStream() called. InstanceID=3
LogUObjectHash: Compacting FUObjectHashTables data took 0.79ms
LogPlayLevel: Display: Destroying online subsystem :Context_8