microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

Missing chunkSize in speech audio file WAV format #513

Closed mudssrali closed 2 years ago

mudssrali commented 2 years ago

We're using Microsoft Cognitive Services text-to-speech to generate speech audio. However, we need it in WAV format with the following output format specifications:

Codec: PCMS16LE (araw)
Channel: Mono
Sample Rate: 8000 Hz
Bits per Sample: 16
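For reference, this specification corresponds to the canonical 44-byte PCM WAV header. The following is a minimal TypeScript sketch that builds such a header. It assumes exactly one `fmt ` chunk followed by one `data` chunk, and `buildPcmWavHeader` is a hypothetical helper, not part of the Speech SDK:

```typescript
// Hypothetical helper (not part of the Speech SDK): builds the canonical
// 44-byte RIFF/WAVE header for linear PCM, assuming exactly one "fmt " chunk
// followed by one "data" chunk. Defaults match the spec above.
function buildPcmWavHeader(
  dataLength: number, // size of the PCM payload in bytes
  sampleRate = 8000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < 4; i++) bytes[offset + i] = tag.charCodeAt(i);
  };
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // ChunkSize: bytes after this field
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // Subchunk1Size: 16 for PCM
  view.setUint16(20, 1, true); // AudioFormat: 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, dataLength, true); // Subchunk2Size: PCM payload length
  return bytes;
}
```

With these defaults, ByteRate = 8000 × 1 × 16 / 8 = 16,000 and BlockAlign = 2. All multi-byte fields are little-endian, which is why the third argument to each `DataView` setter is `true`.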

Once we generate the speech audio, we pass it to an IVR (Interactive Voice Response) service. When we upload the audio file generated by the Azure service, we get an error because of the audio file format. We dug further but had no clue until we inspected the speech audio file metadata, especially the raw header. More information can be found here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1450

After a detailed analysis of the raw headers of different files, we found that the speech audio files Azure generates in WAV format (Riff8Khz16BitMonoPcm) don't include the chunkSize. We tested this by converting the speech audio with an online tool provided by 3CX and then uploading it to the IVR, and it worked fine.
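A file can be checked for this problem locally before uploading it to the IVR. Below is a minimal TypeScript sketch. It assumes the whole file is loaded into a `Uint8Array` and that nothing follows the RIFF chunk, so the expected chunkSize is the file length minus 8. `checkRiffChunkSize` is a hypothetical name, not an SDK or IVR API:

```typescript
// Hypothetical pre-upload check (not part of the Speech SDK): compares the
// top-level RIFF ChunkSize (bytes 4..7, little-endian) with the value implied
// by the buffer length, assuming the whole file is in memory and nothing
// follows the RIFF chunk.
interface RiffSizeReport {
  declaredChunkSize: number; // value stored in the header
  expectedChunkSize: number; // buffer length minus the 8-byte "RIFF" + size prefix
  ok: boolean;
}

function checkRiffChunkSize(file: Uint8Array): RiffSizeReport {
  const tag = (o: number): string =>
    String.fromCharCode(file[o], file[o + 1], file[o + 2], file[o + 3]);
  if (file.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  // Read the 32-bit little-endian value; >>> 0 keeps it unsigned.
  const declaredChunkSize =
    (file[4] | (file[5] << 8) | (file[6] << 16) | (file[7] << 24)) >>> 0;
  const expectedChunkSize = file.length - 8;
  return { declaredChunkSize, expectedChunkSize, ok: declaredChunkSize === expectedChunkSize };
}
```

For a file like success.wav, `declaredChunkSize` comes back as 0 and `ok` is false. Reading the bytes by hand keeps the sketch free of Node-specific APIs such as `Buffer`.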

Raw Header - Azure Speech API - success.wav

0000  52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20
0010  10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00
0020  02 00 10 00 64 61 74 61 BE 1B 03 00 00 00 FF FF
0030  00 00 01 00 FF FF 00 00 00 00 00 00 00 00 FF FF
0040  00 00 FE FF FF FF 00 00 00 00 00 00 00 00 00 00
0050  00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
0060  01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
0070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
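Decoding the first 44 bytes of this dump field by field shows what is and isn't missing. The decode below assumes the canonical layout, with the `fmt ` chunk at offset 12 and `data` at offset 36. Every format field matches the requested specification, and the `data` chunk size (BE 1B 03 00 = 203,710 bytes) is present. Only the top-level RIFF ChunkSize is zero. The helpers `parseHexDump` and `decodePcmWavHeader` are hypothetical and written for illustration only:

```typescript
// Hypothetical decoder for the canonical 44-byte PCM WAV header layout
// (fmt chunk at offset 12, data chunk at offset 36). Files with extra
// chunks (e.g. LIST) would need a real chunk walker instead.
interface PcmWavHeader {
  riffChunkSize: number;
  audioFormat: number;
  channels: number;
  sampleRate: number;
  byteRate: number;
  blockAlign: number;
  bitsPerSample: number;
  dataSize: number;
}

// Turns a whitespace-separated hex dump into bytes.
function parseHexDump(dump: string): Uint8Array {
  return Uint8Array.from(dump.trim().split(/\s+/).map((h) => parseInt(h, 16)));
}

function decodePcmWavHeader(file: Uint8Array): PcmWavHeader {
  if (file.length < 44) throw new Error("need at least 44 bytes");
  const tag = (o: number): string =>
    String.fromCharCode(file[o], file[o + 1], file[o + 2], file[o + 3]);
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt " || tag(36) !== "data") {
    throw new Error("not a canonical 44-byte PCM WAV header");
  }
  // Little-endian readers; >>> 0 keeps 32-bit values unsigned.
  const u16 = (o: number): number => file[o] | (file[o + 1] << 8);
  const u32 = (o: number): number =>
    (file[o] | (file[o + 1] << 8) | (file[o + 2] << 16) | (file[o + 3] << 24)) >>> 0;
  return {
    riffChunkSize: u32(4),
    audioFormat: u16(20),
    channels: u16(22),
    sampleRate: u32(24),
    byteRate: u32(28),
    blockAlign: u16(32),
    bitsPerSample: u16(34),
    dataSize: u32(40),
  };
}

// First 44 bytes of success.wav, copied from the dump above.
const azureHeader = parseHexDump(`
52 49 46 46 00 00 00 00 57 41 56 45 66 6D 74 20
10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00
02 00 10 00 64 61 74 61 BE 1B 03 00`);

// Logs riffChunkSize 0, audioFormat 1, channels 1, sampleRate 8000,
// byteRate 16000, blockAlign 2, bitsPerSample 16, dataSize 203710.
console.log(decodePcmWavHeader(azureHeader));
```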

Raw Header - Converted - converted_success.wav

0000  52 49 46 46 E2 1B 03 00 57 41 56 45 66 6D 74 20
0010  10 00 00 00 01 00 01 00 40 1F 00 00 80 3E 00 00
0020  02 00 10 00 64 61 74 61 BE 1B 03 00 00 00 FF FF
0030  00 00 01 00 FF FF 00 00 00 00 00 00 00 00 FF FF
0040  00 00 FE FF FF FF 00 00 00 00 00 00 00 00 00 00
0050  00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
0060  01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
0070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

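The tool we used for the conversion changes only the chunkSize bytes (offset 4) from 00 00 00 00 to E2 1B 03 00. Read as little-endian, that is 0x00031BE2 = 203,746, which equals 36 plus the data chunk size (BE 1B 03 00 = 203,710). This matches the value the WAVE PCM format expects.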
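Since the repair amounts to rewriting four bytes, it can also be done locally instead of through an online converter. Below is a minimal TypeScript sketch. It assumes the whole file is in memory and that the RIFF chunk extends to the end of the file, so ChunkSize = file length − 8. For this file, that is 44 + 203,710 − 8 = 203,746. `fixRiffChunkSize` is a hypothetical name, and the usage below substitutes a zero-filled stand-in for the real audio payload:

```typescript
// Hypothetical repair helper: returns a copy of a RIFF/WAVE file whose
// top-level ChunkSize (bytes 4..7, little-endian) is set to fileLength - 8.
// Assumes the entire file is in memory and nothing follows the RIFF chunk.
function fixRiffChunkSize(file: Uint8Array): Uint8Array {
  const tag = (o: number): string =>
    String.fromCharCode(file[o], file[o + 1], file[o + 2], file[o + 3]);
  if (file.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  const fixed = file.slice(); // leave the caller's buffer untouched
  const chunkSize = fixed.length - 8; // everything after "RIFF" + the size field
  fixed[4] = chunkSize & 0xff;
  fixed[5] = (chunkSize >>> 8) & 0xff;
  fixed[6] = (chunkSize >>> 16) & 0xff;
  fixed[7] = (chunkSize >>> 24) & 0xff;
  return fixed;
}

// Usage sketch: a stand-in for success.wav with the same 44-byte header and
// a zero-filled 203,710-byte data chunk (203,754 bytes in total).
const sample = new Uint8Array(44 + 203710);
sample.set([
  0x52, 0x49, 0x46, 0x46, 0x00, 0x00, 0x00, 0x00, 0x57, 0x41, 0x56, 0x45,
  0x66, 0x6d, 0x74, 0x20, 0x10, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
  0x40, 0x1f, 0x00, 0x00, 0x80, 0x3e, 0x00, 0x00, 0x02, 0x00, 0x10, 0x00,
  0x64, 0x61, 0x74, 0x61, 0xbe, 0x1b, 0x03, 0x00,
]);
const repaired = fixRiffChunkSize(sample);
const sizeHex = Array.from(repaired.subarray(4, 8), (b) =>
  (b < 16 ? "0" : "") + b.toString(16).toUpperCase(),
).join(" ");
console.log(sizeHex); // prints "E2 1B 03 00"
```

Returning a copy rather than patching in place makes it easy to compare the original and repaired bytes, as in the two dumps above.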

More information about the WAVE PCM soundfile format:

[Image: WAVE PCM soundfile format]

mudssrali commented 2 years ago

@glharper can you please suggest a release date for this fix? Thanks

glharper commented 2 years ago

@mudssrali mid-April, 2-3 weeks.

dargilco commented 2 years ago

Speech SDK 1.21 was released. See the release notes: https://docs.microsoft.com/azure/cognitive-services/speech-service/releasenotes?tabs=speech-sdk#speech-sdk-1210-april-2022-release

mudssrali commented 2 years ago

Thanks @glharper for fixing this issue and releasing the fix. It will save real money for a non-profit organization, since we have been paying a third-party service to convert WAV into WAV just to fix the headers.

Thank you @dargilco for the release notes link. I was about to ask about the release.