wechaty / getting-started

A Starter Project Template for Wechaty that works out-of-the-box
https://gitpod.io/#https://github.com/wechaty/wechaty-getting-started
Apache License 2.0

speech-to-text-bot demo does not work #228

Status: Open. DreamerLark opened this issue 2 years ago.

DreamerLark commented 2 years ago
Environment:
wechaty: "~1.7.22"
puppet: wechaty-puppet-service (puppet_wxwork)
node: v16.11.1

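Code:

  import Ffmpeg from 'fluent-ffmpeg'

  // mp3Stream: readable stream holding the voice message audio
  // wavStream: writable stream that receives the converted WAV
  Ffmpeg(mp3Stream)
    .fromFormat('mp3')
    .toFormat('wav')
    .pipe(wavStream as any)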

Log:

15:16:27 ERR Config ###########################
15:16:27 ERR Config Wechaty uncaughtException: Error: ffmpeg exited with code 1: Error opening filters!
    at ChildProcess.<anonymous> (/home/lantu/wechaty-getting-started/node_modules/fluent-ffmpeg/lib/processor.js:182:22)
    at ChildProcess.emit (node:events:390:28)
    at ChildProcess.emit (node:domain:475:12)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12) uncaughtException
15:16:27 ERR Config ###########################
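The log shows the ffmpeg failure arriving as an uncaughtException. That is what Node does when an EventEmitter emits 'error' and nobody is listening, so it likely means no 'error' listener was attached to the fluent-ffmpeg command. A minimal sketch, assuming the command object emits 'end' and 'error' events (fluent-ffmpeg documents both); waitForFfmpeg and EmitterLike are hypothetical names. The conversion still fails while the input is not really MP3, but the bot gets a rejected promise it can catch instead of a process-level crash.

```typescript
// Minimal structural type so this sketch does not depend on fluent-ffmpeg's
// typings; a fluent-ffmpeg command (an EventEmitter) satisfies it.
interface EmitterLike {
  once(event: string, listener: (...args: any[]) => void): unknown
}

// Hypothetical helper: resolves on 'end', rejects on 'error'.
// Having an 'error' listener registered is what stops Node from turning
// the failure into an uncaughtException.
function waitForFfmpeg(command: EmitterLike): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    command.once('end', () => resolve())
    command.once('error', (err: unknown) => reject(err))
  })
}

// Usage sketch (assumes `import Ffmpeg from 'fluent-ffmpeg'`):
//   const command = Ffmpeg(mp3Stream).fromFormat('mp3').toFormat('wav')
//   const done = waitForFfmpeg(command)  // attach listeners before piping
//   command.pipe(wavStream as any)       // pipe() returns the output stream, not the command
//   await done                           // rejects instead of crashing the bot
```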
DreamerLark commented 2 years ago

  $ ffmpeg -version
  ffmpeg version 2.8.15 Copyright (c) 2000-2018 the FFmpeg developers

huan commented 2 years ago

I think the FileBox returned from an Audio-type message in WXWork is not in MP3 format.

It should be a .silk file (please confirm, and correct me if I'm wrong); what you need is https://www.npmjs.com/package/wx-voice to handle it.
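To confirm the format before converting, one option is to check the file's leading "magic" bytes. A minimal sketch, assuming the common signatures noted in the comments (SILK v3 files begin with "#!SILK_V3", and the Tencent variant used for WeChat voice is commonly reported to prepend a 0x02 byte); sniffAudio and startsWithAscii are hypothetical helper names. In the bot, the bytes could come from await fileBox.toBuffer(), since a Node Buffer is a Uint8Array.

```typescript
// Container types this sketch can recognize.
type AudioKind = 'silk' | 'amr' | 'wav' | 'mp3' | 'unknown'

// True if bytes[offset..] starts with the given ASCII text.
function startsWithAscii(bytes: Uint8Array, text: string, offset = 0): boolean {
  if (bytes.length < offset + text.length) return false
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false
  }
  return true
}

// Guesses the audio container from its leading "magic" bytes.
function sniffAudio(bytes: Uint8Array): AudioKind {
  // SILK v3; the Tencent variant (WeChat/QQ voice) is commonly reported to add a leading 0x02 byte.
  if (startsWithAscii(bytes, '#!SILK_V3')) return 'silk'
  if (bytes[0] === 0x02 && startsWithAscii(bytes, '#!SILK_V3', 1)) return 'silk'
  if (startsWithAscii(bytes, '#!AMR\n')) return 'amr'
  // RIFF container with form type WAVE at offset 8.
  if (startsWithAscii(bytes, 'RIFF') && startsWithAscii(bytes, 'WAVE', 8)) return 'wav'
  // An ID3v2 tag usually precedes MP3 audio.
  if (startsWithAscii(bytes, 'ID3')) return 'mp3'
  // MPEG audio frame sync (11 set bits). Only a hint: ADTS AAC also matches.
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) return 'mp3'
  return 'unknown'
}
```

If this reports silk, ffmpeg generally cannot read the file directly, so a SILK decoder such as wx-voice has to run before the WAV conversion.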

DreamerLark commented 2 years ago

Yes, it is SILK. But...

  $ curl -i -X POST -H "Content-Type: audio/wav;rate=8000" "http://vop.baidu.com/server_api?dev_pid=1537&cuid=wechaty&token=xxxxxxxxwerwrerewrqerwrqrerwr-25013827" --data-binary "@./output.wav"
  HTTP/1.1 100 Continue

  HTTP/1.1 200 OK
  Connection: keep-alive
  Content-Type: application/json
  Date: Wed, 17 Nov 2021 11:48:20 GMT
  P3p: CP=" OTI DSP COR IVA OUR IND COM "
  Server: nginx/1.8.0
  Set-Cookie: BAIDUID=CD92829E00E91D3BE6C2F9214B244A93:FG=1; expires=Thu, 17-Nov-22 11:48:20 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
  Tracecode: 29006618250356096266111719
  Content-Length: 164

  {"corpus_no":"7031504421501887334","err_msg":"success.","err_no":0,"result":["日本生日好吧嗯嗯嗯嗯嗯嗯嗯嗯嗯嗯嗯。"],"sn":"48633452341637149700"}

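The recognized text above means roughly "Japan birthday, okay, mm mm mm...", but the actual speech was: 牛栏山十瓶,二锅头十瓶,可口可乐一箱 ("ten bottles of Niulanshan, ten bottles of Erguotou, one case of Coca-Cola").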
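The rate declared in the Content-Type header is supposed to describe the WAV data, so one first check is whether output.wav really is 8000 Hz. A minimal sketch, assuming Baidu's short-speech API expects 16-bit mono PCM at 8000 or 16000 Hz (please verify against the current Baidu documentation); readWavFormat and checkForBaidu are hypothetical helper names. It walks the RIFF chunks to find the fmt chunk instead of assuming a fixed 44-byte header.

```typescript
// Fields read from a WAV file's "fmt " chunk.
interface WavFormat {
  audioFormat: number   // 1 = integer PCM
  channels: number
  sampleRate: number    // Hz; should equal the rate= declared to the ASR service
  bitsPerSample: number
}

// Reads the format fields by walking RIFF chunks (all integers little-endian).
function readWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const tag = (off: number): string =>
    String.fromCharCode(bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3])
  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file')
  }
  let off = 12 // first chunk after "RIFF" <size> "WAVE"
  while (off + 8 <= bytes.length) {
    const id = tag(off)
    const size = view.getUint32(off + 4, true)
    if (id === 'fmt ') {
      if (size < 16 || off + 8 + 16 > bytes.length) throw new Error('truncated fmt chunk')
      return {
        audioFormat: view.getUint16(off + 8, true),
        channels: view.getUint16(off + 10, true),
        sampleRate: view.getUint32(off + 12, true),
        bitsPerSample: view.getUint16(off + 22, true),
      }
    }
    off += 8 + size + (size % 2) // chunk bodies are padded to an even length
  }
  throw new Error('no fmt chunk found')
}

// Assumed requirements (verify with Baidu's docs): 16-bit mono PCM at 8000 or 16000 Hz.
function checkForBaidu(fmt: WavFormat): string[] {
  const problems: string[] = []
  if (fmt.audioFormat !== 1) problems.push(`expected PCM (format 1), got ${fmt.audioFormat}`)
  if (fmt.channels !== 1) problems.push(`expected mono, got ${fmt.channels} channels`)
  if (fmt.bitsPerSample !== 16) problems.push(`expected 16-bit samples, got ${fmt.bitsPerSample}`)
  if (fmt.sampleRate !== 8000 && fmt.sampleRate !== 16000) {
    problems.push(`expected 8000 or 16000 Hz, got ${fmt.sampleRate}`)
  }
  return problems
}
```

If the file is not already 8000 or 16000 Hz mono, one thing to try is resampling during conversion (for example with ffmpeg's -ar 16000 -ac 1 options) and then declaring the matching rate= value.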

huan commented 2 years ago

Yes, speech-to-text usually needs some experimenting with parameters to get a better result.

For example, rate=8000 can be very tricky. I suggest trying different sample rates, parameters, and cloud services, and comparing the results.
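To compare rates, parameters, and services objectively, each transcript can be scored against a known reference, such as the actual sentence DreamerLark gave. A minimal sketch, assuming character error rate (Levenshtein edit distance over characters divided by the reference length) is an acceptable metric for Chinese; characterErrorRate, normalizeTranscript, rankTrials, and the Trial type are hypothetical names, and the punctuation set that gets ignored is an assumption.

```typescript
// Hypothetical trial record: one configuration (rate, dev_pid, service...) and its output.
interface Trial {
  label: string
  transcript: string
}

// Assumption: whitespace and common ASCII/CJK punctuation should not count as errors.
function normalizeTranscript(text: string): string {
  return text.replace(/[\s,.!?;:，。！？；：、]/g, '')
}

// CER = Levenshtein distance (insertions, deletions, substitutions) over
// Unicode code points, divided by the reference length. 0 is a perfect match.
function characterErrorRate(reference: string, hypothesis: string): number {
  const ref = Array.from(reference)
  const hyp = Array.from(hypothesis)
  if (ref.length === 0) throw new Error('reference transcript must not be empty')
  // Entering iteration i, prev[j] is the distance between the first i-1
  // reference characters and the first j hypothesis characters.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j)
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,       // delete ref[i-1]
        curr[j - 1] + 1,   // insert hyp[j-1]
        prev[j - 1] + cost // substitute, or match when cost is 0
      )
    }
    prev = curr
  }
  return prev[hyp.length] / ref.length
}

// Scores every trial against the reference and sorts best (lowest CER) first.
function rankTrials(reference: string, trials: Trial[]): Array<Trial & { cer: number }> {
  const ref = normalizeTranscript(reference)
  return trials
    .map((t) => ({
      label: t.label,
      transcript: t.transcript,
      cer: characterErrorRate(ref, normalizeTranscript(t.transcript)),
    }))
    .sort((a, b) => a.cer - b.cer)
}
```

For example, rankTrials('牛栏山十瓶,二锅头十瓶,可口可乐一箱', trials) would order the configurations tried, where each trial's transcript comes from re-sending the same audio with one candidate setting.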