Closed: michaelplevy closed this issue 5 years ago.
Having read the description and skimmed the changes, everything seems good to me. I'll review this in more detail later.
As for the functions asr_session_stream_recognize() and asr_session_stream_write(), yes, these functions certainly were used at some point, but I have no idea whether or not they are currently used.
Here is some documentation that I hope is helpful. The application's help text has been updated as follows:
help
usage:
Run demo ASR client
Run grammar_uri_list audio_input_file [profile_name | -] [params_file | -] [set | -] [get | -]
grammar_uri_list is the name of a grammar file (path is relative to data dir)
(default is grammar.xml)
or a comma separated list of grammar uris, where grammar_uri may be one of the following:
- http:// or https:// - grammars hosted on a web server and accessible by http/s
- builtin: - grammar is a VoiceXML builtin grammar like builtin:grammar/boolean
- supported proprietary grammar URIs
- (no URI prefix) - grammar file is read from local data folder.
or a comma separated list of weighted grammars, each in text/grammar-ref-list format
audio_input_file is the name of the audio file to process (path is relative to data dir).
headerless PCM files are supported as well as WAV files with RIFF headers.
(default is one-8kHz.pcm)
profile_name is the configured client-profile, like one of 'uni2', 'uni1', ...
(default is uni2)
params_file is a path to a file of MRCP headers. A dash (-) may be used to skip this parameter.
Example headers are:
N-Best-List-Length: 3
No-Input-Timeout: 3000
Speech-Complete-Timeout: 500
(default is no parameter file. In this case the ASR defaults are used)
set - send params_file as headers in MRCP SET-PARAMS method, otherwise
parameters are sent as headers in the MRCP RECOGNIZE method
get - send MRCP GET-PARAMS method before and after recognition
example:
run grammar.xml one-8kHz.pcm uni2 params_default.txt set get
other examples:
run grammar.xml one.wav
run builtin:grammar/boolean yes.wav
run grammar.xml,builtin:grammar/boolean,http://example.com/operator.grxml speak_to_representative.wav
run operator.grxml speak_to_representative.wav uni2
run builtin:grammar/boolean yes.pcm uni2
run builtin:grammar/boolean yes.wav - params.txt set get
run http://example.com/grammars/grammar.grxml one.wav uni2
run <http://localhost/grammars/grammar.grxml>;weight="2.0",<builtin:grammar/boolean>;weight="0.75"
- loglevel [level] (set loglevel, one of 0,1...7)
- quit, exit
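To make the help text's grammar rules concrete, here is a minimal C sketch of how a grammar argument could be classified and which DEFINE-GRAMMAR content type it maps to in the logs that follow. The enum and function names are hypothetical, not the asrclient source; proprietary URI schemes are not recognized by this sketch, and local files are assumed to be SRGS XML (the issue also mentions application/grammar+xml).

```c
#include <string.h>

/* Hypothetical categories mirroring the help text. Proprietary URI schemes
 * are NOT recognized here and fall through to GRAMMAR_SOURCE_LOCAL_FILE. */
typedef enum {
    GRAMMAR_SOURCE_HTTP,       /* http:// or https:// */
    GRAMMAR_SOURCE_BUILTIN,    /* e.g. builtin:grammar/boolean */
    GRAMMAR_SOURCE_LOCAL_FILE  /* no URI prefix: read from the data dir */
} grammar_source_t;

static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static grammar_source_t classify_grammar(const char *grammar)
{
    if (has_prefix(grammar, "http://") || has_prefix(grammar, "https://"))
        return GRAMMAR_SOURCE_HTTP;
    if (has_prefix(grammar, "builtin:"))
        return GRAMMAR_SOURCE_BUILTIN;
    return GRAMMAR_SOURCE_LOCAL_FILE;
}

/* Content type of the DEFINE-GRAMMAR body as seen in the logged examples:
 * URIs are sent by reference, local XML files are sent inline
 * (assumed to be SRGS XML in this sketch). */
static const char *define_grammar_content_type(const char *grammar)
{
    return classify_grammar(grammar) == GRAMMAR_SOURCE_LOCAL_FILE
        ? "application/srgs+xml"
        : "text/uri-list";
}
```

Sending a URI by reference leaves the fetch to the ASR server, which is why the http:// and builtin: examples produce a short text/uri-list body instead of inline XML.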
Here are some examples and the MRCP requests they generate.
run grammar.xml one-8kHz.pcm uni2 params_default.txt
Define-grammar sent:
2019-08-16 09:26:40:564621 [INFO] Send MRCPv2 Data 10.11.104.231:53522 <-> 10.11.104.231:11544 [446 bytes]
MRCP/2.0 446 DEFINE-GRAMMAR 1
Channel-Identifier: 6abb48c4dd7c3749@speechrecog
Content-Type: application/srgs+xml
Content-Id: demo-grammar-0
Content-Length: 278
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
<rule id="digit">
<one-of>
<item>one</item>
<item>two</item>
<item>three</item>
</one-of>
</rule>
</grammar>
Recognize sent with parameters as recognize headers:
2019-08-16 09:26:40:599619 [INFO] Send MRCPv2 Data 10.11.104.231:53522 <-> 10.11.104.231:11544 [365 bytes]
MRCP/2.0 365 RECOGNIZE 2
Channel-Identifier: 6abb48c4dd7c3749@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
No-Input-Timeout: 5000
Recognition-Timeout: 20000
Speech-Complete-Timeout: 400
DTMF-Term-Timeout: 3000
DTMF-Interdigit-Timeout: 3000
Confidence-Threshold: 0.5
Content-Length: 23
session:demo-grammar-0
run grammar.xml one-8kHz.pcm uni2 params_default.txt set
Set-params sent:
2019-08-16 09:28:18:580608 [INFO] Send MRCPv2 Data 10.11.104.231:53535 <-> 10.11.104.231:11544 [244 bytes]
MRCP/2.0 244 SET-PARAMS 1
Channel-Identifier: a3d08a355605f14f@speechrecog
No-Input-Timeout: 5000
Recognition-Timeout: 20000
Speech-Complete-Timeout: 400
DTMF-Term-Timeout: 3000
DTMF-Interdigit-Timeout: 3000
Confidence-Threshold: 0.5
Define-grammar sent:
2019-08-16 09:28:18:602644 [INFO] Send MRCPv2 Data 10.11.104.231:53535 <-> 10.11.104.231:11544 [446 bytes]
MRCP/2.0 446 DEFINE-GRAMMAR 2
Channel-Identifier: a3d08a355605f14f@speechrecog
Content-Type: application/srgs+xml
Content-Id: demo-grammar-0
Content-Length: 278
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
<rule id="digit">
<one-of>
<item>one</item>
<item>two</item>
<item>three</item>
</one-of>
</rule>
</grammar>
Recognize sent (without extra parameters):
2019-08-16 09:28:18:636657 [INFO] Send MRCPv2 Data 10.11.104.231:53535 <-> 10.11.104.231:11544 [200 bytes]
MRCP/2.0 200 RECOGNIZE 3
Channel-Identifier: a3d08a355605f14f@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 23
session:demo-grammar-0
run grammar.xml one-8kHz.pcm uni2 params_default.txt set get
Get-params sent with no headers to query current parameter set:
2019-08-16 09:29:40:740603 [INFO] Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [78 bytes]
MRCP/2.0 78 GET-PARAMS 1
Channel-Identifier: 8561df998879064c@speechrecog
Set-params sent:
2019-08-16 09:29:40:795607 [INFO] Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [244 bytes]
MRCP/2.0 244 SET-PARAMS 2
Channel-Identifier: 8561df998879064c@speechrecog
No-Input-Timeout: 5000
Recognition-Timeout: 20000
Speech-Complete-Timeout: 400
DTMF-Term-Timeout: 3000
DTMF-Interdigit-Timeout: 3000
Confidence-Threshold: 0.5
Define-grammar sent:
2019-08-16 09:29:40:817628 [INFO] Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [446 bytes]
MRCP/2.0 446 DEFINE-GRAMMAR 3
Channel-Identifier: 8561df998879064c@speechrecog
Content-Type: application/srgs+xml
Content-Id: demo-grammar-0
Content-Length: 278
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
<rule id="digit">
<one-of>
<item>one</item>
<item>two</item>
<item>three</item>
</one-of>
</rule>
</grammar>
Recognize sent:
2019-08-16 09:29:40:836735 [INFO] Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [200 bytes]
MRCP/2.0 200 RECOGNIZE 4
Channel-Identifier: 8561df998879064c@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 23
session:demo-grammar-0
Get-params sent (at end of session):
2019-08-16 09:29:42:401603 [INFO] Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [78 bytes]
MRCP/2.0 78 GET-PARAMS 5
Channel-Identifier: 8561df998879064c@speechrecog
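In each logged request, the number after MRCP/2.0 is the message-length. Per RFC 6787 it counts the whole message in bytes, including the start line itself, so the value depends on its own digit count. Assuming CRLF line endings, the GET-PARAMS (78) and SET-PARAMS (244) requests above are consistent with this. The following is a minimal sketch of that computation; mrcp_message_length is a hypothetical helper, not a UniMRCP function.

```c
#include <stdio.h>
#include <string.h>

/* Number of decimal digits in n. */
static size_t count_digits(size_t n)
{
    size_t digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/*
 * Total length of a request whose start line is
 *   "MRCP/2.0" SP <length> SP <method> SP <request-id> CRLF
 * followed by rest_len bytes (header lines, the empty line, any body).
 * The length field is part of what it measures, so iterate until the
 * number of digits is stable.
 */
static size_t mrcp_message_length(const char *method, unsigned long request_id,
                                  size_t rest_len)
{
    char id_buf[32];
    size_t fixed, total, digits = 1;

    snprintf(id_buf, sizeof(id_buf), "%lu", request_id);
    /* everything except the digits of <length>: 3 spaces and a CRLF */
    fixed = strlen("MRCP/2.0") + 3 + strlen(method) + strlen(id_buf) + 2 + rest_len;

    for (;;) {
        total = fixed + digits;
        if (count_digits(total) == digits)
            return total;
        digits = count_digits(total);
    }
}
```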
run http://10.90.48.61/SampleVXML/grammars/icecream.grxml chocolate.wav
Define-grammar sends the http URI grammar with content-type text/uri-list:
2019-08-16 09:31:49:641615 [INFO] Send MRCPv2 Data 10.11.104.231:53540 <-> 10.11.104.231:11544 [213 bytes]
MRCP/2.0 213 DEFINE-GRAMMAR 1
Channel-Identifier: 5066e04587fdad4f@speechrecog
Content-Type: text/uri-list
Content-Id: demo-grammar-0
Content-Length: 53
http://10.90.48.61/SampleVXML/grammars/icecream.grxml
run builtin:grammar/number spoken7_8_9.wav
Define-grammar sends the builtin: URI grammar with content-type text/uri-list:
2019-08-16 09:32:40:398608 [INFO] Send MRCPv2 Data 10.11.104.231:53543 <-> 10.11.104.231:11544 [182 bytes]
MRCP/2.0 182 DEFINE-GRAMMAR 1
Channel-Identifier: 1d385b06f141334a@speechrecog
Content-Type: text/uri-list
Content-Id: demo-grammar-0
Content-Length: 22
builtin:grammar/number
run http://10.90.48.61/SampleVXML/grammars/icecream.grxml,builtin:grammar/number chocolate.wav
Define-grammar is sent twice, once for each grammar in the comma-separated list. Recognize then sends both grammars as a text/uri-list:
2019-08-16 09:34:07:792682 [INFO] Send MRCPv2 Data 10.11.104.231:53555 <-> 10.11.104.231:11544 [223 bytes]
MRCP/2.0 223 RECOGNIZE 3
Channel-Identifier: 8b5dd97d03020c42@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 46
session:demo-grammar-0
session:demo-grammar-1
run <http://10.90.48.61/SampleVXML/grammars/icecream.grxml>;weight="2.0",<builtin:grammar/number>;weight="0.5" chocolate.wav
Define-grammar is sent twice, once for each grammar in the weighted list. Recognize then sends both grammars as a text/grammar-ref-list with weights:
2019-08-16 09:35:59:046614 [INFO] Send MRCPv2 Data 10.11.104.231:53558 <-> 10.11.104.231:11544 [263 bytes]
MRCP/2.0 263 RECOGNIZE 3
Channel-Identifier: deaf2009be8d944e@speechrecog
Content-Type: text/grammar-ref-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 78
<session:demo-grammar-0>;weight="2.00"
<session:demo-grammar-1>;weight="0.50"
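The weighted example rewrites each user-supplied entry `<uri>;weight="w"` into a reference to the grammar already sent with DEFINE-GRAMMAR, printing the weight with two decimals ("2.0" becomes "2.00"). A minimal sketch of that per-entry rewrite, with hypothetical helper names and a fixed 256-byte URI buffer as an assumption; how entries are separated in the body is not covered here.

```c
#include <stdio.h>
#include <string.h>

#define MAX_URI 256 /* assumed buffer size; matches the %255 width below */

/* Parse one entry of the form <uri>;weight="w".
 * Returns 1 on success, 0 if the entry does not have that shape.
 * uri must point to at least MAX_URI bytes. */
static int parse_weighted_ref(const char *entry, char *uri, double *weight)
{
    return sscanf(entry, " <%255[^>]>;weight=\"%lf\"", uri, weight) == 2;
}

/* Format the session reference used in the RECOGNIZE body, e.g.
 * <session:demo-grammar-0>;weight="2.00". Returns 1 if it fit in out. */
static int format_session_ref(char *out, size_t out_size,
                              const char *content_id, double weight)
{
    int n = snprintf(out, out_size, "<session:%s>;weight=\"%.2f\"",
                     content_id, weight);
    return n >= 0 && (size_t)n < out_size;
}
```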
Thanks for the additional clarification. I can follow the code and the use cases are very clear to me.
The code does not compile as-is on Linux with gcc. Some of the compilation errors relate to the use of BOOL; the others require -std=c99. There are also compiler warnings that are treated as errors.
All of these issues should be fairly easy to address. However, anything that takes more than a few minutes is hard for me to find time for, so I'll revisit this task later.
Oh, sorry. We're a Windows shop and I have not had to compile for Linux. Perhaps someone can run with it and make it work for Linux. Let me know if I can help in other ways.
I also have an interest in this. Since I'm on Linux I will see if I can find the time to create a compliant PR from the patch above.
Oh, thanks. I will try to review this and compile it on Windows (sorry, this week is packed; I may not get a chance until next week).
Well, I believe we are moving in the right direction, but may not be there yet.
Since PR #247 slightly breaks the VS build, I have created a new branch https://github.com/unispeech/unimrcp/commits/extended-asrclient to work on this issue until the code properly compiles and works as intended on all the supported platforms.
Please follow a series of commits from 89582765498dccf6020664539da0a792148fdcad to 11abdb733c0b35464ef92bf04655a114c8fa1119 and let me know if there are any concerns or suggestions. Everything seems to compile well on the Windows/Linux platforms I have tried.
Thanks
Thank you @achaloyan for your effort, and sorry for not catching these problems in the first place. Your additional patches look perfectly fine to me and still compile flawlessly with gcc 9.1.1.
Supporting -ansi on Linux/gcc could probably avoid compatibility problems with older MSVC versions, but that would disallow things like C99-style comments...
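To illustrate the kind of change involved, here is a small hypothetical function written so that it compiles under gcc -ansi -pedantic as well as newer standards: block comments only, declarations at the top of the block, and an int-based boolean instead of the Windows-only BOOL. UniMRCP's own code uses apt_bool_t for this purpose; portable_bool_t and is_all_zero are made-up names for this sketch.

```c
#include <stddef.h>

/* Hypothetical portable boolean standing in for the Windows-only BOOL. */
typedef int portable_bool_t;
#define PORTABLE_TRUE  1
#define PORTABLE_FALSE 0

/* Returns PORTABLE_TRUE if every byte in buf is zero. */
static portable_bool_t is_all_zero(const unsigned char *buf, size_t len)
{
    size_t i; /* C89: declared before any statement, not inside for (...) */

    for (i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return PORTABLE_FALSE;
        }
    }
    return PORTABLE_TRUE;
}
```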
I'm currently testing the asrclient on the extended-asrclient branch. I think it works as intended, but I noticed that the function set_param_from_file segfaults when it is given a file that contains an empty line or ends with a newline.
This small refactoring fixes this:
asr_engine.c:956
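if(str != NULL) {
    /* Name is everything up to the first ':'; NULL for an empty string */
    val = apr_strtok(str,":",&last);
    if(val != NULL) {
        const char *pname = NULL;
        const char *pvalue = NULL;
        /* Strip all whitespace (including CR/LF) from the name */
        apr_collapse_spaces(val,val);
        pname = val;
        /* Value is the next ':'-delimited token; NULL if the line has none */
        val = apr_strtok(NULL,":",&last);
        if(val != NULL) {
            apr_collapse_spaces(val,val);
            pvalue = val;
            /* Only set the header when both name and value are present */
            set_individual_param(mrcp_message,recog_header,pname,pvalue);
        }
    }
}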
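A self-contained sketch of the same guard logic in plain C, without APR, may help when reviewing similar parsers. parse_param_line is a hypothetical helper, not the function in asr_engine.c. It splits only at the first colon, so a value that itself contains a colon (such as a URI) stays intact, whereas the apr_strtok() version above stops at the second colon; it also trims only leading and trailing whitespace instead of collapsing all of it.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Split one "Name: value" line from a parameter file (modified in place).
 * Returns 1 and sets *name and *value on success; returns 0 for blank
 * lines, lines without a colon, or an empty name or value.
 */
static int parse_param_line(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':'); /* split at the first colon only */

    if (colon == NULL)
        return 0;
    *colon = '\0';
    *name = trim(line);
    *value = trim(colon + 1);
    return **name != '\0' && **value != '\0';
}
```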
Edit: added patch
Thanks for your input, Tobias. I have applied your patch to the branch extended-asrclient.
Also, I have sent invitations to Tobias and Michael to join the unimrcp-dev team in order to work on this feature directly on the dedicated branch.
I think we can freely collaborate when it comes to a change in the branch, even if something possibly breaks. However, please coordinate with me any changes you may want to apply to the master branch before committing.
Thanks
I just pushed one last change to make the RIFF/WAVE detection/parsing a little less brittle.
From my point of view, testing of the client went well; everything worked as expected. If there are no complaints on Windows (which I can't test), the whole changeset seems good to me.
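The actual change lives on the extended-asrclient branch and is not reproduced here. As a hypothetical sketch of what less brittle detection can look like, the function below walks the RIFF chunk list to find the "data" chunk instead of assuming a fixed 44-byte header, so extra chunks such as LIST or fact are skipped; anything without a RIFF/WAVE signature is treated as headerless PCM. find_wav_data and read_le32 are made-up names.

```c
#include <stddef.h>
#include <string.h>

/* Read a little-endian 32-bit value. */
static unsigned long read_le32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/*
 * If buf holds a RIFF/WAVE file, return the offset of the "data" payload
 * and store its declared size in *data_size. Return 0 for headerless PCM
 * or a malformed header (a valid data offset is never 0).
 */
static size_t find_wav_data(const unsigned char *buf, size_t len,
                            unsigned long *data_size)
{
    size_t pos = 12; /* first chunk follows "RIFF" <size> "WAVE" */

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;

    while (pos + 8 <= len) {
        unsigned long chunk_size = read_le32(buf + pos + 4);

        if (memcmp(buf + pos, "data", 4) == 0) {
            *data_size = chunk_size;
            return pos + 8;
        }
        /* Skip any other chunk, refusing sizes that run past the buffer. */
        if (chunk_size > len - pos - 8)
            break;
        pos += 8 + chunk_size + (chunk_size & 1); /* chunks are word-aligned */
    }
    return 0;
}
```

A fuller version would also read the "fmt " chunk to confirm format tag 1 (PCM), 16-bit samples, and the expected sample rate before streaming the payload.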
Thanks. Looks good to me, compiles on Windows too.
I have made a final series of commits spanning from fcf3c551541e3db4bb5c2d7b2016f35e326e4cec to 18f7e388a3449b3ff313197e4b5a6ea8df0c7b2a. There are no functional changes.
Please review. If there are no objections, then the code will be merged back to the master sometime next week.
I have re-tested and don't have any complaints. I think the patch set is good to go.
The branch extended-asrclient has been merged into master.
I think this issue can be closed now. Thanks, everyone, for your efforts!
@michaelplevy Any objections?
Ok, let's close this issue for now. If anything comes up, we can certainly revisit this subject. Thanks for your contributions.
Thanks all. I'm sorry I wasn't able to contribute beyond the first suggestions. I've just been very busy with work and home life and haven't had extra time.
Hi Team,
I'm trying to build my own ASR plugin. I can send the audio to my Python socket-based component and save it to a file, but when I play that file back, the human voice comes with a lot of background noise (a buzzing sound). I suspect an audio codec/format mismatch between the (modified) unimrcp plugin and the Python socket code. Can anyone please tell me the exact format I should use in Python when saving this audio to a file?
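As a hedged pointer only: raw samples saved without a header carry no format information, so a player that guesses the wrong sample width, rate, or byte order renders speech as noise. Assuming the plugin receives 16-bit signed little-endian linear PCM (LPCM), mono, at 8 kHz (check the codec and sample rate your plugin actually negotiates), prefixing the bytes with a canonical 44-byte WAV header makes the format explicit. The sketch is in C to match the rest of this thread; the byte layout is the same when written from Python. make_wav_header and its helpers are hypothetical names.

```c
#include <string.h>

static void put_le16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
}

static void put_le32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Fill a canonical 44-byte WAV header for 16-bit PCM; data_bytes is the
 * number of raw sample bytes that follow the header. */
static void make_wav_header(unsigned char h[44], unsigned long sample_rate,
                            unsigned int channels, unsigned long data_bytes)
{
    const unsigned int bits = 16;
    const unsigned int block_align = channels * bits / 8;

    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36 + data_bytes);            /* bytes after this field */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_le32(h + 16, 16);                        /* fmt chunk size for PCM */
    put_le16(h + 20, 1);                         /* format tag 1 = PCM */
    put_le16(h + 22, channels);
    put_le32(h + 24, sample_rate);
    put_le32(h + 28, sample_rate * block_align); /* byte rate */
    put_le16(h + 32, block_align);
    put_le16(h + 34, bits);
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_bytes);
}
```

Write the header, then the sample bytes unchanged. If the total length is not known in advance, write placeholder sizes and patch offsets 4 and 40 when the stream ends.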
Issue
ASRClient has limitations that prevent it from exercising an ASR across typical IVR usage scenarios. For example, ASRClient in v1.6:
- Sends a single define-grammar command, so it is limited to one grammar. There is no support for multiple grammars (sent as content-type text/uri-list) and no support for weighted grammars (sent as content-type text/grammar-ref-list).
- Reads grammar files from the file system and sends them to the ASR as content-type application/srgs+xml or application/grammar+xml. There is no support for URI grammars such as:
- Does not support common WAV files in RIFF format as an audio source.
- Does not support the MRCP SET-PARAMS or GET-PARAMS methods. There is no way to change MRCP parameters without changing code, so it can be difficult to test the parameterized capabilities of an ASR.
Attached is a patch to address these limitations.
About the patch: all changes are made against the v1.6 base. Tested on Windows only.
Updated files include:
Some definitions had moved from asr_engine.h to asr_engine.c. I moved them back because I need them in a shared include.
Broke up original asr_session_file_recognize() into:
platforms\libasr-client\include\asr_engine_common.h – these types are broken out into their own .h because I use SWIG to wrap the libasr-client as an API. These structs are helpful and having them in their own .h made SWIG happy.
Data\params_default.txt – a sample parameter file that can be used to replace the default MRCP headers sent in v1.6 of asrclient.
Should work like the previous versions of ASRClient.
And I have one question about the source for libasr-client. Are these functions ever used? Should they be removed?
patch246_01.zip