unispeech / unimrcp

Open source cross-platform implementation of MRCP protocol
http://www.unimrcp.org
Apache License 2.0

ASRClient limitations for ASR testing #246

Closed michaelplevy closed 5 years ago

michaelplevy commented 5 years ago

Issue

ASRClient has limitations that prevent it from exercising an ASR to cover typical IVR usage scenarios. For example, ASRClient in v1.6:

Attached is a patch to address these limitations.

About the patch: All changes made to v1.6 base. Tested on Windows only.

Updated files include:

  1. platforms\asr-client\src\main.c
  2. platforms\libasr-client\include\asr_engine.h
  3. platforms\libasr-client\src\asr_engine.c
  4. platforms\libasr-client\include\asr_engine_common.h
  5. data\params_default.txt

Some definitions had moved from asr_engine.h to asr_engine.c. I moved them back because I need them in a shared include.

Broke up original asr_session_file_recognize() into:

platforms\libasr-client\include\asr_engine_common.h – these types are broken out into their own .h because I use SWIG to wrap the libasr-client as an API. These structs are helpful and having them in their own .h made SWIG happy.

data\params_default.txt – a sample parameter file that can be used to replace the default MRCP headers sent by ASRClient v1.6.

run grammar.xml one-8kHz.pcm uni2 params_default.txt

This should work the same as previous versions of ASRClient.

And I have one question about the source for libasr-client. Are these functions ever used? Should they be removed?

patch246_01.zip

achaloyan commented 5 years ago

Reading the description and quickly looking through the changes, everything seems good to me. I'll review this in more detail later....

As for the functions asr_session_stream_recognize() and asr_session_stream_write(), yes, these functions certainly were used at some point, but I have no idea whether or not they are currently used.

michaelplevy commented 5 years ago

Here is some documentation that I hope is helpful. Help in the application is updated as:

help
usage:
Run demo ASR client
Run grammar_uri_list audio_input_file [profile_name | -] [params_file | -] [set | -] [get | -]

    grammar_uri_list is the name of a grammar file (path is relative to data dir)
      (default is grammar.xml)
      or a comma separated list of grammar uris, where grammar_uri may be one of the following:
       - http:// or https:// - grammars hosted on a web server and accessible by http/s
       - builtin: - grammar is a VoiceXML builtin grammar like builtin:grammar/boolean
       - supported proprietary grammar URIs
       - (no URI prefix) - grammar file is read from local data folder.
      or a comma separated list of weighted grammars, each in text/grammar-ref-list format

    audio_input_file is the name of audio file to process, (path is relative to data dir).
      headerless PCM files are supported as well as WAV files with RIFF headers.
      (default is one-8kHz.pcm)

    profile_name is the configured client-profile, like one of 'uni2', 'uni1', ...
      (default is uni2)

    params_file is a path to a file of MRCP headers. A dash (-) may be used to skip this parameter.
        Example headers are:
          N-Best-List-Length: 3
          No-Input-Timeout: 3000
          Speech-Complete-Timeout: 500
      (default is no parameter file. In this case the ASR defaults are used)

   set - send parameter_file as headers in MRCP SET-PARAMS method, otherwise
        parameters are sent as headers in the MRCP RECOGNIZE method

   get - send MRCP GET-PARAMS method before and after recognition

   example:
      run grammar.xml one-8kHz.pcm uni2 params_default.txt set get
   other examples:
      run grammar.xml one.wav
      run builtin:grammar/boolean yes.wav
      run grammar.xml,builtin:grammar/boolean,http://example.com/operator.grxml speak_to_representative.wav
      run operator.grxml speak_to_representative.wav uni2
      run builtin:grammar/boolean yes.pcm uni2
      run builtin:grammar/boolean yes.wav - params.txt set get
      run http://example.com/grammars/grammar.grxml one.wav uni2
      run <http://localhost/grammars/grammar.grxml>;weight="2.0",<builtin:grammar/boolean>;weight="0.75"

- loglevel [level] (set loglevel, one of 0,1...7)

- quit, exit
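The grammar_uri_list rules in the help text can be sketched as a small classifier. This is only an illustration and not the code from the patch: the enum and the function name `classify_grammar` are made up here, proprietary URI schemes are ignored, and a leading `<` is assumed to mark a weighted text/grammar-ref-list entry.

```c
#include <string.h>

/* Hypothetical categories, named only for this sketch. */
typedef enum {
    GRAMMAR_HTTP,          /* http:// or https:// */
    GRAMMAR_BUILTIN,       /* builtin: */
    GRAMMAR_WEIGHTED_REF,  /* <uri>;weight="..." */
    GRAMMAR_LOCAL_FILE     /* no URI prefix: file in the data dir */
} grammar_kind;

/* Classify one entry of a comma separated grammar list. */
grammar_kind classify_grammar(const char *spec)
{
    if (spec[0] == '<') {
        return GRAMMAR_WEIGHTED_REF;
    }
    if (strncmp(spec, "http://", 7) == 0 || strncmp(spec, "https://", 8) == 0) {
        return GRAMMAR_HTTP;
    }
    if (strncmp(spec, "builtin:", 8) == 0) {
        return GRAMMAR_BUILTIN;
    }
    return GRAMMAR_LOCAL_FILE;
}
```

With this sketch, `grammar.xml` falls through to the local-file case, while `builtin:grammar/boolean` and `http://example.com/operator.grxml` would be sent by reference.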

Here are some examples and the MRCP requests generated.

run grammar.xml one-8kHz.pcm uni2 params_default.txt

Define-grammar sent:

2019-08-16 09:26:40:564621 [INFO]   Send MRCPv2 Data 10.11.104.231:53522 <-> 10.11.104.231:11544 [446 bytes]
MRCP/2.0 446 DEFINE-GRAMMAR 1
Channel-Identifier: 6abb48c4dd7c3749@speechrecog
Content-Type: application/srgs+xml
Content-Id: demo-grammar-0
Content-Length: 278

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
    </one-of>
  </rule>
</grammar>

Recognize sent with parameters as recognize headers:

2019-08-16 09:26:40:599619 [INFO]   Send MRCPv2 Data 10.11.104.231:53522 <-> 10.11.104.231:11544 [365 bytes]
MRCP/2.0 365 RECOGNIZE 2
Channel-Identifier: 6abb48c4dd7c3749@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
No-Input-Timeout: 5000
Recognition-Timeout: 20000
Speech-Complete-Timeout: 400
DTMF-Term-Timeout: 3000
DTMF-Interdigit-Timeout: 3000
Confidence-Threshold: 0.5
Content-Length: 23

session:demo-grammar-0

run grammar.xml one-8kHz.pcm uni2 params_default.txt set

Set-params sent:

2019-08-16 09:28:18:580608 [INFO]   Send MRCPv2 Data 10.11.104.231:53535 <-> 10.11.104.231:11544 [244 bytes]
MRCP/2.0 244 SET-PARAMS 1
Channel-Identifier: a3d08a355605f14f@speechrecog
No-Input-Timeout: 5000
Recognition-Timeout: 20000
Speech-Complete-Timeout: 400
DTMF-Term-Timeout: 3000
DTMF-Interdigit-Timeout: 3000
Confidence-Threshold: 0.5

Define-grammar sent:

2019-08-16 09:28:18:602644 [INFO]   Send MRCPv2 Data 10.11.104.231:53535 <-> 10.11.104.231:11544 [446 bytes]
MRCP/2.0 446 DEFINE-GRAMMAR 2
Channel-Identifier: a3d08a355605f14f@speechrecog
Content-Type: application/srgs+xml
Content-Id: demo-grammar-0
Content-Length: 278

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
    </one-of>
  </rule>
</grammar>

Recognize sent (without extra parameters):

2019-08-16 09:28:18:636657 [INFO]   Send MRCPv2 Data 10.11.104.231:53535 <-> 10.11.104.231:11544 [200 bytes]
MRCP/2.0 200 RECOGNIZE 3
Channel-Identifier: a3d08a355605f14f@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 23

session:demo-grammar-0

run grammar.xml one-8kHz.pcm uni2 params_default.txt set get

Get-params sent with no headers to query current parameter set:

2019-08-16 09:29:40:740603 [INFO]   Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [78 bytes]
MRCP/2.0 78 GET-PARAMS 1
Channel-Identifier: 8561df998879064c@speechrecog
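A side note on the number in the start line: it is the length of the whole message, including the digits of the length field itself. The sketch below is not taken from UniMRCP; the function names are hypothetical, and it assumes CRLF line endings, which is consistent with the 78 bytes logged for the GET-PARAMS request above.

```c
#include <stddef.h>
#include <string.h>

/* Number of decimal digits needed to print n. */
static size_t decimal_digits(size_t n)
{
    size_t d = 1;
    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/*
 * Total length of an MRCPv2 request whose start line is
 * "MRCP/2.0 <length> <method> <request-id>\r\n" and whose remainder
 * (header lines, blank line, optional body) is rest_len bytes long.
 */
size_t mrcp_request_length(const char *method, size_t request_id, size_t rest_len)
{
    /* Everything except the digits of the length field itself. */
    size_t fixed = strlen("MRCP/2.0 ") + 1      /* space after the length */
                 + strlen(method) + 1           /* space after the method */
                 + decimal_digits(request_id) + 2 /* CRLF */
                 + rest_len;
    size_t total = fixed + 1;

    /* The field counts its own digits, so iterate until it is consistent. */
    while (total != fixed + decimal_digits(total)) {
        total = fixed + decimal_digits(total);
    }
    return total;
}
```

For the request above, the remainder is the Channel-Identifier line plus the blank line (52 bytes), and the function returns 78.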

Set-params sent:

2019-08-16 09:29:40:795607 [INFO]   Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [244 bytes]
MRCP/2.0 244 SET-PARAMS 2
Channel-Identifier: 8561df998879064c@speechrecog
No-Input-Timeout: 5000
Recognition-Timeout: 20000
Speech-Complete-Timeout: 400
DTMF-Term-Timeout: 3000
DTMF-Interdigit-Timeout: 3000
Confidence-Threshold: 0.5

Define-grammar sent:

2019-08-16 09:29:40:817628 [INFO]   Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [446 bytes]
MRCP/2.0 446 DEFINE-GRAMMAR 3
Channel-Identifier: 8561df998879064c@speechrecog
Content-Type: application/srgs+xml
Content-Id: demo-grammar-0
Content-Length: 278

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
    </one-of>
  </rule>
</grammar>

Recognize sent:

2019-08-16 09:29:40:836735 [INFO]   Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [200 bytes]
MRCP/2.0 200 RECOGNIZE 4
Channel-Identifier: 8561df998879064c@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 23

session:demo-grammar-0

Get-params sent (at end of session):

2019-08-16 09:29:42:401603 [INFO]   Send MRCPv2 Data 10.11.104.231:53538 <-> 10.11.104.231:11544 [78 bytes]
MRCP/2.0 78 GET-PARAMS 5
Channel-Identifier: 8561df998879064c@speechrecog
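The request order seen in these logs can be summarised in a few lines of C. This is a sketch inferred from the logged request IDs only, not the actual ASRClient code; `request_at` and `request_count` are hypothetical names.

```c
#include <stddef.h>

/*
 * Name of the MRCP method sent at position `index` (0-based) for one
 * "run" command, or NULL past the end of the sequence.
 * Order assumed from the logs: optional GET-PARAMS, optional SET-PARAMS,
 * one DEFINE-GRAMMAR per grammar, RECOGNIZE, optional GET-PARAMS.
 */
const char *request_at(int use_set, int use_get, size_t grammar_count, size_t index)
{
    if (use_get) {
        if (index == 0) return "GET-PARAMS";
        index--;
    }
    if (use_set) {
        if (index == 0) return "SET-PARAMS";
        index--;
    }
    if (index < grammar_count) return "DEFINE-GRAMMAR";
    index -= grammar_count;
    if (index == 0) return "RECOGNIZE";
    index--;
    if (use_get && index == 0) return "GET-PARAMS";
    return NULL;
}

/* Number of requests sent for the given options. */
size_t request_count(int use_set, int use_get, size_t grammar_count)
{
    size_t n = 0;
    while (request_at(use_set, use_get, grammar_count, n) != NULL) {
        n++;
    }
    return n;
}
```

With `set get` and one grammar this gives five requests, matching request IDs 1 to 5 in the log above.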

run http://10.90.48.61/SampleVXML/grammars/icecream.grxml chocolate.wav

Define-grammar sends http URI grammar with content-type text/uri-list

2019-08-16 09:31:49:641615 [INFO]   Send MRCPv2 Data 10.11.104.231:53540 <-> 10.11.104.231:11544 [213 bytes]
MRCP/2.0 213 DEFINE-GRAMMAR 1
Channel-Identifier: 5066e04587fdad4f@speechrecog
Content-Type: text/uri-list
Content-Id: demo-grammar-0
Content-Length: 53

http://10.90.48.61/SampleVXML/grammars/icecream.grxml

run builtin:grammar/number spoken7_8_9.wav

Define-grammar sends builtin: URI grammar with content-type text/uri-list

2019-08-16 09:32:40:398608 [INFO]   Send MRCPv2 Data 10.11.104.231:53543 <-> 10.11.104.231:11544 [182 bytes]
MRCP/2.0 182 DEFINE-GRAMMAR 1
Channel-Identifier: 1d385b06f141334a@speechrecog
Content-Type: text/uri-list
Content-Id: demo-grammar-0
Content-Length: 22

builtin:grammar/number

run http://10.90.48.61/SampleVXML/grammars/icecream.grxml,builtin:grammar/number chocolate.wav

Define-grammar is sent twice, once for each grammar in comma separated list. Recognize sends two grammars in list as text/uri-list

2019-08-16 09:34:07:792682 [INFO]   Send MRCPv2 Data 10.11.104.231:53555 <-> 10.11.104.231:11544 [223 bytes]
MRCP/2.0 223 RECOGNIZE 3
Channel-Identifier: 8b5dd97d03020c42@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 46

session:demo-grammar-0
session:demo-grammar-1

run <http://10.90.48.61/SampleVXML/grammars/icecream.grxml>;weight="2.0",<builtin:grammar/number>;weight="0.5" chocolate.wav

Define-grammar is sent twice, once for each grammar in weighted list. Recognize sends two grammars in list as text/grammar-ref-list with weights.

2019-08-16 09:35:59:046614 [INFO]   Send MRCPv2 Data 10.11.104.231:53558 <-> 10.11.104.231:11544 [263 bytes]
MRCP/2.0 263 RECOGNIZE 3
Channel-Identifier: deaf2009be8d944e@speechrecog
Content-Type: text/grammar-ref-list
Cancel-If-Queue: false
Start-Input-Timers: true
Content-Length: 78

<session:demo-grammar-0>;weight="2.00"
<session:demo-grammar-1>;weight="0.50"
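
For reference, one line of the weighted body above could be formatted roughly like this. It is a sketch, not the patch code: `format_weighted_ref` is a hypothetical name, and the single trailing newline per entry is an assumption that happens to match the logged Content-Length of 78 for two entries.

```c
#include <stdio.h>

/*
 * Format one text/grammar-ref-list entry, e.g.
 *   <session:demo-grammar-0>;weight="2.00"
 * followed by a newline (assumed terminator).
 * Returns the number of bytes the entry needs, like snprintf does,
 * so passing buf == NULL and size == 0 only measures the entry.
 */
int format_weighted_ref(char *buf, size_t size, const char *uri, double weight)
{
    return snprintf(buf, size, "<%s>;weight=\"%.2f\"\n", uri, weight);
}
```

Each of the two entries in the log needs 39 bytes with this format, which adds up to the 78 bytes reported.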

achaloyan commented 5 years ago

Thanks for the additional clarification. I can follow the code and the use cases are very clear to me.

The code does not compile as is on Linux using gcc. Some of the compilation errors relate to the use of BOOL, the others require -std=c99 to compile. There are other compilation warnings that turn into errors.

All these issues should be relatively easy to address. However, anything requiring more than a few minutes is hard to find time for, so I'll revisit this task later...
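
To illustrate the kind of change this usually means (a generic sketch, not the actual fix applied to the patch): avoid the Windows-only BOOL type and keep declarations at the top of a block so the code also builds without -std=c99. The type `demo_bool_t` and the function below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a portable boolean type instead of BOOL. */
typedef int demo_bool_t;
#define DEMO_TRUE  1
#define DEMO_FALSE 0

/* Returns DEMO_TRUE if any of the len bytes in data is non-zero. */
demo_bool_t has_nonzero(const unsigned char *data, size_t len)
{
    /* C89: declare variables at the top of the block, not inside for(). */
    size_t i;
    demo_bool_t found = DEMO_FALSE;

    for (i = 0; i < len; i++) {
        if (data[i] != 0) {
            found = DEMO_TRUE;
            break;
        }
    }
    return found;
}
```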

michaelplevy commented 5 years ago

Oh, sorry. We're a Windows shop and I have not had to compile for Linux. Perhaps someone can run with it and make it work for Linux. Let me know if I can help in other ways.

schlagert commented 5 years ago

I also have an interest in this. Since I'm on Linux I will see if I can find the time to create a compliant PR from the patch above.

michaelplevy commented 5 years ago

Oh thanks. I will try to review this and compile on Windows. (sorry, this week is packed, I may not get a chance until next week).

achaloyan commented 5 years ago

Well, I believe we are moving in the right direction, but may not be there yet.

Since PR #247 slightly breaks the VS build, I have created a new branch https://github.com/unispeech/unimrcp/commits/extended-asrclient to work on this issue until the code properly compiles and works as intended on all the supported platforms.

Please follow a series of commits from 89582765498dccf6020664539da0a792148fdcad to 11abdb733c0b35464ef92bf04655a114c8fa1119 and let me know if there are any concerns or suggestions. Everything seems to compile well on the Windows/Linux platforms I have tried.

Thanks

schlagert commented 5 years ago

Thank you @achaloyan for your effort and sorry for not catching these problems in the first place. Your additional patches look perfectly fine to me, and everything still compiles flawlessly with gcc 9.1.1.

Supporting -ansi on Linux/gcc could probably avoid compatibility problems with older MSVC versions, but that would disallow things like C99-style comments...

schlagert commented 5 years ago

I'm currently testing the asrclient on the extended-asrclient branch. I think it works as intended, but I noticed that the function set_param_from_file segfaults when it is given a file that contains an empty line or ends with a newline.

This small refactoring fixes this:

asr_engine.c:956
            if(str != NULL) {
                /* first token: header name (NULL for an empty line) */
                val = apr_strtok(str,":",&last);
                if (val != NULL) {
                    const char *pname = NULL;
                    const char *pvalue = NULL;

                    /* strip whitespace from the name in place */
                    apr_collapse_spaces(val,val);
                    pname = val;

                    /* second token: header value (NULL if the line has no ':') */
                    val = apr_strtok(NULL,":",&last);
                    if(val != NULL) {
                        apr_collapse_spaces(val,val);
                        pvalue = val;
                        set_individual_param(mrcp_message,recog_header,pname,pvalue);
                    }
                }
            }

Edit: added patch
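
For anyone who wants to try the same idea outside of APR, here is a standalone sketch of parsing one line of a parameter file. It is not the code on the branch: `parse_header_line` and `trim` are hypothetical helpers, it splits on the first colon only (so values containing a colon stay intact), and it trims surrounding whitespace rather than removing all of it.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s)) {
        s++;
    }
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) {
        end--;
    }
    *end = '\0';
    return s;
}

/*
 * Split "Name: value" in place.
 * Returns 1 and sets *name and *value on success.
 * Returns 0 for empty lines, lines without ':' and lines without a name,
 * so the caller can simply skip them.
 */
int parse_header_line(char *line, char **name, char **value)
{
    char *colon;

    line = trim(line);
    if (*line == '\0') {
        return 0;
    }
    colon = strchr(line, ':');
    if (colon == NULL) {
        return 0;
    }
    *colon = '\0';
    *name = trim(line);
    *value = trim(colon + 1);
    return **name != '\0';
}
```

An empty line or a trailing newline at the end of the file yields 0 here instead of a NULL pointer being dereferenced.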

achaloyan commented 5 years ago

Thanks for your input, Tobias. I have applied your patch to the branch extended-asrclient.

Also, I have sent invitations to Tobias and Michael to join unimrcp-dev team in order to work on this feature directly on the dedicated branch.

I think we can freely collaborate when it comes to a change in the branch, even if something possibly breaks. However, please coordinate with me any changes you may want to apply to the master branch before committing.

Thanks

schlagert commented 5 years ago

I just pushed one last change, to make the RIFF/WAVE detection/parsing a little less brittle.

Testing of the client went well from my point of view. Everything worked as expected. If there are no complaints on Windows (which I can't test), the whole changeset seems good to me.
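
As a rough illustration of what RIFF/WAVE detection involves (not the code that was pushed): a WAV file starts with the bytes "RIFF", a 4-byte size, and then "WAVE". The function name below is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 when the buffer starts with a RIFF/WAVE header, 0 otherwise.
 * Only the first 12 bytes are checked: "RIFF", 4 size bytes, "WAVE".
 */
int looks_like_wav(const unsigned char *data, size_t len)
{
    if (data == NULL || len < 12) {
        return 0;
    }
    return memcmp(data, "RIFF", 4) == 0 && memcmp(data + 8, "WAVE", 4) == 0;
}
```

Anything that fails this check would be treated as headerless PCM. A less brittle parser would then walk the chunks to find "fmt " and "data" rather than assume a fixed header size.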

achaloyan commented 5 years ago

Thanks. Looks good to me, compiles on Windows too.

achaloyan commented 5 years ago

I have made a final series of commits spanning from fcf3c551541e3db4bb5c2d7b2016f35e326e4cec to 18f7e388a3449b3ff313197e4b5a6ea8df0c7b2a. There are no functional changes.

Please review. If there are no objections, then the code will be merged back to the master sometime next week.

schlagert commented 5 years ago

I have re-tested and don't have any complaints. I think the patch set is good to go.

achaloyan commented 5 years ago

The branch extended-asrclient has been merged into master.

schlagert commented 5 years ago

I think this issue can be closed now. Thanks everyone for their efforts!

@michaelplevy Any objections?

achaloyan commented 5 years ago

Ok, let's close this issue for now. If anything comes up, we can certainly revisit this subject. Thanks for your contributions.

michaelplevy commented 5 years ago

Thanks all. I'm sorry I wasn't able to contribute beyond the first suggestions. I've just been very busy with work and home life and haven't had extra time.

ai-analysys commented 5 months ago

Hi Team,

I'm trying to build my own ASR plugin. I can send the audio to my Python socket-based component and save it to a file, but when I play that file back, the human voice is accompanied by a lot of background noise (a buzzing sound). I suspect an audio codec or format mismatch between the (modified) UniMRCP plugin and the Python socket code. Can anyone please tell me the exact format I have to follow in Python when saving this audio to a file?