wit-ai / wit

Natural Language Interface for apps and devices
https://wit.ai/

Can no longer correct 'live utterances' #2459

Closed · baroquedub closed this 2 years ago

baroquedub commented 2 years ago

ask a question about wit?

I'm trying to figure out why my older Wit.ai apps have the option to listen to recordings and correct 'live utterances' in Understanding (Train Your App) but my new ones don't. Is there a Setting that's now off by default?

I find I can't even correct obviously incorrect transcriptions and I'm unsure what I should do with them. For example, if I've said "No I'm not" but the transcription is "Now I'm not", I don't want to mark this as Out of Scope, because I might actually need the app to recognise this specific phrase as an intent (however, I don't want to train it to incorrectly transcribe the voice data). What's the best approach here?

If applicable, what is the App ID where you are experiencing this issue? If you do not provide this, we cannot help.

App ID where Utterances cannot be corrected: 586523202720149
Old App ID where Utterances could be corrected: 309311617392381 (it's not been used for a while and currently has no live utterances)

yuzh174 commented 2 years ago

You should still get some live utterances, but I don't see much traffic in your new app. Also, how do you send requests to wit.ai? Could you send me the script (if you are using a script) or at least the headers you use in the requests?
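
For reference, a bare-bones request to the speech endpoint looks roughly like the sketch below (Python; the token, audio file name, and API version are placeholders, and the Unity Voice SDK sends an equivalent request for you under the hood):

```python
# Rough sketch of a raw /speech request -- token, file name and API version
# are placeholders; adjust to your app.
import requests

WIT_TOKEN = "YOUR_CLIENT_ACCESS_TOKEN"   # placeholder access token
API_VERSION = "20220608"                  # placeholder API version

with open("no_im_not.wav", "rb") as f:    # hypothetical audio clip
    resp = requests.post(
        f"https://api.wit.ai/speech?v={API_VERSION}",
        headers={
            "Authorization": f"Bearer {WIT_TOKEN}",
            "Content-Type": "audio/wav",
        },
        data=f,  # raw audio bytes in the request body
    )

# The response is chunked JSON with partial transcriptions and the final intent.
print(resp.text)
```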

baroquedub commented 2 years ago

It's not that there aren't any Live Utterances; it's more that they are not editable (they can't be corrected), and they're text only. I no longer have the option to review the audio (as was previously possible in my other apps).

Using Unity 2020.3.30 with Voice SDK v 40.0 (latest). Scripts are the ones provided, e.g. Wit.cs, VoiceService.cs etc.

Do you have any advice on the second part of my issue, how to deal with these incorrect text transcriptions?

> I find I can't even correct obviously incorrect transcriptions and I'm unsure what I should do with them. For example, if I've said "No I'm not" but the transcription is "Now I'm not", I don't want to mark this as Out of Scope, because I might actually need the app to recognise this specific phrase as an intent (however, I don't want to train it to incorrectly transcribe the voice data). What's the best approach here?

Should I 'train' according to what the text shows, or according to what I know the audio would have said?

yuzh174 commented 2 years ago

@baroquedub I see. For the first question, we don't display user voice data for privacy reasons. So you won't be able to see it anymore.

For the second question, you can correct the entity resolution by adding it to the utterance, but unfortunately not the transcription accuracy. Transcription is a harder problem that won't be solved by direct configuration. We are constantly working on improving transcription accuracy.

baroquedub commented 2 years ago

Thanks for the prompt reply.

So you say "you can correct the entity resolution by adding it to the utterance", but should I? My question still stands... considering that the ASR (Automatic Speech Recognition) is inaccurately transcribing my voice, should I train the NLP with 'incorrect'/nonsensical utterances so that my app responds as I need it to?

Does that make sense?

As it's more important that the user experience feels good, should I just train wit.ai that, for example, "Now mean now" (which is the transcription) should be dealt with as the same Intent as "No means no" (which is what was said)? My fear is that if I mark it as 'Out of Scope' then my user will no longer be understood (because the transcription is consistently incorrect).

This feels really wrong to me but I don't see any other option, which is why I'm asking :)

yuzh174 commented 2 years ago

I see your concern. Without accurate ASR, you would need to trick it by training something like "now means no" to the intended intent.
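
If you want to do that outside the Understanding UI, a rough sketch of annotating the mis-transcribed text against the intended intent via the HTTP API could look like this (Python; the intent name "deny" and the token are placeholders, so adjust to your app):

```python
# Rough sketch: train the NLU so the mis-transcribed text still resolves to
# the intended intent. Intent name and token are placeholders.
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"   # training calls use the server token
API_VERSION = "20220608"

utterances = [
    {
        "text": "now means no",   # what the ASR actually produced
        "intent": "deny",          # hypothetical intent name
        "entities": [],
        "traits": [],
    }
]

resp = requests.post(
    f"https://api.wit.ai/utterances?v={API_VERSION}",
    headers={"Authorization": f"Bearer {WIT_TOKEN}"},
    json=utterances,
)
print(resp.json())
```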

Another thing you could try is to add dynamic entities. See the usage here https://wit.ai/docs/http/20220608/#dynamic_entities_link

Basically, you provide a list of words along with your requests, and the ASR will bias towards these words if possible. In your example, you can put "no" in the dynamic entities list, but remember not to include "now". Dynamic entities are also used for dynamic entity synonyms, but what you need here is just the ASR biasing.
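
A rough sketch of what that could look like when attached to a speech request (Python; the entity name, the query-parameter name, and the token are assumptions, so check the dynamic entities section of the docs linked above for the exact shape):

```python
# Rough sketch only -- the query-parameter name ("entities") and the entity
# name ("answer") are assumptions; verify against the dynamic entities docs.
import json
import requests

WIT_TOKEN = "YOUR_CLIENT_ACCESS_TOKEN"   # placeholder
API_VERSION = "20220608"

# Bias the ASR towards "no" -- deliberately leaving "now" out of the list.
dynamic_entities = {
    "answer": [
        {"keyword": "no", "synonyms": ["no"]},
    ]
}

with open("no_means_no.wav", "rb") as f:  # hypothetical audio clip
    resp = requests.post(
        f"https://api.wit.ai/speech?v={API_VERSION}",
        params={"entities": json.dumps(dynamic_entities)},  # assumed param name
        headers={
            "Authorization": f"Bearer {WIT_TOKEN}",
            "Content-Type": "audio/wav",
        },
        data=f,
    )
print(resp.text)
```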

baroquedub commented 2 years ago

Thanks. Very helpful. Interesting that the synonyms help bias the ASR towards a certain transcription.

I take it from what you've said ("remember not to include 'now'") that I shouldn't train with incorrect transcriptions, so I'll go with that for now.

I totally get why user voice data is no longer accessible for privacy reasons (I was initially a bit surprised that it was there!) but it does mean that we can't help train the ASR by making the corrections as we used to.