jambonz / jambonz-webapp

A simple provisioning web app for jambonz
MIT License

feat: provision record all call #254

Closed: xquanluu closed this 1 year ago

davehorton commented 1 year ago

OK, I have tested with a longer call flow. See call_sid d62db37c-d2d3-4f25-9814-cbd9e79e0968 on jambonz.org. The conversation was 118 seconds long, yet the wavesurfer UI only shows and plays the first 12 seconds. The downloaded audio also includes only the first 12 seconds.

[screenshot]

davehorton commented 1 year ago

OK, I see the problem. It's not due to length; we are ending the background listen incorrectly when the first background gather completes, see below.

We need to support this scenario (common for voicebots) where we start a background gather, go off and do some things, and then the background gather completes, but we must not kill the background record at that time. The background record must run until the end of the call, although we probably need to provide the ability to end it earlier, as well as to pause and resume it.

[15:01:32.752] DEBUG (3965722): WsRequestor:request websocket: sent (voice)
    callId: "2ee1afde-728b-123c-628c-161343ac42f3"
    callSid: "d62db37c-d2d3-4f25-9814-cbd9e79e0968"
    accountSid: "9351f46a-678c-43f5-b8a6-d4eb58d131af"
    callingNumber: "+15083084809"
    calledNumber: "+15086908019"
    traceId: "0d0c5f5f72b75c887786d21087371704"
    obj: {
      "type": "verb:hook",
      "msgid": "igMZqKqmydgaXt6X3oAexn",
      "call_sid": "d62db37c-d2d3-4f25-9814-cbd9e79e0968",
      "hook": "voice",
      "data": {
        "call_sid": "d62db37c-d2d3-4f25-9814-cbd9e79e0968",
        "direction": "inbound",
        "from": "+15083084809",
        "to": "+15086908019",
        "call_id": "2ee1afde-728b-123c-628c-161343ac42f3",
        "sip_status": 200,
        "sip_reason": "OK",
        "call_status": "in-progress",
        "account_sid": "9351f46a-678c-43f5-b8a6-d4eb58d131af",
        "trace_id": "0d0c5f5f72b75c887786d21087371704",
        "application_sid": "66381298-c034-4d77-9a53-4bc2fdd8a420",
        "fs_sip_address": "10.0.214.82:5070",
        "originating_sip_ip": "54.172.60.2",
        "originating_sip_trunk_name": "Twilio",
        "api_base_url": "http://34.202.214.124/v1",
        "speech": {
          "language_code": "en-US",
          "channel_tag": 1,
          "is_final": false,
          "alternatives": [
            {
              "confidence": 0,
              "transcript": "yeah"
            }
          ],
          "vendor": {
            "name": "google",
            "evt": {
              "stability": 0.009999999776482582,
              "is_final": false,
              "alternatives": [
                {
                  "confidence": 0,
                  "transcript": "yeah"
                }
              ],
              "language_code": "en-us",
              "channel_tag": 0,
              "result_end_time": 3570
            }
          }
        }
      }
    }
[15:01:32.745] INFO (3965722): CallSession:kill
    callId: "2ee1afde-728b-123c-628c-161343ac42f3"
    callSid: "d62db37c-d2d3-4f25-9814-cbd9e79e0968"
    accountSid: "9351f46a-678c-43f5-b8a6-d4eb58d131af"
    callingNumber: "+15083084809"
    calledNumber: "+15086908019"
    traceId: "0d0c5f5f72b75c887786d21087371704"
[15:01:32.746] DEBUG (3965722): listen is being killed
    callId: "2ee1afde-728b-123c-628c-161343ac42f3"
    callSid: "d62db37c-d2d3-4f25-9814-cbd9e79e0968"
    accountSid: "9351f46a-678c-43f5-b8a6-d4eb58d131af"
    callingNumber: "+15083084809"
    calledNumber: "+15086908019"
    traceId: "0d0c5f5f72b75c887786d21087371704"
[15:01:32.746] DEBUG (3965722): TaskListen:kill endpoint connected? true
    callId: "2ee1afde-728b-123c-628c-161343ac42f3"
    callSid: "d62db37c-d2d3-4f25-9814-cbd9e79e0968"
    accountSid: "9351f46a-678c-43f5-b8a6-d4eb58d131af"
    callingNumber: "+15083084809"
    calledNumber: "+15086908019"
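The requirement above (a background record that survives gather completion, with room for explicit stop/pause/resume later) could be handled by scoping each background task to the reason it was started. The sketch below is a hypothetical illustration in plain JavaScript, not the actual jambonz feature-server code:

```javascript
// Hypothetical sketch (NOT the actual feature-server code): tag each
// background task with the purpose it was started for, so that completing a
// background gather kills only gather-scoped tasks. A listen started for
// call recording survives until the call ends (or an explicit stop).
class BackgroundTasks {
  constructor() {
    this.tasks = []; // each entry: { name, purpose, killed }
  }
  start(name, purpose) {
    const task = { name, purpose, killed: false };
    this.tasks.push(task);
    return task;
  }
  // a background gather completed: kill only the tasks it owns
  onGatherCompleted() {
    this.tasks
      .filter((t) => t.purpose === 'gather')
      .forEach((t) => { t.killed = true; });
  }
  // call teardown (or an explicit stop request): kill everything
  onCallEnded() {
    this.tasks.forEach((t) => { t.killed = true; });
  }
}
```

With a split like this, the record-all-calls listen would be started with a purpose such as 'record' and ignored by onGatherCompleted, matching the log above where the listen is currently killed alongside the gather.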

davehorton commented 1 year ago

Related to this, once we have very lengthy recordings we need to make sure they look OK in the UI. Minimally, the wavesurfer UI panel needs to be scrollable so you can scroll to the right for longer audio clips; possibly we should provide some sizing controls (e.g., size to fit, enlarge, or reduce).
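The sizing controls could be driven by simple pixels-per-second math. A sketch below, assuming wavesurfer v6's zoom(pxPerSec) API and a scrollable container (scrollParent: true); the helper names are illustrative:

```javascript
// Sketch of "size to fit" / "enlarge" / "reduce" controls, assuming the
// wavesurfer v6 zoom(pxPerSec) API. fitToWidth returns the pixels-per-second
// that makes the whole clip exactly fill the panel; stepZoom scales the
// current zoom by a step factor.
function fitToWidth(containerWidthPx, durationSec) {
  if (durationSec <= 0) throw new Error('duration must be positive');
  return containerWidthPx / durationSec;
}

function stepZoom(currentPxPerSec, direction, factor = 1.5) {
  return direction === 'enlarge'
    ? currentPxPerSec * factor
    : currentPxPerSec / factor;
}

// With a wavesurfer instance (names assumed from the wavesurfer v6 docs):
//   const ws = WaveSurfer.create({ container, scrollParent: true });
//   ws.zoom(fitToWidth(container.clientWidth, ws.getDuration()));
```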

davehorton commented 1 year ago

The downloaded WAV files are quite large. See below the size difference between the WAV file with PCM audio and the same audio as an MP3 file. This was for a 2-minute call, so for longer calls I am worried the files will just be too large to easily download, play, and store. Can you look into a way that we can download them as MP3 files?

[screenshot]
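For scale, a back-of-envelope estimate of the difference (assuming 8 kHz 16-bit stereo PCM for the WAV and a 64 kbps MP3; the actual recording parameters may differ):

```javascript
// Rough size math: uncompressed PCM in a WAV grows linearly with
// sampleRate * bytesPerSample * channels, while MP3 size is just
// bitrate / 8 bytes per second regardless of sample width.
function wavBytes(seconds, sampleRate = 8000, channels = 2, bytesPerSample = 2) {
  return seconds * sampleRate * channels * bytesPerSample;
}
function mp3Bytes(seconds, kbps = 64) {
  return (seconds * kbps * 1000) / 8;
}

// For the 2-minute call above, under these assumptions:
// wavBytes(120) -> 3,840,000 bytes (~3.8 MB)
// mp3Bytes(120) ->   960,000 bytes (~1.0 MB)
```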

davehorton commented 1 year ago

Can you make "Overlay STT results" default to unchecked?

xquanluu commented 1 year ago

"Overlay STT results" now defaults to unchecked.

davehorton commented 1 year ago

It should not be possible to check "Record all calls" if call recording has not been enabled: [screenshot]

In fact, "Record all calls for this account" should be part of the information that only shows when "Enable call recording" is checked. It should not be in its own section as it is above

davehorton commented 1 year ago

When I am in the middle of a gather verb and I hang up without saying anything, the STT results overlay has wrong information: [screenshot]

If I look at the application trace, this was the span detail for that gather: [screenshot]

If stt.resolve = killed then the details should not show vendor or language and Transcript should be:

Transcript: None (call disconnected or speech session terminated)
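The popup logic for this could look like the sketch below. The field names (resolve, vendor, language, transcript) are assumptions about the span data, and the timeout wording is hypothetical since only the 'killed' text is specified above:

```javascript
// Sketch of the overlay popup details, keyed off stt.resolve from the span.
// Only the 'killed' label text comes from the issue; the 'timeout' label is
// an illustrative placeholder for the distinct no-input case.
function sttOverlayDetails(span) {
  switch (span.resolve) {
    case 'killed':
      // gather was killed before it could return a result:
      // hide vendor/language and show the fixed transcript text
      return { transcript: 'None (call disconnected or speech session terminated)' };
    case 'timeout':
      // gather timed out with no user input; shown distinctly from 'killed'
      return { transcript: 'None (timeout, no speech detected)' };
    default:
      return {
        vendor: span.vendor,
        language: span.language,
        transcript: span.transcript,
      };
  }
}
```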

davehorton commented 1 year ago

When I don't say anything and let the gather time out, I see the same result: [screenshot] The span detail was: [screenshot]

Somehow we need to distinguish these two cases and show them differently:

  • the app kills a gather before it had a chance to return a result
  • a gather returned a timeout result due to no user input

Now in the feature server log I can see that it did return a timeout result:

TaskGather:resolve with reason timeout
    callId: "b9b4db9a-7b2f-123c-d0b5-021e3c815c57"
    callSid: "58777053-6b76-4eb9-a27c-098902706b1f"
    accountSid: "9351f46a-678c-43f5-b8a6-d4eb58d131af"
    callingNumber: "+15083084809"
    calledNumber: "+15083728299"
    traceId: "aa8e6873c4dac9e0b7aed7b3cfff5379"

Actually, I can see the problem. We have two gathers here at the end of the call, but in the overlay they show as one:

  • the first one results in a timeout; I then prompt "are you still there" and start a new gather
  • the second gather ends with killed since the caller hangs up

But on the STT overlay there is a single region that starts at the beginning of the first gather and continues until the end of the second gather. So we are not properly terminating the gather that ended with stt.resolve=timeout.

To see this, you can log into jambonz.me with your credentials and check the call that I made on 2023-06-01 at 10:59 am.

I have also installed an echo test app locally on that server, so if you want to make calls to it you can. You'll see I have a Twilio DID pointed to it.

xquanluu commented 1 year ago

It should not be possible to check "Record all calls" if call recording has not been enabled: [screenshot]

In fact, "Record all calls for this account" should be part of the information that only shows when "Enable call recording" is checked. It should not be in its own section as it is above.

Fixed:

[screenshot]

xquanluu commented 1 year ago

None (call disconnected or speech session terminated)

Fixed

[screenshot]

xquanluu commented 1 year ago

When I don't say anything and let the gather time out, I see the same result: [screenshot] The span detail was: [screenshot]

Somehow we need to distinguish these two cases and show them differently:

  • the app kills a gather before it had a chance to return a result
  • a gather returned a timeout result due to no user input

Now in the feature server log I can see that it did return a timeout result

TaskGather:resolve with reason timeout
    callId: "b9b4db9a-7b2f-123c-d0b5-021e3c815c57"
    callSid: "58777053-6b76-4eb9-a27c-098902706b1f"
    accountSid: "9351f46a-678c-43f5-b8a6-d4eb58d131af"
    callingNumber: "+15083084809"
    calledNumber: "+15083728299"
    traceId: "aa8e6873c4dac9e0b7aed7b3cfff5379"

Actually, I can see the problem. We have two gathers here at the end of the call, but in the overlay they show as one

  • the first one results in a timeout; I then prompt "are you still there" and start a new gather
  • the second gather ends with killed since the caller hangs up

But on the STT overlay there is a single region that starts with the beginning of the first gather and continues until the end of the second gather. So we are not properly terminating the gather that ended with stt.resolve=timeout

To see this, you can log into jambonz.me with your credentials and check the call that I made on 2023-06-01 at 10:59 am.

I have also installed an echo test app locally on that server, so if you want to make calls to it you can. You'll see I have a Twilio DID pointed to it.

Fixed

[screenshots]

davehorton commented 1 year ago

It's looking good. The next thing is that we need to make the STT overlay regions work for 'transcribe', in addition to 'gather'.

As I looked into this, I realized that transcribe is a long running verb that will give a stream of transcriptions while it runs. And we were not creating spans for each of these, but only for the entire verb itself. So I have checked in and merged a change to feature server so that transcribe creates child spans for each transcript (or timeout event) that it handles while running. These are what you will need to pick up in order to create the regions on the audio playback UI.

So first you will need to rebase the feature-server branch onto main again to pick up these child-span changes, and then make the corresponding changes to the webapp here.

I have applied the feature server changes to jambonz.me, and now in the traces view I can see the child spans, which look like this:

[screenshot]

The child spans are named stt-listen:1 for transcriptions from channel 1 (the caller) and stt-listen:2 for transcriptions from channel 2 (the called party). Note that we only get channel 2 transcriptions when the transcribe is nested in a dial verb and has been asked to transcribe both channels.

The child span details carry the raw transcription data just like the gather verb (as well as timeout events, etc.), so you should be able to use it for the popup windows displayed when the regions are clicked.

You can see in the screenshot that even though I have checked "Overlay STT results" there are no regions. This is what we want to fix now.
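Mapping the child spans described above to playback regions could look like the sketch below; the span timestamp and payload field names (startMs, endMs, data) are assumptions about the trace data for illustration:

```javascript
// Sketch: turn transcribe child spans (named stt-listen:1 / stt-listen:2)
// into region descriptors for the audio playback UI. Returns null for spans
// that are not transcription child spans.
function spanToRegion(span, callStartMs) {
  const m = /^stt-listen:(\d)$/.exec(span.name);
  if (!m) return null;
  const channel = Number(m[1]); // 1 = caller, 2 = called party
  return {
    start: (span.startMs - callStartMs) / 1000, // seconds into the recording
    end: (span.endMs - callStartMs) / 1000,
    channel,
    data: span.data, // raw transcription payload for the click popup
  };
}
```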

davehorton commented 1 year ago

I've updated feature-server on this branch so that the listen verb creates child spans for DTMF events: [screenshot]

The span detail shows the dtmf that was pressed as well as the duration of the keypress in milliseconds.

So now I would like to create a new overlay region type for DTMF events. The color should be different from the STT regions, and the label "Overlay STT results" should change to "Overlay STT and DTMF events".

I have updated both jambonz.one and jambonz.me with these changes

davehorton commented 1 year ago

There is a bug somewhere supporting regions other than us-east-1:

  • I put in S3 credentials with region us-east-2 and Save
  • I tried to retrieve a recent call
  • I came back to account settings and now region shows as us-east-1

The issue seems to be that even though the region is still us-east-2 in the database, the webapp is somehow defaulting to display us-east-1. When I return to the screen and see the incorrect "us-east-1" region, if I immediately log out and then return to the accounts screen, it shows the correct us-east-2, so the incorrect display must be coming from the webapp, not the database.
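One common cause of this class of bug, purely as a hypothetical illustration (the actual webapp code may differ): a form field whose fallback is a concrete region will silently display us-east-1 whenever the fetched value has not arrived or is not re-applied after navigation:

```javascript
// Hypothetical illustration, not the actual webapp code: if the field's
// fallback is a concrete region, any render before the fetch resolves (or
// after stale state) shows us-east-1 even though the DB holds us-east-2.
function displayedRegionBuggy(fetched) {
  return fetched || 'us-east-1'; // bug: masks the real value until refetch
}
function displayedRegionFixed(fetched) {
  return fetched ?? ''; // stays empty until the real value arrives from the API
}
```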

davehorton commented 1 year ago

The STT overlay does not work well on a stereo recording where we have two channels: [screenshot]

In the image above you can see the overlays take up the entire vertical space (both sound tracks) even though the STT results will only pertain to one of the sound tracks.

Does the region feature let you specify which of the two tracks to apply the region to?

If we can't have different regions for each channel, then I think we should assign a different color to the regions on channel 2. That way, if you have overlapping transcript regions for the two channels you can at least see where to click on each (assuming they are not completely identical in start and stop time).
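The channel-color fallback could be as simple as the sketch below; the colors are placeholder values, not the webapp's actual palette, and a DTMF color is included per the earlier comment:

```javascript
// Sketch: one translucent color per channel so overlapping regions on a
// stereo recording stay distinguishable, plus a separate color for the
// DTMF region type. Placeholder colors, not the webapp's real palette.
function regionColor(channel, kind = 'stt') {
  if (kind === 'dtmf') return 'rgba(255, 165, 0, 0.3)'; // DTMF events
  return channel === 2
    ? 'rgba(0, 128, 255, 0.3)'  // channel 2: called party
    : 'rgba(0, 200, 100, 0.3)'; // channel 1: caller
}
```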

davehorton commented 1 year ago

The input for region in bucket credentials should be a dropdown rather than a free-text input, and it should show the same list of regions that is shown for an AWS speech credential (i.e., it should be able to use that same data/values). It should be required but not default to a specific region.
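Building the dropdown options could look like this sketch; the region list here is a placeholder subset, and the real webapp would reuse its existing AWS speech-credential region data:

```javascript
// Sketch: build the bucket-credential region dropdown from the same region
// list the AWS speech credential form uses. The array below is a placeholder
// subset for illustration only.
const AWS_REGIONS = ['us-east-1', 'us-east-2', 'us-west-1', 'us-west-2', 'eu-west-1'];

function regionOptions(regions = AWS_REGIONS) {
  return [
    // required field with no preselected region: the blank placeholder
    // is disabled so the user must pick an actual value
    { value: '', label: 'Select a region', disabled: true },
    ...regions.map((r) => ({ value: r, label: r })),
  ];
}
```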

xquanluu commented 1 year ago

There is a bug somewhere supporting regions other than us-east-1.

  • I put in S3 credentials with region us-east-2 and Save
  • I tried to retrieve a recent call
  • I came back to account settings and now region shows as us-east-1

The issue seems to be that even though the region is still us-east-2 in the database, the webapp is somehow defaulting to display us-east-1. When I return to the screen and see the incorrect "us-east-1" region, if I immediately log out and then return to the accounts screen, it shows the correct us-east-2, so the incorrect display must be coming from the webapp, not the database.

Fixed

xquanluu commented 1 year ago

The input for region in bucket credentials should be a dropdown rather than an input and it should show the same list of regions that is shown for an aws speech credential (ie should be able to use that same data/values). It should be required but not default to a specific region.

Fixed

davehorton commented 1 year ago

…the necessary changes in entrypoint.sh so it will also work in Docker (have a look to see how the other en…

xquanluu commented 1 year ago

Fixed