vocodedev / vocode-core

🤖 Build voice-based LLM agents. Modular + open source.
https://vocode.dev
MIT License

[Bug]: Function calls do not work for passing parameters or returning results #510

Closed: rjheeta closed this issue 6 months ago

rjheeta commented 6 months ago

Brief Description

Parameters are not being passed to functions (Actions), nor are functions (Actions) returning results.

I've attached a custom (and very simple) action called GetCompanyDirectory that is just supposed to return static JSON.

The action contains print statements that show the parameters it receives and the result it is about to return. The parameters are not printed, which suggests they are not being passed into the function correctly. Similarly, although the result is printed, it does not appear to be returned to the caller.

from typing import Optional, Type
from pydantic import BaseModel, Field
from vocode.streaming.action.base_action import BaseAction
from vocode.streaming.models.actions import (
    ActionConfig,
    ActionInput,
    ActionOutput,
    ActionType,
)

class GetCompanyDirectoryActionConfig(ActionConfig, type=ActionType.GET_COMPANY_DIRECTORY):
    pass

class GetCompanyDirectoryParameters(BaseModel):
    first_name: str = Field(..., description="First name of the user")
    last_name: Optional[str] = Field(..., description="Last name of the user")

class GetCompanyDirectoryResponse(BaseModel):
    company_directory : list[dict]

class GetCompanyDirectory(
    BaseAction[
        GetCompanyDirectoryActionConfig,
        GetCompanyDirectoryParameters,
        GetCompanyDirectoryResponse
    ]
):
    description: str = (
        "Returns a list of users in the company directory "
        "(first name, last name, and phone number)."
    )
    parameters_type: Type[GetCompanyDirectoryParameters] = GetCompanyDirectoryParameters
    response_type: Type[GetCompanyDirectoryResponse] = GetCompanyDirectoryResponse

    def lookup_user(self):
        return [
            {'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'},
            {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'},
            {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}
        ]

    async def run(
        self, action_input: ActionInput[GetCompanyDirectoryParameters]
    ) -> ActionOutput[GetCompanyDirectoryResponse]:
        """
        Returns a list of users in the company directory (first name, last name, and phone number)
        Pass parameters as a pipe-separated list like <first_name>|<last_name>
        """

        # Note: We're not actually doing anything with the params here. This is
        # just a proof of concept to show params are not being passed
        print('****** Lookup User ******')
        print('Parameters passed:', action_input.params)

        result = self.lookup_user()

        print('Result:', result)

        return ActionOutput(
            action_type=self.action_config.type,
            response=GetCompanyDirectoryResponse(
                company_directory=result
            ),
        )

LLM

GPT-4

Transcription Services

Deepgram

Synthesis Services

Eleven Labs

Telephony Services

Twilio

Conversation Type and Platform

Real-time streaming / Twilio

Steps to Reproduce

  1. Copy the code above into vocode/streaming/action/get_company_directory.py
  2. Modify vocode/streaming/action/factory.py to add an elif branch for GetCompanyDirectoryActionConfig:

    from vocode.streaming.action.base_action import BaseAction
    from vocode.streaming.action.nylas_send_email import (
        NylasSendEmail,
        NylasSendEmailActionConfig,
    )
    from vocode.streaming.action.get_company_directory import (
        GetCompanyDirectory,
        GetCompanyDirectoryActionConfig,
    )
    from vocode.streaming.models.actions import ActionConfig
    from vocode.streaming.action.transfer_call import TransferCall, TransferCallActionConfig


    class ActionFactory:
        def create_action(self, action_config: ActionConfig) -> BaseAction:
            if isinstance(action_config, NylasSendEmailActionConfig):
                return NylasSendEmail(action_config, should_respond=True)
            elif isinstance(action_config, GetCompanyDirectoryActionConfig):
                return GetCompanyDirectory(action_config, should_respond=True)
            elif isinstance(action_config, TransferCallActionConfig):
                return TransferCall(action_config)
            else:
                raise Exception("Invalid action type")

  3. Modify `models/actions.py` to add `GET_COMPANY_DIRECTORY` to the `ActionType` enum:

    # Rest of code here...

    class ActionType(str, Enum):
        BASE = "action_base"
        NYLAS_SEND_EMAIL = "action_nylas_send_email"
        TRANSFER_CALL = "action_transfer_call"
        GET_COMPANY_DIRECTORY = "action_get_company_directory"

    # Rest of code here...

  4. In your main app file, add the action to the agent config's actions list:

    telephony_server = TelephonyServer(
        base_url=BASE_URL,
        config_manager=config_manager,
        inbound_call_configs=[
            TwilioInboundCallConfig(
                url="/vocode",
                agent_config=ChatGPTAgentConfig(
                    initial_message=BaseMessage(text=prompt),
                    end_conversation_on_goodbye=False,
                    send_filler_audio=FillerAudioConfig(silence_threshold_seconds=0.5),
                    prompt_preamble=preamble,
                    temperature=0,
                    model_name="gpt-4",
                    actions=[
                        GetCompanyDirectoryActionConfig(),
                    ],
                ),
                synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
                    voice_id=voice_id, api_key="xxx"
                ),
                twilio_config=TwilioConfig(
                    account_sid="xxx",
                    auth_token="xxx",
                ),
            )
        ],
        logger=logger,
    )


### Expected Behavior

The parameters should be passed into the action. The action's JSON result should then be returned to the agent, so that the LLM can read it and give the caller the correct phone number.

### Screenshots

See the output log below. Note a few key items:

1. I provide the employee's first and last name (James Smith).
2. The "Parameters passed:" print statement prints nothing, which tells me the parameters were not passed to the function.
3. The action body prints the JSON, but I suspect it is not being returned correctly either (just as the parameters are not passed in), because the LLM says it cannot find the user.

Note that I am using the latest version (a git clone of the main branch).

INFO: Started server process [96703]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
INFO: 35.174.106.67:0 - "POST /vocode HTTP/1.1" 200 OK
INFO: ('3.85.62.132', 0) - "WebSocket /connect_call/s9VwDrnr5ZHjxG1bP5g-5Q" [accepted]
DEBUG:__main__:Phone WS connection opened for chat s9VwDrnr5ZHjxG1bP5g-5Q
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Trying to attach WS to outbound call
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Attached WS to outbound call
INFO: connection open
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Media WS: Received event 'start': {"event":"start","sequenceNumber":"1","start":{"accountSid":"REDACTED","streamSid":"MZe87d0cc727d8a13f8a718e173865ad71","callSid":"CA764f2327e7f6dd93d43e76af8dfa8347","tracks":["inbound"],"mediaFormat":{"encoding":"audio/x-mulaw","sampleRate":8000,"channels":1},"customParameters":{}},"streamSid":"MZe87d0cc727d8a13f8a718e173865ad71"}
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Filling 1008 bytes of silence
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 4121
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Hi, how can I direct your call?
INFO:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Ignoring empty transcription
INFO:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Ignoring empty transcription
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Got transcription: Hi. Can I speak to James Smith, please?, confidence: 0.9992042
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Human started speaking
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Responding to transcription
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sending filler audio
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] No filler audio available for synthesizer
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
****** Lookup User ******
Parameters passed:
Result: [{'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'}, {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'}, {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}]
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Responding to transcription
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sending filler audio
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] No filler audio available for synthesizer
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 2 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 3 with size 3795
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Sure, let me check the company directory for James Smith's phone number.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 6897
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Just a moment.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 2 with size 7615
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Let me check the company directory for James Smith's phone number.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 2 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 3 with size 6302
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: I'm sorry, but I couldn't find anyone by the name of James Smith in our directory.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 7883
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Is there anyone else you would like to speak to?
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Media WS: Received event 'stop': {"event":"stop","sequenceNumber":"1256","streamSid":"MZe87d0cc727d8a13f8a718e173865ad71","stop":{"accountSid":"REDACTED","callSid":"CA764f2327e7f6dd93d43e76af8dfa8347"}}
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Stopping...
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating check_for_idle Task
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Tearing down synthesizer
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating agent
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating output device
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating speech transcriber
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating transcriptions worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating final transcriptions worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating synthesis results worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating filler audio worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating actions worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Successfully terminated
DEBUG:__main__:Phone WS connection closed for chat s9VwDrnr5ZHjxG1bP5g-5Q
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating Deepgram transcriber sender
INFO: connection closed
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [96703]
^C

rjheeta commented 6 months ago

Here's an experiment I tried.

I modified the process method in streaming/action/worker.py to print the value returned by the action's run method:

    async def process(self, item: InterruptibleEvent[ActionInput]):
        action_input = item.payload
        action = self.action_factory.create_action(action_input.action_config)
        action.attach_conversation_state_manager(self.conversation_state_manager)
        action_output = await action.run(action_input)

        # **** Start of my additions ****
        print(f'Type of action is: {type(action)}') 
        print(f'Action output: {action_output}') 

        # This should print GetCompanyDirectoryResponse but it prints BaseModel
        print(f'Action output response type: {type(action_output.response)}') 

        # This should print the JSON, but it doesn't
        print(f'Action output response: {action_output.response}') 
        # **** End of my additions ****

        self.produce_interruptible_event_nonblocking(
            ActionResultAgentInput(
                conversation_id=action_input.conversation_id,
                action_input=action_input,
                action_output=action_output,
                vonage_uuid=action_input.vonage_uuid
                if isinstance(action_input, VonagePhoneCallActionInput)
                else None,
                twilio_sid=action_input.twilio_sid
                if isinstance(action_input, TwilioPhoneCallActionInput)
                else None,
                is_quiet=action.quiet,
            )
        )

Here are the relevant parts of the log with these changes:

...
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Message sent: Hello, thank you for calling Acme. How can I help you?
INFO:__main__:[oyOPrK-HyOSAl665kX40eA] Ignoring empty transcription
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Got transcription:  Yeah. Can I speak to Mike., confidence: 0.9946289
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Human started speaking
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Responding to transcription
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sending filler audio
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] No filler audio available for synthesizer
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Synthesizing speech for message
****** Lookup User ******
Parameters passed: 
Result: [{'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'}, {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'}, {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}]
Type of action is: <class 'vocode.streaming.action.get_company_directory.GetCompanyDirectory'>
Action output: action_type='action_get_company_directory' response=BaseModel()
Action output response type: <class 'pydantic.v1.main.BaseModel'>
Action output response:
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Responding to transcription
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sending filler audio
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] No filler audio available for synthesizer
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Synthesizing speech for message
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 0 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Synthesizing speech for message
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 1 with size 6471
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Message sent: Sure, let me find Mike for you.
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 0 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 1 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 2 with size 7537
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Message sent: I'm sorry, but there are multiple employees named Mike.
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 0 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 1 with size 8000
^CINFO:     Shutting down

Or, more specifically:

****** Lookup User ******
Parameters passed: 
Result: [{'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'}, {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'}, {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}]
Type of action is: <class 'vocode.streaming.action.get_company_directory.GetCompanyDirectory'>
Action output: action_type='action_get_company_directory' response=BaseModel()
Action output response type: <class 'pydantic.v1.main.BaseModel'>
Action output response:

A few things to point out:

  1. The line printing "Action output response:" should show the JSON, but it prints nothing. This explains why the LLM never receives the data.
  2. I am not sure why the line printing "Action output response type:" shows pydantic.v1.main.BaseModel. Shouldn't it be vocode.streaming.action.get_company_directory.GetCompanyDirectoryResponse?

I'm not sure whether these help, but I thought I would share them.
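The empty BaseModel() in this output can be reproduced outside vocode. The sketch below is a minimal reproduction under stated assumptions. It requires pydantic 2.x, which bundles the legacy API as pydantic.v1. Its ActionOutput is a simplified stand-in, not vocode's definition. It assumes, as the pydantic.v1.main.BaseModel type in the log suggests, that vocode's version is a pydantic.v1 model whose response field is typed with a TypeVar bound to the v1 BaseModel. pydantic.v1 validates such a field against the bound. A pydantic v2 instance is not an instance of the v1 BaseModel, so it is converted with dict() and rebuilt as a bare v1 BaseModel, and every field is silently dropped.

```python
from typing import Generic, TypeVar

import pydantic  # pydantic 2.x
from pydantic import v1 as pydantic_v1  # legacy 1.x API bundled with pydantic 2.x

# Simplified stand-in for vocode's ActionOutput (assumption: built on pydantic.v1,
# with the response type declared as a TypeVar bound to the v1 BaseModel).
ResponseType = TypeVar("ResponseType", bound=pydantic_v1.BaseModel)


class ActionOutput(pydantic_v1.BaseModel, Generic[ResponseType]):
    action_type: str
    # pydantic.v1 validates a bare TypeVar field against its bound (the v1 BaseModel).
    response: ResponseType


directory = [{"first_name": "James", "last_name": "Smith", "phone_number": "444-111-8888"}]


class V2Response(pydantic.BaseModel):  # what `from pydantic import BaseModel` gives you
    company_directory: list[dict]


class V1Response(pydantic_v1.BaseModel):  # what `from pydantic.v1 import BaseModel` gives you
    company_directory: list[dict]


# The v2 instance fails the isinstance check against the v1 BaseModel, so it is
# converted with dict() and rebuilt as a field-less BaseModel (extra keys ignored).
broken = ActionOutput(action_type="demo", response=V2Response(company_directory=directory))
# The v1 instance passes the isinstance check and keeps its subclass and data.
fixed = ActionOutput(action_type="demo", response=V1Response(company_directory=directory))

print(broken)                         # action_type='demo' response=BaseModel()
print(type(fixed.response).__name__)  # V1Response
```

The same coercion would account for the params=BaseModel() that appears when create_action_input is instrumented later in this thread.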

rjheeta commented 6 months ago

More investigation:

I added some debug statements to the create_action_input method in streaming/action/base_action.py. They show that the type and contents of params do not survive the creation of the ActionInput object.

    def create_action_input(
        self,
        conversation_id: str,
        params: Dict[str, Any],
        user_message_tracker: Optional[asyncio.Event] = None,
    ) -> ActionInput[ParametersType]:
        if "user_message" in params:
            del params["user_message"]

        print(f'** PRE Type of transformed_params: {type(self.parameters_type(**params))}')
        print(f'** PRE Contents of transformed_params: {self.parameters_type(**params)}')

        result = ActionInput(
            action_config=self.action_config,
            conversation_id=conversation_id,
            params=self.parameters_type(**params),
            user_message_tracker=user_message_tracker,
        )

        print(f'** POST params content: {result.params}')
        print(f'** POST params content type: {type(result.params)}')
        print(f'** POST ActionInput full object: {result}')
        return result

Below is the relevant output.

Note: I am using a different (simpler) action class here. It just passes a phone number to a function that sends an SMS through Twilio. I can share the class if you need it. The point I want to draw attention to is how the parameters and their type change after the ActionInput object is created.

** Before create_action_input. params: {'recipient_phone': '+15555551234', 'user_message': 'Alright, I will page them for you.'}
** PRE Type of transformed_params: <class 'vocode.streaming.action.twilio_send_sms.TwilioSendSmsParameters'>
** PRE Contents of transformed_params: recipient_phone='+15555551234'
** POST params content: 
** POST params content type: <class 'pydantic.v1.main.BaseModel'>
** POST ActionInput full object: action_config=TwilioSendSmsActionConfig() conversation_id='xsM_yF4O5DpiSchPzyAtCw' params=BaseModel() user_message_tracker=<asyncio.locks.Event object at 0x17ef06e90 [unset]>

Why is params (correctly) of type vocode.streaming.action.twilio_send_sms.TwilioSendSmsParameters before the call, but pydantic.v1.main.BaseModel after it?

The only way I could get the parameters to persist was to add an __init__() method to the ActionInput class (in streaming/models/actions.py) that explicitly resets params after calling super().__init__(). See here:

class ActionInput(BaseModel, Generic[ParametersType]):
    action_config: ActionConfig
    conversation_id: str
    params: ParametersType
    user_message_tracker: Optional[asyncio.Event] = None

    def __init__(self, **data):
        params_data = data.get('params')
        super().__init__(**data) # <-- This is what's resetting the params...
        self.params = params_data

    class Config:
        arbitrary_types_allowed = True

(Note: I'm not suggesting this is correct; just sharing data)
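The workaround appears to work because pydantic.v1 does not validate plain attribute assignment unless validate_assignment is enabled in the model's Config, and it is off by default. So self.params = params_data stores the original object without the coercion that super().__init__ applies. The sketch below shows this. It requires pydantic 2.x for the pydantic.v1 import and uses simplified stand-ins for ActionInput that omit the action_config and user_message_tracker fields. It also shows that the workaround only masks the mismatch: the stored object is still a v2 model that the v1 model never validated.

```python
from typing import Generic, TypeVar

import pydantic  # pydantic 2.x
from pydantic import v1 as pydantic_v1

ParametersType = TypeVar("ParametersType", bound=pydantic_v1.BaseModel)


class PlainActionInput(pydantic_v1.BaseModel, Generic[ParametersType]):
    # Simplified stand-in for vocode's ActionInput, without the workaround.
    conversation_id: str
    params: ParametersType


class PatchedActionInput(pydantic_v1.BaseModel, Generic[ParametersType]):
    # Same fields, plus the __init__ override from the comment above.
    conversation_id: str
    params: ParametersType

    def __init__(self, **data):
        params_data = data.get("params")
        super().__init__(**data)   # validation turns a v2 model into an empty v1 BaseModel
        self.params = params_data  # unvalidated assignment puts the original object back


class SmsParams(pydantic.BaseModel):  # a v2 model, as in the original bug
    recipient_phone: str


raw = SmsParams(recipient_phone="+15555551234")
plain = PlainActionInput(conversation_id="abc", params=raw)
patched = PatchedActionInput(conversation_id="abc", params=raw)

print(type(plain.params))             # <class 'pydantic.v1.main.BaseModel'>
print(type(patched.params).__name__)  # SmsParams
```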

rjheeta commented 6 months ago

Closing this issue. Sorry, this was my mistake: I was working from an out-of-date example. The action's models needed to be defined with pydantic.v1, not pydantic. Changing the import to the following fixed it:

from pydantic.v1 import BaseModel, Field
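Because this mistake fails silently (the model is emptied rather than rejected), a fail-fast check can help catch it in other custom actions. The helper below, require_v1_model, is a hypothetical utility and not part of vocode. It could be called on parameter and response models before they are handed to vocode's pydantic.v1-based classes. It assumes pydantic 2.x is installed, so that both pydantic.BaseModel and pydantic.v1.BaseModel are importable.

```python
import pydantic  # pydantic 2.x
from pydantic import v1 as pydantic_v1


def require_v1_model(obj: object) -> pydantic_v1.BaseModel:
    """Hypothetical helper: return obj if it is a pydantic.v1 model, else raise TypeError."""
    if isinstance(obj, pydantic_v1.BaseModel):
        return obj
    if isinstance(obj, pydantic.BaseModel):
        # The mistake from this issue: pydantic.v1 would silently empty a v2 model.
        raise TypeError(
            f"{type(obj).__name__} is a pydantic v2 model; "
            "define it with 'from pydantic.v1 import BaseModel, Field'"
        )
    raise TypeError(f"expected a pydantic.v1 BaseModel, got {type(obj).__name__}")


class GetCompanyDirectoryResponse(pydantic_v1.BaseModel):
    company_directory: list[dict]


response = require_v1_model(GetCompanyDirectoryResponse(company_directory=[]))
```

Inside an action's run method, wrapping the response this way, for example response=require_v1_model(GetCompanyDirectoryResponse(...)), would turn the silent BaseModel() into an immediate error.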