run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Vertex AI safety settings kwargs error #16265

Open arthurbrenno opened 1 month ago

arthurbrenno commented 1 month ago

Bug Description

Given an instantiated Vertex AI model (LLM), I'm trying to use the "achat" interface to chat with the Gemini model (gemini-1.5-flash-002), and I'm getting the following error:

ERROR: Unknown field for GenerationConfig: safety_settings

/workspaces/CORTEX/.venv/lib/python3.10/site-packages/llama_index/llms/vertex/base.py:384 in achat

  381                     "examples are not supported in chat generation pass them as a constructor parameter"
  382                 )
  383             )
❱ 384         generation = await acompletion_with_retry(
  385             client=self._chat_client,
  386             prompt=question,
  387             chat=True,

/workspaces/CORTEX/.venv/lib/python3.10/site-packages/llama_index/llms/vertex/utils.py:148 in acompletion_with_retry

  145         else:
  146             return await client.predict_async(prompt, **kwargs)
  147
❱ 148     return await _completion_with_retry(**kwargs)
  149
  150
  151 def init_vertexai(

I think this might be a LlamaIndex problem. Maybe a recent Google update broke some LlamaIndex configuration, but I'm not sure.

Version

llama-index==0.11.14
llama-index-llms-vertex==0.3.6
google-ai-generativelanguage==0.6.4
google-generativeai==0.5.4

Steps to Reproduce

llm = Vertex(...)
chat = await llm.achat(...)

The error above is raised.
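
For a fuller picture, a minimal repro might look like this (a sketch: the model name is taken from the report above, while the project ID and message content are hypothetical placeholders):

import asyncio

from llama_index.core.llms import ChatMessage
from llama_index.llms.vertex import Vertex

# "my-gcp-project" is a placeholder; the model name is from the report above.
llm = Vertex(model="gemini-1.5-flash-002", project="my-gcp-project")

async def main() -> None:
    # Fails with: Unknown field for GenerationConfig: safety_settings
    response = await llm.achat([ChatMessage(role="user", content="Hello")])
    print(response)

asyncio.run(main())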

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

To resolve the "Unknown field for GenerationConfig: safety_settings" error, make sure you are using a version of the Vertex AI library that supports the safety_settings parameter; the error suggests the field is not recognized, possibly because the library is outdated.

Here is the relevant part of the code that deals with the safety_settings:

def create_gemini_client(
    model: str, safety_settings: Optional[SafetySettingsType]
) -> Any:
    from vertexai.preview.generative_models import GenerativeModel

    return GenerativeModel(model_name=model, safety_settings=safety_settings)

Ensure that the google-generativeai library is installed and updated to at least version 0.3.0 as specified in the code. If the safety_settings parameter is still not recognized, it might be necessary to consult the latest Vertex AI and google-generativeai documentation for any changes or deprecations regarding this parameter [1][2][3].
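
To make the distinction concrete, here is a sketch of how the Vertex AI SDK separates the two (based on the vertexai.generative_models module; the category and threshold values are illustrative assumptions):

from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
)

# safety_settings is its own parameter on the model (or on generate_content)...
safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
}
model = GenerativeModel("gemini-1.5-flash-002", safety_settings=safety_settings)

# ...while GenerationConfig only takes sampling fields; putting safety_settings
# inside it is exactly what triggers the "Unknown field" error.
config = GenerationConfig(temperature=0.2, max_output_tokens=256)
response = model.generate_content("Hello", generation_config=config)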

To continue talking to Dosu, mention @dosu.

kenleejr commented 1 month ago

I've confirmed this too, and there appears to be a bug in the code. The Vertex LLM class builds its model kwargs with this property:

@property
def _model_kwargs(self) -> Dict[str, Any]:
    base_kwargs = {
        "temperature": self.temperature,
        "max_output_tokens": self.max_tokens,
        "safety_settings": self._safety_settings,
    }
    return {
        **base_kwargs,
        **self.additional_kwargs,
    }
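
So the merged dict that flows downstream mixes valid GenerationConfig fields with safety_settings. Roughly (the values here are assumed defaults, not taken from the issue):

params = {
    "temperature": 0.0,          # valid GenerationConfig field
    "max_output_tokens": 8192,   # valid GenerationConfig field
    "safety_settings": {},       # not a GenerationConfig field -> the error
}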

In both achat and acomplete (and likely elsewhere), these kwargs are passed into acompletion_with_retry:

@llm_chat_callback()
async def achat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponse:
    merged_messages = (
        merge_neighboring_same_role_messages(messages)
        if self._is_gemini
        else messages
    )
    question = _parse_message(merged_messages[-1], self._is_gemini)
    chat_history = _parse_chat_history(merged_messages[:-1], self._is_gemini)
    chat_params = {**chat_history}
    kwargs = kwargs if kwargs else {}
    params = {**self._model_kwargs, **kwargs}
    if self.iscode and "candidate_count" in params:
        raise (ValueError("candidate_count is not supported by the codey model's"))
    if self.examples and "examples" not in params:
        chat_params["examples"] = _parse_examples(self.examples)
    elif "examples" in params:
        raise (
            ValueError(
                "examples are not supported in chat generation pass them as a constructor parameter"
            )
        )
    generation = await acompletion_with_retry(
        client=self._chat_client,
        prompt=question,
        chat=True,
        is_gemini=self._is_gemini,
        params=chat_params,
        max_retries=self.max_retries,
        **params,
    )
    ##this is due to a bug in vertex AI we have to await twice
    if self.iscode:
        generation = await generation

    content, tool_calls = self._get_content_and_tool_calls(generation)
    return ChatResponse(
        message=ChatMessage(
            role=MessageRole.ASSISTANT,
            content=content,
            additional_kwargs={"tool_calls": tool_calls},
        ),
        raw=generation.__dict__,
    )

@llm_completion_callback()
async def acomplete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:
    kwargs = kwargs if kwargs else {}
    params = {**self._model_kwargs, **kwargs}
    if self.iscode and "candidate_count" in params:
        raise (ValueError("candidate_count is not supported by the codey model's"))
    completion = await acompletion_with_retry(
        client=self._client,
        prompt=prompt,
        max_retries=self.max_retries,
        is_gemini=self._is_gemini,
        **params,
    )
    return CompletionResponse(text=completion.text)

Looking at acompletion_with_retry, we can see that safety_settings is not passed as its own argument but is instead folded into generation_config, which is not valid:

async def _completion_with_retry(**kwargs: Any) -> Any:
    if is_gemini:
        history = params["message_history"] if "message_history" in params else []
        generation = client.start_chat(history=history)
        kwargs = dict(kwargs)
        tools = kwargs.pop("tools", None) if "tools" in kwargs else []
        tools = to_gemini_tools(tools) if tools else []
        generation_config = kwargs if kwargs else {}
        return await generation.send_message_async(
            prompt,
            tools=tools,
            generation_config=generation_config,
        )
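
One plausible fix (a sketch against the snippet above, not a merged patch) would be to pop safety_settings out of the kwargs before they are folded into generation_config, and pass it through as its own argument, which ChatSession.send_message_async accepts:

async def _completion_with_retry(**kwargs: Any) -> Any:
    if is_gemini:
        # client, params, and prompt come from the enclosing
        # acompletion_with_retry scope, as in the original snippet.
        history = params.get("message_history", [])
        generation = client.start_chat(history=history)
        kwargs = dict(kwargs)
        tools = kwargs.pop("tools", None) or []
        tools = to_gemini_tools(tools) if tools else []
        # Keep safety_settings out of generation_config; pass it separately.
        safety_settings = kwargs.pop("safety_settings", None)
        generation_config = kwargs if kwargs else {}
        return await generation.send_message_async(
            prompt,
            tools=tools,
            generation_config=generation_config,
            safety_settings=safety_settings,
        )
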
arthurbrenno commented 1 month ago

You're right. I noticed that earlier too.