run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: InvalidArgument on get_base_model for GeminiMultiModal #12094

Closed · hemanth closed this 6 months ago

hemanth commented 6 months ago

Bug Description

GeminiMultiModal(model_name="models/gemini-pro-vision") fails with InvalidArgument even though the model name is one of the valid GEMINI_MM_MODELS entries:

GEMINI_MM_MODELS = (
    "models/gemini-pro-vision",
    "models/gemini-ultra-vision",
)

Version

0.10.20

Steps to Reproduce

%pip install llama-index-multi-modal-llms-gemini
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-embeddings-gemini
%pip install llama-index-llms-gemini
%pip install llama-index 'google-generativeai>=0.3.0' matplotlib qdrant_client
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
gemini_pro = GeminiMultiModal(model_name="models/gemini-pro-vision")

Relevant Logs/Tracebacks

---------------------------------------------------------------------------
InvalidArgument                           Traceback (most recent call last)
Cell In[9], line 12
      8 display_img(image_urls[0])
     10 image_documents = load_image_urls(image_urls)
---> 12 gemini_pro = GeminiMultiModal()

File ~/.python/current/lib/python3.10/site-packages/llama_index/multi_modal_llms/gemini/base.py:105, in GeminiMultiModal.__init__(self, api_key, model_name, temperature, max_tokens, generation_config, safety_settings, api_base, transport, callback_manager, **generate_kwargs)
     94     raise ValueError(
     95         f"Invalid model {model_name}. "
     96         f"Available models are: {GEMINI_MM_MODELS}"
     97     )
     99 self._model = genai.GenerativeModel(
    100     model_name=model_name,
    101     generation_config=final_gen_config,
    102     safety_settings=safety_settings,
    103 )
--> 105 self._model_meta = genai.get_model(model_name)
    107 supported_methods = self._model_meta.supported_generation_methods
    108 if "generateContent" not in supported_methods:

File ~/.python/current/lib/python3.10/site-packages/google/generativeai/models.py:49, in get_model(name, client)
     47 name = model_types.make_model_name(name)
     48 if name.startswith("models/"):
---> 49     return get_base_model(name, client=client)
...
metadata {
  key: "service"
  value: "generativelanguage.googleapis.com"
}
]
dosubot[bot] commented 6 months ago

Hey @hemanth, great to see you back here! Hope you're ready to tackle some more LlamaIndex mysteries together 🕵️‍♂️

Thank you for providing detailed information about the issue you're facing. Based on the traceback you've provided, it seems like the InvalidArgument error is being raised when the get_base_model function from the google.generativeai.models module is called with the model name "models/gemini-pro-vision".

The GeminiMultiModal class in the LlamaIndex codebase is designed to work with Google's Gemini multi-modal models. It seems like the model name "models/gemini-pro-vision" is not recognized as a valid model name by the GeminiMultiModal class, even though it is included in the GEMINI_MM_MODELS tuple.

GEMINI_MM_MODELS = (
    "models/gemini-pro-vision",
    "models/gemini-ultra-vision",
)

This could be due to a discrepancy between the model names recognized by the GeminiMultiModal class and the model names that the get_base_model function in the google.generativeai.models module is expecting.

I found a similar issue in the LlamaIndex repository: specifying the model name isn't working in the latest version. This issue was closed by a pull request: Fix query runner by passing llm_predictor/prompt_helper to the query properly . However, it seems like this issue was related to the way the model name was passed to the query runner, not the get_base_model function.

Unfortunately, without more information about the get_base_model function and why it's raising an InvalidArgument error, it's difficult to provide a definitive solution to your issue. It might be helpful to check the documentation or source code of the google.generativeai.models module to understand what model names it expects and how it handles invalid model names. If the issue persists, it might be a bug in the LlamaIndex codebase or the google.generativeai.models module, and you might want to consider opening an issue in the appropriate repository.
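One way to follow that suggestion is to call genai.get_model directly and see whether the InvalidArgument reproduces outside of LlamaIndex. A minimal sketch, assuming an API key has already been configured (both the function and the attribute appear in the traceback above):

import google.generativeai as genai

# Reproduce the failing call from the traceback in isolation.
meta = genai.get_model("models/gemini-pro-vision")
print(meta.supported_generation_methods)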

I hope this information helps! If you have any more questions or need further clarification, feel free to ask.

#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
hemanth commented 6 months ago

Well, the same happens when invoking GeminiMultiModal() without a model_name; it should pick up GEMINI_MM_MODELS[0] by default:

model_name: str = Field(
    default=GEMINI_MM_MODELS[0], description="The Gemini model to use."
)
hemanth commented 6 months ago

>>> import google.generativeai as genai

>>> for m in genai.list_models():
...   print(m.name)
... 
models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-pro
models/gemini-pro-vision
models/embedding-001
models/aqa
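
For completeness, here is a sketch (assuming the same configured API key) that filters this list down to the models advertising the generateContent method that GeminiMultiModal checks for in the traceback above:

import google.generativeai as genai

# Keep only models whose metadata lists "generateContent" among their
# supported generation methods.
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)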
hemanth commented 6 months ago

@hatianzhang does this work for you?

chilicrabcakes commented 6 months ago

@hemanth could it be that the error is not because of the model name? AFAIK llama-index passes three arguments to the genai.GenerativeModel class: model_name, generation_config, and safety_settings.

self._model = genai.GenerativeModel(
    model_name=model_name,
    generation_config=final_gen_config,
    safety_settings=safety_settings,
)

This is from https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-gemini/llama_index/multi_modal_llms/gemini/base.py

It may also be that the types for generation_config and safety_settings are no longer supported by Google. For example, the generation_config type can be a union of three things:

GenerationConfigType = Union[glm.GenerationConfig, GenerationConfigDict, GenerationConfig]

And as far as I understand it, GenerationConfigDict, which inherits from TypedDict, MUST have four keys: candidate_count: int, stop_sequences: Iterable[str], max_output_tokens: int, and temperature: float.

class GenerationConfigDict(TypedDict):
    # TODO(markdaoust): Python 3.11+ use `NotRequired`, ref: https://peps.python.org/pep-0655/
    candidate_count: int
    stop_sequences: Iterable[str]
    max_output_tokens: int
    temperature: float

In llama-index, if a generation_config is not defined when calling GeminiMultiModal(), a base_gen_config dict is created that contains only temperature as a key/value pair, without the other required keys.

Tl;dr - can you try defining a generation_config and a safety_settings when calling the GeminiMultiModal class?
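
A minimal sketch of that workaround; the config values below are illustrative assumptions, chosen only so that every required GenerationConfigDict key is present:

from llama_index.multi_modal_llms.gemini import GeminiMultiModal

# Explicit generation_config with all four required keys populated.
generation_config = {
    "candidate_count": 1,
    "stop_sequences": [],
    "max_output_tokens": 512,
    "temperature": 0.4,
}

gemini_pro = GeminiMultiModal(
    model_name="models/gemini-pro-vision",
    generation_config=generation_config,
    safety_settings=None,  # assumption: None falls back to Google's defaults
)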

Potential quick fix - maybe we remove the base_gen_config entirely? Google's generativeai library is able to handle the case where generation_config and safety_settings are None.

hemanth commented 6 months ago

> error is not because of the model name

https://github.com/run-llama/llama_index/issues/12094#issuecomment-2008149304

hemanth commented 6 months ago

So, I created a fresh env and it seems to be working. I suspect it was to do with the API_KEY.
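
For anyone hitting the same symptom, a minimal sketch of the API-key angle (the GOOGLE_API_KEY variable name and the genai.configure call are conventional with google-generativeai; GeminiMultiModal also accepts an api_key argument per its signature in the traceback):

import os
import google.generativeai as genai
from llama_index.multi_modal_llms.gemini import GeminiMultiModal

# Configure a valid key before the model lookup runs; per this thread,
# a missing or stale key can surface as InvalidArgument from get_base_model.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

gemini_pro = GeminiMultiModal(model_name="models/gemini-pro-vision")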