Closed reconsumeralization closed 10 months ago
53a52f465b
> [!TIP]
> I'll email you at reconsumeralization@gmail.com when I complete this pull request!
Here are the GitHub Actions logs prior to making any changes:
fb2064a
1/1 ✓ Checking README.md for syntax errors... ✅ README.md has no syntax errors!
Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.
I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.
src/gemini_client.py
✓ https://github.com/reconsumeralization/AutoGem/commit/a983c3aa4c0849f7a1bb9ee52b08364c1f5c4a03 Edit
Create src/gemini_client.py with contents:
• Create a new Python file named `gemini_client.py` in a new directory named `src`. This file will contain the implementation of the `GeminiClient` class, which will serve as the interface for interacting with the Google Gemini Pro Models API and the Google Gemini Vision Pro Models API (a rough sketch of this class follows the list below).
• At the top of `gemini_client.py`, import necessary modules for HTTP requests, Google Cloud authentication, and any other dependencies required for API interaction and data processing. This includes `google.cloud.gemini_pro_models`, `google.cloud.gemini_vision_pro_models`, and `base64` for image encoding.
• Implement the `GeminiClient` class with an `__init__` method that initializes the client with necessary credentials for Google Cloud authentication. This method should accept an API key as an argument and store it for use in making authenticated requests.
• Implement methods `predict_with_gemini_pro_models` and `predict_with_gemini_vision_pro_models` within the `GeminiClient` class. These methods should construct and send requests to the respective APIs, using the code samples provided in the issue description as a guide. Ensure that image files are correctly encoded as base64 when necessary.
• Add methods for parsing the API responses and extracting useful information from the prediction results. This includes interpreting the `category`, `confidence`, and `bounding_box` fields of each label in the prediction results.
• Include error handling to manage potential issues with API requests, such as rate limits or authentication errors.
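Taken together, the plan above amounts to roughly the following skeleton. This is a hedged sketch rather than the committed implementation: the `google.cloud.gemini_pro_models` / `google.cloud.gemini_vision_pro_models` modules named in the plan are assumptions of the issue, not verified packages, so the request-sending bodies are left as placeholders.

```python
# Illustrative sketch of the planned GeminiClient; the underlying API modules are
# assumptions of the issue, so the request-sending bodies are placeholders.
import base64
from typing import Any, Dict, List


class GeminiClient:
    """Interface for the (assumed) Gemini Pro and Gemini Vision Pro model APIs."""

    def __init__(self, api_key: str):
        # Keep the key around for authenticated requests.
        self.api_key = api_key

    @staticmethod
    def _encode_image(image_path: str) -> str:
        # Images are base64-encoded before being attached to a request.
        with open(image_path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")

    def predict_with_gemini_pro_models(self, model_name: str, image_path: str) -> List[Dict[str, Any]]:
        encoded = self._encode_image(image_path)
        # Placeholder: build and send the Pro Models API request here, then parse
        # category / confidence / bounding_box from each label in the response.
        raise NotImplementedError

    def predict_with_gemini_vision_pro_models(self, model_name: str, image_path: str) -> List[Dict[str, Any]]:
        encoded = self._encode_image(image_path)
        # Placeholder: same as above, against the Vision Pro Models API.
        raise NotImplementedError
```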
src/gemini_client.py
✓ Edit
Check src/gemini_client.py with contents:
Ran GitHub Actions for a983c3aa4c0849f7a1bb9ee52b08364c1f5c4a03:
src/utils.py
✓ https://github.com/reconsumeralization/AutoGem/commit/3f5bcc12d54c60b071439c37ca4f509d24477c41 Edit
Create src/utils.py with contents:
• Create a new Python file named `utils.py` in the `src` directory. This file will contain utility functions to support the operations of the `GeminiClient`.
• Implement a function `encode_image_to_base64` that takes the path to an image file as input and returns the base64 encoded string of the image. This function will be used by the `GeminiClient` to prepare images for prediction requests to the Gemini Vision Pro Models API (a sketch of this helper follows the list below).
• Implement any additional utility functions needed for data processing or API interaction, such as functions to validate input data or parse complex response structures.
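A minimal sketch of the `encode_image_to_base64` helper described above (the exact signature is an assumption; the committed file may differ):

```python
# Sketch of the planned helper; the signature is an assumption.
import base64


def encode_image_to_base64(image_path: str) -> str:
    """Read an image file and return its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```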
src/utils.py
✓ Edit
Check src/utils.py with contents:
Ran GitHub Actions for 3f5bcc12d54c60b071439c37ca4f509d24477c41:
README.md
✓ https://github.com/reconsumeralization/AutoGem/commit/177d73e830d9f859ad278870ac20d3fd160ac923 Edit
Modify README.md with contents:
• Update the README.md file to include documentation on how to use the new `GeminiClient` class. Provide examples of initializing the client, making prediction requests, and interpreting the results.
• Include instructions for setting up the necessary environment, such as installing dependencies and configuring Google Cloud authentication.
• Mention any limitations or known issues with the current implementation, such as unsupported features or API restrictions.
````diff
---
+++
@@ -10,4 +10,59 @@
 python -m unittest discover tests
 ```
+### Using the `GeminiClient` Class
+
+The `GeminiClient` class provides an interface to the Google Gemini Pro Models API and Google Gemini Vision Pro Models API. Here's how you can use it:
+
+#### Setup
+
+Before using the `GeminiClient`, ensure you have installed the necessary dependencies:
+
+```shell
+pip install google-cloud-gemini-pro-models google-cloud-gemini-vision-pro-models
+```
+
+You must also configure your Google Cloud authentication by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file:
+
+```shell
+export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
+```
+
+#### Initializing the Client
+
+To initialize the `GeminiClient`, provide your Google Cloud API key:
+
+```python
+from src.gemini_client import GeminiClient
+
+client = GeminiClient(api_key='YOUR_API_KEY')
+```
+
+#### Making Prediction Requests
+
+To make prediction requests, use the `predict_with_gemini_pro_models` or `predict_with_gemini_vision_pro_models` methods. Provide the model name and the path to the image you wish to classify:
+
+```python
+# Predict with Gemini Pro Models
+results = client.predict_with_gemini_pro_models('model_name', 'path/to/image.jpg')
+
+# Predict with Gemini Vision Pro Models
+results = client.predict_with_gemini_vision_pro_models('model_name', 'path/to/image.jpg')
+```
+
+#### Interpreting the Results
+
+The methods return a list of dictionaries, each representing a prediction result. Here's how to interpret these results:
+
+```python
+for result in results:
+    print(f"Category: {result['category']}, Confidence: {result['confidence']}, Bounding Box: {result['bounding_box']}")
+```
+
+### Known Issues and Limitations
+
+- The current implementation does not support streaming predictions.
+- Only prediction requests with single instances (images) are supported; batch predictions are not yet implemented.
+- The API may impose limits on the number of requests per minute or other usage restrictions.
+
 Adjust the command based on the project's programming language and chosen testing framework.
````
README.md
✓ Edit
Check README.md with contents:
Ran GitHub Actions for 177d73e830d9f859ad278870ac20d3fd160ac923:
I have finished reviewing the code for completeness. I did not find errors for `sweep/research_the_google_gemini_pro_models_ap`.
💡 To recreate the pull request, edit the issue title or description. To tweak the pull request, leave a comment on the pull request. Something wrong? Let us know.
This is an automated message generated by Sweep AI.
Details
Sweep: Research the Google Gemini Pro Models API and Google Gemini Vision Pro Models API and create a client to work with and interpret our API calls to the model correctly.
Google Gemini Pro Models API
Overview
The Google Gemini Pro Models API is a cloud-based API that provides access to a variety of pre-trained Gemini models, including models for tasks such as image classification, object detection, and text classification. The API is designed to be easy to use, even for developers who don't have experience with deep learning.
Getting Started
To get started with the Google Gemini Pro Models API, you'll need to create a `GeminiProModelsServiceClient`.
Code Sample
The following code sample shows you how to use the Google Gemini Pro Models API to classify an image:
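(The code sample itself was not included in the issue. Below is a hypothetical sketch: the `google.cloud.gemini_pro_models` module, the `GeminiProModelsServiceClient` class, and its `predict` method are names taken from this issue, not a verified public API.)

```python
# Hypothetical sketch -- module path, client class, and predict() signature are
# assumptions drawn from the names used in this issue, not a verified API.
import base64

from google.cloud import gemini_pro_models  # assumed module


def classify_image(model_name: str, image_path: str):
    # Base64-encode the image so it can be embedded in the request payload.
    with open(image_path, "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("utf-8")

    client = gemini_pro_models.GeminiProModelsServiceClient()  # assumed client
    response = client.predict(  # assumed method
        model=model_name,
        instances=[{"image": {"bytes_base64_encoded": encoded_image}}],
    )
    return response.predictions


predictions = classify_image("gemini-pro", "path/to/image.jpg")
```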
Google Gemini Vision Pro Models API
Overview
The Google Gemini Vision Pro Models API is a cloud-based API that provides access to a variety of pre-trained Gemini models for computer vision tasks, such as image classification, object detection, and image segmentation. The API is designed to be easy to use, even for developers who don't have experience with deep learning.
Getting Started
To get started with the Google Gemini Vision Pro Models API, you'll need to create a `GeminiVisionProModelsServiceClient`.
Code Sample
The following code sample shows you how to use the Google Gemini Vision Pro Models API to classify an image:
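(Again, the original sample is missing; the sketch below mirrors the one above, swapping in the Vision client class named in this issue. All API names remain assumptions.)

```python
# Hypothetical sketch -- names are assumptions drawn from this issue.
from google.cloud import gemini_vision_pro_models  # assumed module

client = gemini_vision_pro_models.GeminiVisionProModelsServiceClient()  # assumed client
response = client.predict(  # assumed method
    model="gemini-pro-vision",
    instances=[{"image": {"bytes_base64_encoded": encoded_image}}],  # encoded as above
)
predictions = response.predictions
```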
Interpreting API Calls
The output of the Google Gemini Pro Models API and Google Gemini Vision Pro Models API is a list of `Prediction` objects. Each `Prediction` object contains a list of `Label` objects. Each `Label` object contains the following information:

- `category`: The category of the label.
- `confidence`: The confidence of the prediction.
- `bounding_box`: The bounding box of the label (if applicable).

You can use this information to interpret the results of your API call. For example, if you are using the API to classify an image, you can use the `category` field to determine the class of the image. You can use the `confidence` field to determine how confident the API is in its prediction. And you can use the `bounding_box` field to locate the object in the image.
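Continuing from the sketches above, iterating over the results might look like this (the `labels` attribute and field access are assumed from the `Prediction`/`Label` structure just described):

```python
# Hypothetical sketch -- assumes the Prediction/Label structure described above.
for prediction in predictions:
    for label in prediction.labels:  # assumed attribute name
        print(f"category={label.category}, confidence={label.confidence:.2f}")
        if label.bounding_box is not None:
            # The bounding box locates the labeled object within the image.
            print(f"  bounding box: {label.bounding_box}")
```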
Conclusion
The Google Gemini Pro Models API and Google Gemini Vision Pro Models API are powerful tools for developers who want to use deep learning models in their applications. The APIs are easy to use, even for developers who don't have experience with deep learning.

```python
from __future__ import annotations

import base64
import os
import pdb
import random
import re
import time
from io import BytesIO
from typing import Any, Dict, List, Mapping, Union

import google.generativeai as genai
import httpx
import requests
from google.ai.generativelanguage import Content, Part
from google.api_core.exceptions import InternalServerError
from google.generativeai import ChatSession
from openai import OpenAI, _exceptions, resources
from openai._qs import Querystring
from openai._types import NOT_GIVEN, NotGiven, Omit, ProxiesTypes, RequestOptions, Timeout, Transport
from openai.types.chat import ChatCompletion
from openai.types.chat.chat_completion import ChatCompletionMessage, Choice
from openai.types.completion_usage import CompletionUsage
from PIL import Image
from proto.marshal.collections.repeated import RepeatedComposite
from pydash import max
from typing_extensions import Self, override

from autogen.agentchat.contrib.img_utils import _to_pil, get_image_data
from autogen.token_count_utils import count_token


class GeminiClient:
    """Client for calling Google's Gemini models.

    TODO: this Gemini implementation does not support the following features yet:
    - multiple responses at the same time (Gemini)
    """

    def __init__(self, **kwargs):
        self.api_key = kwargs.get("api_key", None)
        if self.api_key is None:
            self.api_key = os.getenv("GOOGLE_API_KEY")

    def __call__(self, params: Dict) -> ChatCompletion:
        model_name = params.get("model", "gemini-pro")
        params.get("api_type", "google")  # not used
        messages = params.get("messages", [])
        stream = params.get("stream", False)
        n_response = params.get("n", 1)
        params.get("temperature", 0.5)
        params.get("top_p", 1.0)
        params.get("max_tokens", 1024)
        # TODO: handle these parameters in GenerationConfig


def oai_content_to_gemini_content(content: Union[str, List]) -> List:
    """Convert content from OAI format to Gemini format"""
    rst = []
    if isinstance(content, str):
        rst.append(Part(text=content))
        return rst


def concat_parts(parts: List[Part]) -> List:
    """Concatenate parts with the same type.
    If two adjacent parts both have the "text" attribute, then it will be joined into one part.
    """
    if not parts:
        return []


def oai_messages_to_gemini_messages(messages: list[Dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert messages from OAI format to Gemini format.
    Make sure the "user" role and "model" role are interleaved.
    Also, make sure the last item is from the "user" role.
    """
    prev_role = None
    rst = []
    curr_parts = []
    for i, message in enumerate(messages):
        parts = oai_content_to_gemini_content(message["content"])
        role = "user" if message["role"] in ["user", "system"] else "model"


def count_gemini_tokens(ans: Union[str, Dict, List], model_name: str) -> int:
    # ans is OAI format in oai_messages
    ...


def _to_pil(data: str) -> Image.Image:
    """
    Converts a base64 encoded image data string to a PIL Image object.

    This function first decodes the base64 encoded string to bytes, then creates a BytesIO object
    from the bytes, and finally creates and returns a PIL Image object from the BytesIO object.

    Parameters:
        data (str): The base64 encoded image data string.

    Returns:
        Image.Image: The PIL Image object created from the input data.
    """
    return Image.open(BytesIO(base64.b64decode(data)))


def get_image_data(image_file: str, use_b64=True) -> bytes:
    if image_file.startswith("http://") or image_file.startswith("https://"):
        response = requests.get(image_file)
        content = response.content
    elif re.match(r"data:image/(?:png|jpeg);base64,", image_file):
        return re.sub(r"data:image/(?:png|jpeg);base64,", "", image_file)
    else:
        image = Image.open(image_file).convert("RGB")
        buffered = BytesIO()
        image.save(buffered, format="PNG")
        content = buffered.getvalue()
```
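For reference, a minimal usage sketch of the two helpers above ("photo.png" is a placeholder path, and it assumes the truncated tail of `get_image_data` returns base64 text when `use_b64=True`):

```python
# Usage sketch; "photo.png" is a placeholder path, and the base64 return value of
# get_image_data is assumed from its truncated tail above.
b64_data = get_image_data("photo.png", use_b64=True)
pil_image = _to_pil(b64_data)
print(pil_image.size)
```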
Checklist
- [X] Create `src/gemini_client.py` ✓ https://github.com/reconsumeralization/AutoGem/commit/a983c3aa4c0849f7a1bb9ee52b08364c1f5c4a03 [Edit](https://github.com/reconsumeralization/AutoGem/edit/sweep/research_the_google_gemini_pro_models_ap/src/gemini_client.py)
- [X] Running GitHub Actions for `src/gemini_client.py` ✓ [Edit](https://github.com/reconsumeralization/AutoGem/edit/sweep/research_the_google_gemini_pro_models_ap/src/gemini_client.py)
- [X] Create `src/utils.py` ✓ https://github.com/reconsumeralization/AutoGem/commit/3f5bcc12d54c60b071439c37ca4f509d24477c41 [Edit](https://github.com/reconsumeralization/AutoGem/edit/sweep/research_the_google_gemini_pro_models_ap/src/utils.py)
- [X] Running GitHub Actions for `src/utils.py` ✓ [Edit](https://github.com/reconsumeralization/AutoGem/edit/sweep/research_the_google_gemini_pro_models_ap/src/utils.py)
- [X] Modify `README.md` ✓ https://github.com/reconsumeralization/AutoGem/commit/177d73e830d9f859ad278870ac20d3fd160ac923 [Edit](https://github.com/reconsumeralization/AutoGem/edit/sweep/research_the_google_gemini_pro_models_ap/README.md)
- [X] Running GitHub Actions for `README.md` ✓ [Edit](https://github.com/reconsumeralization/AutoGem/edit/sweep/research_the_google_gemini_pro_models_ap/README.md)