Closed uezo closed 3 weeks ago
Add the `SimpleCamera` prefab to the scene and assign it to a field of your script (`simpleCamera` in the example code below).
Include a system instruction like the following in the prompt:
## Using Vision
If you need an image to process a user's request, you can obtain it using the following methods:
- camera
- screenshot
If an image is needed to process the request, add an instruction like [vision:camera] to your response to request an image from the user.
By adding this instruction, the user will provide an image in their next utterance. No comments about the image itself are necessary.
Example:
user: Look! This is the picture I painted.
assistant: [vision:camera] Let me take a look.
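For illustration, the `[vision:camera]` tag in the example response above could be detected on the client side roughly as follows. This is a minimal sketch, not the parsing code used in this PR: the class name `VisionTagParser` and both method names are hypothetical, and it assumes the tag always has the form `[vision:<source>]`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of ChatdollKit) that finds a vision tag
// in the assistant's response text.
public static class VisionTagParser
{
    // Matches tags such as [vision:camera] or [vision:screenshot].
    private static readonly Regex VisionTag =
        new Regex(@"\[vision:(\w+)\]", RegexOptions.Compiled);

    // Returns the requested source (e.g. "camera"), or null when no tag is present.
    public static string ExtractSource(string response)
    {
        var match = VisionTag.Match(response ?? string.Empty);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Removes the tag so that it is not displayed or spoken to the user.
    public static string StripTag(string response)
    {
        return VisionTag.Replace(response ?? string.Empty, string.Empty).Trim();
    }
}
```

With the example above, `ExtractSource` would return `camera` and `StripTag` would leave `Let me take a look.` as the text to speak.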
Finally, implement the `CaptureImage` handler and assign it to `GeminiService`:
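    // Requires: using System; using Cysharp.Threading.Tasks; using UnityEngine;
    // Note: `source` is not used here; this example always captures from simpleCamera.
    private async UniTask<byte[]> CaptureImageAsync(string source)
    {
        if (simpleCamera != null)
        {
            try
            {
                return await simpleCamera.CaptureImageAsync();
            }
            catch (Exception ex)
            {
                Debug.LogError($"Error at CaptureImageAsync: {ex.Message}\n{ex.StackTrace}");
            }
        }

        // No camera assigned, or the capture failed
        return null;
    }

    // Register the handler on GeminiService
    gameObject.GetComponent<GeminiService>().CaptureImage = CaptureImageAsync;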
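The system instruction lists both `camera` and `screenshot`, while the handler above ignores `source` and always uses the camera. One way to route on `source` is a small lookup table of capture functions. The sketch below is an assumption-laden illustration, not code from this PR: `ImageSourceRouter` and `Register` are hypothetical names, and it uses plain .NET `Task` in place of `UniTask` so that it compiles outside Unity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical router (not part of ChatdollKit) that maps a source name
// such as "camera" or "screenshot" to a capture function.
public class ImageSourceRouter
{
    private readonly Dictionary<string, Func<Task<byte[]>>> handlers =
        new Dictionary<string, Func<Task<byte[]>>>(StringComparer.OrdinalIgnoreCase);

    // Registers (or replaces) the capture function for a source name.
    public void Register(string source, Func<Task<byte[]>> handler)
    {
        handlers[source] = handler;
    }

    // Returns the captured bytes, or null when the source is unknown
    // or the capture function throws.
    public async Task<byte[]> CaptureImageAsync(string source)
    {
        if (source == null || !handlers.TryGetValue(source, out var handler))
        {
            return null;
        }

        try
        {
            return await handler();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Error at CaptureImageAsync: {ex.Message}");
            return null;
        }
    }
}
```

In a Unity project the `Task` types would presumably be swapped for `UniTask`, the `camera` entry would wrap `simpleCamera.CaptureImageAsync()`, and a `screenshot` entry would need its own capture function, which this PR description does not show.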
Implements functionality that lets Gemini autonomously decide when to capture an image (e.g. from a camera) based on the user's request, improving the agent's handling of multimodal input.
Also improves the handling of streaming chunks.
GoogleForJapan