Support autonomous vision input for Gemini✨

Add the SimpleCamera prefab to the scene and assign it to a field of your script (simpleCamera in the example code below).

Include a system instruction like the one below:

## Using Vision

If you need an image to process a user's request, you can obtain it using the following methods:

- camera
- screenshot

If an image is needed to process the request, add an instruction like [vision:camera] to your response to request an image from the user.

By adding this instruction, the user will provide an image in their next utterance. No comments about the image itself are necessary.

Example:

user: Look! This is the picture I painted.
assistant: [vision:camera] Let me take a look.
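
To illustrate what the instruction above asks of the model, here is a minimal sketch of how a tag such as [vision:camera] could be detected in a response and separated from the spoken text. This is not ChatdollKit's actual implementation; the class name VisionTagParser, the method TryExtract, and the regular expression are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ChatdollKit.
public static class VisionTagParser
{
    // Matches tags such as [vision:camera] or [vision:screenshot].
    private static readonly Regex VisionTag =
        new Regex(@"\[vision:(?<source>\w+)\]", RegexOptions.Compiled);

    // Returns true when the response contains a vision tag.
    // `source` receives the requested source name and `cleanedText`
    // receives the response with all vision tags removed.
    public static bool TryExtract(string response, out string source, out string cleanedText)
    {
        var match = VisionTag.Match(response ?? string.Empty);
        if (!match.Success)
        {
            source = null;
            cleanedText = response;
            return false;
        }

        source = match.Groups["source"].Value;
        cleanedText = VisionTag.Replace(response, string.Empty).Trim();
        return true;
    }
}
```

For the example response above, TryExtract would report the source "camera" and leave "Let me take a look." as the text to speak.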

Then implement a CaptureImage handler and assign it to GeminiService:

private async UniTask<byte[]> CaptureImageAsync(string source)
{
    if (simpleCamera != null)
    {
        try
        {
            return await simpleCamera.CaptureImageAsync();
        }
        catch (Exception ex)
        {
            Debug.LogError($"Error at CaptureImageAsync: {ex.Message}\n{ex.StackTrace}");
        }
    }

    return null;
}

// Register the handler, for example in Start() or Awake()
gameObject.GetComponent<GeminiService>().CaptureImage = CaptureImageAsync;
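
The system instruction also lists screenshot as a source, while the handler above always uses the camera. The following sketch shows one way the handler could branch on the source argument. It assumes that source carries the text after "vision:" in the tag (for example "camera" or "screenshot") and that JPEG bytes are an acceptable image format; neither is confirmed here. The class name VisionCaptureExample is hypothetical, and the using directives for the ChatdollKit namespaces that define SimpleCamera and GeminiService are omitted because they depend on the installed version.

```csharp
using System;
using Cysharp.Threading.Tasks;
using UnityEngine;
// Also import the ChatdollKit namespaces that define SimpleCamera and GeminiService.

// Hypothetical example component; attach it to the same GameObject as GeminiService.
public class VisionCaptureExample : MonoBehaviour
{
    [SerializeField] private SimpleCamera simpleCamera;

    private void Start()
    {
        GetComponent<GeminiService>().CaptureImage = CaptureImageAsync;
    }

    private async UniTask<byte[]> CaptureImageAsync(string source)
    {
        try
        {
            // Assumption: "screenshot" is passed when the model emits [vision:screenshot].
            if (source == "screenshot")
            {
                // Wait until rendering has finished so the capture contains the full frame.
                await UniTask.WaitForEndOfFrame(this);
                var texture = ScreenCapture.CaptureScreenshotAsTexture();
                try
                {
                    return texture.EncodeToJPG();
                }
                finally
                {
                    // Release the temporary texture to avoid leaking memory.
                    Destroy(texture);
                }
            }

            // Any other source falls back to the camera, as in the original handler.
            if (simpleCamera != null)
            {
                return await simpleCamera.CaptureImageAsync();
            }
        }
        catch (Exception ex)
        {
            Debug.LogError($"Error at CaptureImageAsync: {ex.Message}\n{ex.StackTrace}");
        }

        return null;
    }
}
```

Returning null on failure keeps the behavior of the original handler, so the caller can proceed without an image.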

uezo / ChatdollKit

Support autonomous vision input for Gemini✨ #302
