Try to generate images using the Gemini API

ogallagher commented 3 months ago

There is some uncertainty as to whether the Gemini API will be able to generate images on its own, so we need a proof of concept for this.

[ ] ~~Raster image support (e.g. JPG, PNG)~~
[ ] ~~Vector image support~~

On failure of the above, decide on next steps:

Craiyon: web client only, no API.
OpenAI DALL-E: has an API.
Stable Diffusion: has an API.
Midjourney: no API.
[x] Research how the general Google AI chat generates images.
[x] Check the Google Imagen API intro tutorial.
- Available through the Vertex AI API in Google Cloud.
- Requires billing to be enabled. Pricing: 0.02 USD per image.
Keep trying with Gemini. I haven't yet tried structured prompts instead of the chat interface. I also haven't tried having Gemini describe a scene geometrically in detail and then passing that description to the SVG request.
Ask the Gemini API to describe the image it wants, search for and get candidate images from trusted sites, then send them back to the Gemini API to let it pick.
Skip image generation.
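The untried two-step idea above (first have Gemini describe a scene geometrically, then pass that description into the SVG request) could be prototyped roughly as follows. This is a minimal sketch, not the project's code: `generate` is a hypothetical text-in/text-out wrapper around whatever Gemini call is used, and the prompt wording and 200x200 canvas are illustrative assumptions.

```python
import re
from typing import Callable

# Hypothetical: any function that sends a prompt to Gemini and returns the reply text.
Generate = Callable[[str], str]

# Matches the first <svg>...</svg> element, across newlines.
SVG_PATTERN = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def describe_geometry(generate: Generate, subject: str) -> str:
    """Stage 1: ask for a purely geometric description of the subject."""
    prompt = (
        f"Describe a simple drawing of {subject} using only basic shapes "
        "(polygons, circles, ellipses, paths) on a 200x200 canvas. "
        "Give coordinates for every shape. Do not output SVG."
    )
    return generate(prompt)


def svg_from_description(generate: Generate, description: str) -> str:
    """Stage 2: turn the geometric description into SVG markup."""
    prompt = (
        "Convert the following geometric description into a single SVG "
        'document with width="200" height="200". Output only the SVG.\n\n'
        + description
    )
    reply = generate(prompt)
    # Replies often wrap markup in prose; keep only the <svg> element itself.
    match = SVG_PATTERN.search(reply)
    if match is None:
        raise ValueError("no <svg> element found in model reply")
    return match.group(0)


def draw(generate: Generate, subject: str) -> str:
    """Run both stages: describe the geometry, then render it as SVG."""
    return svg_from_description(generate, describe_geometry(generate, subject))
```

Keeping the two stages as separate prompts means the geometric description can be logged and inspected on its own, which helps tell whether a bad drawing came from the description or from the SVG conversion.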

ogallagher commented 3 months ago

So far, there seems to be no way to get Gemini to generate a raster image. The furthest I got was an empty 1 px square base64 PNG data string, plus a description of how to do it with external image generation models.
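A cheap way to catch degenerate output like that 1 px PNG is to decode the base64 string and read the width and height from the PNG header before using the image. A minimal sketch using only the Python standard library, assuming the model returns either a bare base64 string or a `data:image/png;base64,` URI; the function names and the 16 px threshold are illustrative choices, not part of the project.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_base64(data: str) -> tuple[int, int]:
    """Return (width, height) of a base64 PNG, with or without a data: URI prefix."""
    # Drop a "data:image/png;base64," prefix if the reply was a data URI.
    if data.startswith("data:"):
        data = data.split(",", 1)[1]
    # Remove any whitespace or line breaks the model may have inserted.
    raw = base64.b64decode("".join(data.split()), validate=True)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


def is_placeholder(data: str, min_side: int = 16) -> bool:
    """Flag images too small to be useful, such as a 1x1 placeholder."""
    width, height = png_size_from_base64(data)
    return width < min_side or height < min_side
```

This only inspects the header, so it cannot tell whether a full-size image is blank; it just rejects the obviously empty case early.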

ogallagher commented 3 months ago

I'm able to generate basic geometry in an SVG, but so far it cannot draw more complex shapes or real-world entities.

For example, when I asked it to draw a fish instead of a triangle, it gave the polygon the attribute id="fish" but did not change the geometry to resemble a fish. Other times, for similar prompts, it changed the geometry, but the shape looked nothing like a fish.
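The first failure mode described above (the id changes but the polygon does not) can be detected mechanically by comparing polygon coordinates before and after a prompt. A minimal sketch using Python's standard `xml.etree.ElementTree`; it only looks at `<polygon>` elements with plain `points` attributes, and the helper names are hypothetical. It cannot judge whether changed geometry actually resembles a fish, which still needs a human look.

```python
import xml.etree.ElementTree as ET

Point = tuple[float, float]


def polygon_shapes(svg: str) -> list[tuple[Point, ...]]:
    """Extract each <polygon>'s points, ignoring ids and other attributes."""
    root = ET.fromstring(svg)
    shapes = []
    for elem in root.iter():
        # Tags carry the SVG namespace, e.g. "{http://www.w3.org/2000/svg}polygon".
        if elem.tag.rsplit("}", 1)[-1] != "polygon":
            continue
        # "10,90 50,10" -> [10.0, 90.0, 50.0, 10.0] -> ((10.0, 90.0), (50.0, 10.0))
        numbers = [float(n) for n in elem.get("points", "").replace(",", " ").split()]
        shapes.append(tuple(zip(numbers[0::2], numbers[1::2])))
    return shapes


def same_geometry(before: str, after: str) -> bool:
    """True if both SVGs contain the same polygons, e.g. only an id was renamed."""
    return sorted(polygon_shapes(before)) == sorted(polygon_shapes(after))
```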

ogallagher commented 3 months ago

In conclusion, I do not believe we can use Gemini to generate images, which confirms your doubt, @hoanghm.

hoanghm commented 3 months ago

@ogallagher We can instead try to ask the Gemini API to describe the image it wants, search for and get a bunch of images from trusted sites, then send them back to the Gemini API to let it pick.
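The describe, search, and pick flow suggested here could be prototyped along these lines. This is a minimal sketch under several assumptions: `generate` and `search` are hypothetical callables (a Gemini text call and an image search returning URLs), the trusted-domain allowlist is an example rather than anything agreed in this issue, and the pick step sends only URLs as text. A real version would likely attach the candidate images themselves, since Gemini accepts image input.

```python
import re
from typing import Callable, Optional
from urllib.parse import urlparse

Generate = Callable[[str], str]        # hypothetical: prompt -> Gemini reply text
Search = Callable[[str], list[str]]    # hypothetical: query -> candidate image URLs

# Example allowlist only; the actual trusted sites were not decided in the issue.
TRUSTED_DOMAINS = ("wikimedia.org", "unsplash.com")


def is_trusted(url: str, domains: tuple[str, ...] = TRUSTED_DOMAINS) -> bool:
    """Accept a URL whose host is a trusted domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def pick_image(generate: Generate, search: Search, topic: str, limit: int = 5) -> Optional[str]:
    # 1. Ask Gemini to describe the image it wants, as a short search query.
    query = generate(
        f"Describe, as one short image-search query, a picture that would illustrate: {topic}"
    ).strip()
    # 2. Search, then keep only candidates from trusted sites.
    candidates = [u for u in search(query) if is_trusted(u)][:limit]
    if not candidates:
        return None
    # 3. Send the numbered candidates back and let Gemini pick one.
    listing = "\n".join(f"{i}. {u}" for i, u in enumerate(candidates, start=1))
    reply = generate(f"Which image best matches '{query}'? Answer with the number only.\n{listing}")
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    index = int(match.group()) - 1
    # Ignore out-of-range answers instead of trusting the model blindly.
    return candidates[index] if 0 <= index < len(candidates) else None
```

Filtering by domain before the pick step keeps untrusted URLs from ever reaching the model, and validating the returned number guards against replies that do not match any listed candidate.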

Issue #5 in hoanghm/Proact.
