hoanghm / Proact


Try to generate images using the Gemini API #5

Closed ogallagher closed 3 months ago

ogallagher commented 3 months ago

There is some uncertainty as to whether the Gemini API will be able to generate images on its own, so we need a proof of concept for this.

If the above fails, decide on next steps.

ogallagher commented 3 months ago

So far, it seems there's no way to get Gemini to generate a raster image. The furthest I got was a Base64 PNG data string that decodes to an empty 1px square, plus a description of how to do it with external image generation models.
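For anyone repeating this check, here is a minimal sketch of how a returned data string could be tested for being a placeholder like the 1px square above. It only reads the width and height from the PNG IHDR header. The function names and the `min_side` threshold are my own assumptions, not anything from the Gemini API.

```python
import base64
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: str) -> Tuple[int, int]:
    """Return (width, height) of a Base64-encoded PNG, read from its IHDR chunk."""
    # Accept either a full data URI ("data:image/png;base64,...") or a bare payload.
    payload = data.split(",", 1)[1] if data.startswith("data:") else data
    raw = base64.b64decode(payload)
    # A PNG starts with an 8-byte signature, then a 4-byte chunk length,
    # then the chunk type "IHDR", then width and height as big-endian uint32.
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


def is_placeholder(data: str, min_side: int = 2) -> bool:
    """True if either side is smaller than min_side (hypothetical threshold)."""
    width, height = png_size(data)
    return width < min_side or height < min_side
```

This does not validate the rest of the file, so it only answers "is this image too small to be real", not "is this a usable image".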

ogallagher commented 3 months ago

I'm able to generate basic geometry in an SVG, but so far it cannot draw more complex shapes or real-world entities.

red-triangle_rotate.svg

For example, when I asked it to draw a fish instead of a triangle, it gave the polygon the attribute id="fish" but did not change the geometry to resemble a fish. Other times, for similar prompts, it changed the geometry, but the shape looked nothing like a fish.
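The "renamed but not redrawn" case can be detected mechanically. Below is a rough sketch that compares the `points` of every polygon in two SVG strings and ignores attributes like `id`. The helper names are hypothetical, and it only looks at `<polygon>` elements, so paths and other shapes would need their own handling.

```python
import xml.etree.ElementTree as ET
from typing import List


def polygon_points(svg_text: str) -> List[str]:
    """Return the whitespace-normalized points string of each <polygon>."""
    root = ET.fromstring(svg_text)
    points = []
    for element in root.iter():
        # Strip an XML namespace such as "{http://www.w3.org/2000/svg}".
        if element.tag.rsplit("}", 1)[-1] == "polygon":
            raw = element.get("points", "")
            points.append(" ".join(raw.replace(",", " ").split()))
    return points


def geometry_changed(before_svg: str, after_svg: str) -> bool:
    """True if the polygon geometry differs, regardless of id or styling."""
    return polygon_points(before_svg) != polygon_points(after_svg)
```

A `False` result for a triangle prompt versus a fish prompt would mean the model only relabeled the shape. A `True` result says the geometry changed, not that it looks like a fish.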

red-fish_half-cylinder red-fish_circles red-fish_round-fruit-multi-polygon red-fish_state-boundary red-fish_many-row red-fish_koi-sliced red-fish_vertical-beads red-fish_horizontal-beads

ogallagher commented 3 months ago

In conclusion, I don't believe we can use Gemini to generate images, which confirms your doubt, @hoanghm.

hoanghm commented 3 months ago

@ogallagher Instead, we can try asking the Gemini API to describe the image it wants, search trusted sites for a set of candidate images, then send those back to the Gemini API and let it pick.
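A rough sketch of that describe, search, pick flow, to make the steps concrete. Everything here is an assumption for illustration: `describe`, `search` and `choose` are placeholder callables standing in for the Gemini calls and whatever image search we end up using, and `trusted_domains` is a hypothetical allowlist. No real API is called.

```python
from typing import Callable, Optional, Sequence
from urllib.parse import urlparse


def is_trusted(url: str, trusted_domains: Sequence[str]) -> bool:
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in trusted_domains)


def pick_image(
    topic: str,
    describe: Callable[[str], str],                # placeholder: model describes the image it wants
    search: Callable[[str], Sequence[str]],        # placeholder: image search returning URLs
    choose: Callable[[str, Sequence[str]], int],   # placeholder: model returns index of best candidate
    trusted_domains: Sequence[str],
) -> Optional[str]:
    """Return the chosen image URL, or None if nothing usable was found."""
    description = describe(topic)
    candidates = [u for u in search(description) if is_trusted(u, trusted_domains)]
    if not candidates:
        return None
    index = choose(description, candidates)
    # Guard against the model answering with an out-of-range index.
    if not 0 <= index < len(candidates):
        return None
    return candidates[index]
```

Passing the three steps in as callables keeps the flow testable with stubs before any API or search provider is wired up.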