kramars-realspeak / fm-gai-lottie-multiple-choice-v1


Design dalle3_prompt_template.json II #8

Open Peter96K opened 2 weeks ago

Peter96K commented 2 weeks ago

Improve Prompt Structure for Image Generation with Grammar Instructions

Skewed results were identified for prompts containing grammar instructions. Research OpenAI's knowledge base and support forum threads and propose a solution that improves the quality of generated images. To do this, refactor the ./backend/image_generator.py module, in particular the craft_promp() function.

Resources :

DALL-E3 playground :

https://dalle3-playground.pages.dev/

Note : paste the API key manually in order to generate requests. Use the kedtechLa playground to analyze and compare different prompt settings and ideal formats (gpt4-o1, etc.)

Suggestion : it may make sense to include prompts in the target data structure and subsequently tune models to generate image prompts too.

Output :

dalle3_prompt_template.json

Peter96K commented 2 weeks ago

This is a joint task that will include collaboration with @Peter96K.

Peter96K commented 2 weeks ago

Idea : vet concepts first via Google Images to get a feel for how frequently they are represented in general image data

Peter96K commented 2 weeks ago

M-Maker25 commented 2 weeks ago

Just a few notes:

The meaning of "styles" in image generation:

  1. Artistic styles: terms like "watercolour", "oil painting", "sketch", or "digital art"
  2. Photography styles: terms like "HDR", "macro photography", or "black and white"
  3. Time period or genre styles: terms like "retro", "cyberpunk", "Baroque", or "futuristic"
  4. Rendering styles: terms like "3D render", "low-poly", or "isometric"

Explore which styles work best with the model. Make a note of the most popular styles and how they affect the output.

Key questions:

  1. Purpose of the template?
  2. How to define key elements of a prompt?
  3. Design structure of the JSON Template?
  4. How to create a simple example prompt template? Possible fields: "subject", "style", "environment", "details", "lighting", "quality"
  5. How to customize it? Possible additions: "mood" (describe the overall feeling, e.g. cheerful, eerie) and "camera angle" (specify viewpoints, e.g. "from above" or "wide shot")
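
As a starting point for questions 3 to 5, one possible shape for the template is sketched below in Python. Only the field names come from the list above; the class name `PromptTemplate`, the methods `render` and `to_json`, and the sentence layout are assumptions made for illustration, not the agreed design.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PromptTemplate:
    """Hypothetical container for the prompt elements named in this thread."""

    subject: str
    style: str
    environment: str = ""
    details: str = ""
    lighting: str = ""
    quality: str = ""
    mood: str = ""
    camera_angle: str = ""

    def render(self) -> str:
        """Join the non-empty elements into one prompt string (assumed layout)."""
        head = f"{self.style} image of {self.subject}"
        labelled = [
            ("environment", self.environment),
            ("details", self.details),
            ("lighting", self.lighting),
            ("quality", self.quality),
            ("mood", self.mood),
            ("camera angle", self.camera_angle),
        ]
        # Skip elements that were left empty so the prompt stays short.
        extras = [f"{label}: {value}" for label, value in labelled if value]
        return ". ".join([head] + extras) + "."

    def to_json(self) -> str:
        """Serialise all fields, e.g. for storage in dalle3_prompt_template.json."""
        return json.dumps(asdict(self), indent=2)
```

Keeping every element as a separate field means a style or mood can be swapped without touching the subject, which should make side-by-side comparisons in the playground easier.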

M-Maker25 commented 2 weeks ago

Had a short discussion today with Sam about styles and which ones seem to be working better so far with TF.

M-Maker25 commented 1 week ago

Collection of useful styles in DALL.E 3:

https://www.dallestyle.com/2024/05/dalle-3s-masterpieces-50-styles-you.html

Peter96K commented 1 week ago

[like] Peter Kramár, kedtech reacted to your message:


From: M-Maker25. Sent: Tuesday, October 1, 2024 3:35:38 PM. Subject: Re: [kramars-realspeak/fm-gai-lottie-true-false-v1] Design dalle3_prompt_template.json (Issue kramars-realspeak/fm-gai-lottie-multiple-choice-v1#8)


M-Maker25 commented 1 week ago

A few notes about the styles used so far for:

3-yle_dataset1

  1. photorealistic
  2. comic book
  3. illustrative
  4. colorful cartoon
  5. playful cartoon

5-int_dataset1

  1. cyberpunk, futuristic and very vivid
  2. informative infographic
  3. vibrant infographic
  4. colorful educational illustration
  5. colorful seafood illustration
  6. colorful fruit and seafood illustration
  7. colorful food illustration
  8. professional infographic
  9. clean professional graphic
  10. realistic illustration
  11. mystical illustration
  12. mystical themed illustration
  13. photorealistic
  14. vibrant illustration
  15. detailed illustration
  16. nature-themed illustration
  17. playful illustration

Peter96K commented 6 days ago

The next step seems to be designing an appropriate data structure to wrap these prompts in, one that all of our colleagues can reference to improve their image generation during analysis.
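
A rough sketch of what such a shared structure could look like, in Python. The dataset names and styles are copied from the notes earlier in this thread (the 5-int_dataset1 list is shortened here); the registry layout, the `datasets` key, and the helper names are assumptions for illustration only.

```python
import json

# Styles per dataset, taken from the notes above (second list abbreviated).
STYLE_REGISTRY = {
    "3-yle_dataset1": [
        "photorealistic",
        "comic book",
        "illustrative",
        "colorful cartoon",
        "playful cartoon",
    ],
    "5-int_dataset1": [
        "cyberpunk, futuristic and very vivid",
        "informative infographic",
        "photorealistic",
    ],
}


def styles_for(dataset: str) -> list[str]:
    """Return a copy of the styles recorded for a dataset (empty if unknown)."""
    return list(STYLE_REGISTRY.get(dataset, []))


def shared_styles() -> set[str]:
    """Return the styles that appear in every dataset."""
    sets = [set(styles) for styles in STYLE_REGISTRY.values()]
    return set.intersection(*sets) if sets else set()


def dump_registry() -> str:
    """Serialise the registry so colleagues can reference one JSON file."""
    return json.dumps({"datasets": STYLE_REGISTRY}, indent=2)
```

With the abbreviated data above, `shared_styles()` returns `{"photorealistic"}`, the only style used in both datasets.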

M-Maker25 commented 5 days ago

Consider adding the "age_group" to the template (to indicate the intended age group for context).
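
A minimal sketch of how the field could be attached to a template, in Python. The thread does not define the allowed values, so the age bands in `ALLOWED_AGE_GROUPS` and the helper name `with_age_group` are hypothetical placeholders.

```python
# Hypothetical age bands; the actual values still need to be agreed.
ALLOWED_AGE_GROUPS = ("young_learners", "teens", "adults")


def with_age_group(template: dict, age_group: str) -> dict:
    """Return a copy of the template with a validated "age_group" field."""
    if age_group not in ALLOWED_AGE_GROUPS:
        raise ValueError(f"unknown age_group: {age_group!r}")
    # Build a new dict so the original template is left unchanged.
    return {**template, "age_group": age_group}
```

Validating against a fixed set keeps the values consistent across datasets, which matters if they are later used to choose styles per age group.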

Peter96K commented 5 days ago

This sounds like quite a necessary feature. Perhaps in the future it would be interesting to use ms-interest-token-v1 data to understand user needs / visual preferences and understand more parameters such as these.

Peter96K commented 4 days ago

Consider adding the "age_group" to the template (to indicate the intended age group for context).

dalle3, text, complexity

M-Maker25 commented 4 days ago

Need to consider the point Sam raised during today's Sprint about "text in images". Is there a way to make sure that the text in images is correct? How can we deal with this in DALL.E 3 generated images?

M-Maker25 commented 4 days ago

Not sure how useful this is for our purposes, but one solution is to open the image in Adobe Acrobat Pro and run OCR (Optical Character Recognition) on the text content. This allows manual edits to be made to correct the text.

M-Maker25 commented 4 days ago

Findings related to images generated in DALL.E 3 Playground:

Without any type of head covering being specified in the prompt, some generated images showed girls wearing head coverings.

e.g. Prompt: "Create a realistic style image of a beach, with a young woman standing and peering out towards the ocean, where there is a faint glow of dawn visible on the horizon. The sky is tinged with gold." Head covered girl (1)

Prompt: "Create an image of four children in a classroom. One child has green eyes and brown hair. Another child has blue eyes and blonde hair." Head Covered girl (2)

M-Maker25 commented 3 days ago

Findings:

The "lego" style images work really well with YLE (e.g. G_BUSTERS WK 6 T/F). Some faces in the images were a little distorted, but the images were still usable.

M-Maker25 commented 2 days ago

A summary of useful styles for different age groups:

Need to consider how the visuals will match both the cognitive level and the interests of the students.