Design dalle3_prompt_template.json II #8 (kramars-realspeak/fm-gai-lottie-multiple-choice-v1)

Peter96K commented 2 weeks ago

Improve Prompt Structure for Image Generation with Grammar Instructions

Skewed results have been identified for prompts containing grammar instructions. Research OpenAI's knowledge base and support forum threads, and propose a solution that will improve the quality of generated images. To do this, refactor the ./backend/image_generator.py module, in particular the craft_promp() function.
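One possible direction, sketched under stated assumptions: keep grammar instructions (e.g. "use the past simple") out of the text sent to DALL-E 3 and pass only the visual description plus an explicit style. The cue-word list and the names strip_grammar_instructions and build_image_prompt are hypothetical; this is not the existing craft_promp() implementation, and a keyword heuristic like this would need tuning against real prompts.

```python
import re

# Hypothetical cue words marking a sentence as a grammar/teaching
# instruction rather than a description of the scene to be drawn.
GRAMMAR_CUES = (
    "grammar", "tense", "past simple", "present simple",
    "present continuous", "plural", "verb", "adjective", "true or false",
)


def strip_grammar_instructions(text: str) -> str:
    """Drop every sentence that contains a grammar cue (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences
            if not any(cue in s.lower() for cue in GRAMMAR_CUES)]
    return " ".join(kept)


def build_image_prompt(description: str, style: str) -> str:
    """Combine the cleaned scene description with an explicit style."""
    scene = strip_grammar_instructions(description)
    return f"{scene} Style: {style}."


if __name__ == "__main__":
    raw = ("A boy is riding a bike in the park. "
           "Use the past simple tense in the caption.")
    print(build_image_prompt(raw, "colorful cartoon"))
```

The design choice here is separation: the activity text can keep its grammar instructions, while only the scene reaches the image model.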

Resources:

DALL-E 3 playground:

https://dalle3-playground.pages.dev/

Note: paste the API key manually in order to generate requests. Use the kedtechLa playground to analyze and compare different prompt settings and ideal formats (gpt4-o1, etc.).

Suggestion: it may make sense to include the prompts in the target data structure and subsequently tune models to generate image prompts too.

Output:

dalle3_prompt_template.json

Peter96K commented 2 weeks ago

This is a joint task that will include collaboration with @Peter96K.

Peter96K commented 2 weeks ago

Idea: vet concepts first via Google Images to get a feel for how frequently they are represented in general data.

Peter96K commented 2 weeks ago

TF:

Add a time limit for decisions in activities (games, etc.), focusing on interactive elements / decision trees, T-F decisions, and clicking on specific objects.
Consider writing a script that analyzes images and identifies possible ways each image can be used in teaching.
Adapt activities to match the images (by modifying the activity object).
Students making their own images and setting prompts.
activity2image / image2activity
How much does kedtechLa need to adapt activities, and how many activity objects are needed?
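A minimal sketch of how an activity object could carry both a decision time limit and the prompt of its image, so that activities and images can be adapted to each other. The class name TrueFalseActivity, its fields, and the 10-second default are hypothetical; the real activity object may look quite different.

```python
from dataclasses import dataclass


@dataclass
class TrueFalseActivity:
    """Hypothetical T/F activity object linked to its generated image."""
    statement: str
    answer: bool
    image_prompt: str
    time_limit_seconds: int = 10  # assumed default decision window

    def is_correct(self, response: bool, elapsed_seconds: float) -> bool:
        """A response counts only if it is right AND within the time limit."""
        return (elapsed_seconds <= self.time_limit_seconds
                and response == self.answer)


if __name__ == "__main__":
    item = TrueFalseActivity("The cat is on the mat.", True,
                             "A cat sitting on a red mat, colorful cartoon")
    print(item.is_correct(True, 4.0))
```

Keeping image_prompt on the activity makes both directions (activity2image and image2activity) traceable from one record.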

M-Maker25 commented 2 weeks ago

Just a few notes:

The meaning of "Styles" in image generation:

Artistic styles: terms like "watercolour", "oil painting", "sketch", or "digital art"
Photography styles: terms like "HDR", "macro photography", or "black and white"
Time period or genre styles: terms like "retro", "cyberpunk", "Baroque", or "futuristic"
Rendering styles: terms like "3D render", "low-poly", or "isometric"

Explore which styles work best with the model. Make note of most popular styles and how they affect output.
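To keep notes on which styles are tested, the four categories above could be stored as a simple lookup. This is a sketch only: the dictionary name STYLE_CATEGORIES and the function style_category are hypothetical, and the terms are exactly the examples listed in the comment.

```python
from typing import Optional

# Style terms from the comment above, grouped by category.
STYLE_CATEGORIES = {
    "artistic": ["watercolour", "oil painting", "sketch", "digital art"],
    "photography": ["HDR", "macro photography", "black and white"],
    "period_genre": ["retro", "cyberpunk", "Baroque", "futuristic"],
    "rendering": ["3D render", "low-poly", "isometric"],
}


def style_category(term: str) -> Optional[str]:
    """Return the category of a style term (case-insensitive), or None."""
    wanted = term.strip().lower()
    for category, terms in STYLE_CATEGORIES.items():
        if any(wanted == t.lower() for t in terms):
            return category
    return None
```

Unknown terms return None, which makes it easy to spot new styles (for example "lego") that still need to be categorized.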

Key questions:

Purpose of the template?
How to define key elements of a prompt?
How to design the structure of the JSON template?
How to create a simple example prompt template? Fields: "subject", "style", "environment", "details", "lighting", "quality"
Customization: "mood" (describe the overall feeling, e.g. cheerful, eerie), "camera angle" (specify the viewpoint, e.g. "from above" or "wide shot")
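A sketch of how the fields listed above could map onto a template that is both serializable to dalle3_prompt_template.json and renderable into a prompt string. The class name PromptTemplate, the "Label: value" rendering format, and the use of empty strings for unused fields are all assumptions, not an agreed design.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PromptTemplate:
    """Hypothetical template built from the fields discussed above."""
    subject: str
    style: str
    environment: str = ""
    details: str = ""
    lighting: str = ""
    quality: str = ""
    mood: str = ""
    camera_angle: str = ""

    def render(self) -> str:
        """Subject first, then every non-empty field as 'Label: value'."""
        parts = [self.subject]
        for name, value in asdict(self).items():
            if name != "subject" and value:
                label = name.replace("_", " ").capitalize()
                parts.append(f"{label}: {value}")
        return ". ".join(parts) + "."

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "PromptTemplate":
        return cls(**json.loads(text))


if __name__ == "__main__":
    t = PromptTemplate(subject="A fox reading a book", style="watercolour",
                       mood="cheerful", camera_angle="wide shot")
    print(t.render())
    print(t.to_json())
```

For the example above, render() produces "A fox reading a book. Style: watercolour. Mood: cheerful. Camera angle: wide shot."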

M-Maker25 commented 2 weeks ago

Had a short discussion today with Sam about styles and which ones seem to work better so far with TF.

M-Maker25 commented 1 week ago

Collection of useful styles in DALL-E 3:

https://www.dallestyle.com/2024/05/dalle-3s-masterpieces-50-styles-you.html

Peter96K commented 1 week ago

[like] Peter Kramár, kedtech reacted to your message:

From: M-Maker25. Sent: Tuesday, October 1, 2024 3:35:38 PM. To: kramars-realspeak/fm-gai-lottie-true-false-v1. Subject: Re: [kramars-realspeak/fm-gai-lottie-true-false-v1] Design dalle3_prompt_template.json (Issue kramars-realspeak/fm-gai-lottie-multiple-choice-v1#8)

M-Maker25 commented 1 week ago

A few notes about the styles used so far for:

3-yle_dataset1

photorealistic
comic book
illustrative
colorful cartoon
playful cartoon

5-int_dataset1

cyberpunk, futuristic and very vivid
informative infographic
vibrant infographic
colorful educational illustration
colorful seafood illustration
colorful fruit and seafood illustration
colorful food illustration
professional infographic
clean professional graphic
realistic illustration
mystical illustration
mystical themed illustration
photorealistic
vibrant illustration
detailed illustration
nature-themed illustration
playful illustration

Peter96K commented 6 days ago

The next step seems to be designing an appropriate data structure to wrap these prompts in, one that all of our colleagues can reference to improve their image generation during analysis.
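One shape such a shared structure could take, sketched under assumptions: prompts grouped by dataset name (as in the 3-yle_dataset1 / 5-int_dataset1 notes above), each record holding the prompt, its style, and free-form notes. The class PromptLibrary and the record keys are hypothetical.

```python
import json
from collections import defaultdict


class PromptLibrary:
    """Hypothetical shared store: dataset name -> list of prompt records."""

    def __init__(self) -> None:
        self._records = defaultdict(list)

    def add(self, dataset: str, prompt: str, style: str, notes: str = "") -> None:
        self._records[dataset].append(
            {"prompt": prompt, "style": style, "notes": notes})

    def styles_for(self, dataset: str) -> list:
        """Distinct styles used for a dataset, in first-seen order."""
        records = self._records.get(dataset, [])
        return list(dict.fromkeys(r["style"] for r in records))

    def to_json(self) -> str:
        return json.dumps(self._records, indent=2)


if __name__ == "__main__":
    lib = PromptLibrary()
    lib.add("3-yle_dataset1", "A cat sitting on a mat", "colorful cartoon")
    lib.add("3-yle_dataset1", "A dog in a garden", "photorealistic")
    print(lib.styles_for("3-yle_dataset1"))
```

Exporting to JSON keeps the library easy to version alongside dalle3_prompt_template.json.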

M-Maker25 commented 5 days ago

Consider adding the "age_group" to the template (to indicate the intended age group for context).

Peter96K commented 5 days ago

This sounds like a necessary feature. In the future, it might be interesting to use ms-interest-token-v1 data to understand user needs and visual preferences, and to identify more parameters like this one.

Peter96K commented 4 days ago

dalle3, text, complexity

M-Maker25 commented 4 days ago

We need to consider the point Sam raised during today's sprint about "text in images". Is there a way to make sure that text in images is rendered correctly? What can be done about this for DALL-E 3 generated images?

M-Maker25 commented 4 days ago

Not sure how useful this is for our purposes, but one solution is to open the image in Adobe Acrobat Pro and run OCR (Optical Character Recognition) on the text content. This allows the text to be corrected manually.
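For the text-in-images question, a check could also be automated by comparing the text we asked for with the text an OCR tool reads back from the image. This sketch assumes the OCR text has already been extracted by some tool (not shown); text_matches is a hypothetical helper, and the 0.9 similarity threshold is an arbitrary assumption to tune.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so spacing differences are ignored."""
    return " ".join(text.lower().split())


def text_matches(expected: str, ocr_text: str, threshold: float = 0.9) -> bool:
    """True if the OCR'd text is similar enough to the requested text."""
    ratio = SequenceMatcher(None, normalise(expected), normalise(ocr_text)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    print(text_matches("Welcome to the Zoo", "Wlcme t th Zo0"))
```

Images that fail the check could then be regenerated or routed to the manual Acrobat correction described above.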

M-Maker25 commented 4 days ago

Findings related to images generated in the DALL-E 3 Playground:

Even though no head covering was specified in the prompt, some of the generated images showed girls wearing head coverings.

Example prompt: "Create a realistic style image of a beach, with a young woman standing and peering out towards the ocean, where there is a faint glow of dawn visible on the horizon. The sky is tinged with gold." Result: girl with head covering (1)

Example prompt: "Create an image of four children in a classroom. One child has green eyes and brown hair. Another child has blue eyes and blonde hair." Result: girl with head covering (2)

M-Maker25 commented 3 days ago

Findings:

The "lego" style images work really well with YLE (e.g. G_BUSTERS WK 6 T/F). Some faces in the images were slightly distorted, but the images were still usable.

M-Maker25 commented 2 days ago

A summary of useful styles for different age groups:

Ages 3-6: Cartoonish, bold, simple illustrations
Ages 6-9: Whimsical, soft, imaginative illustrations
Ages 9-12: Semi-realistic with playful details
Ages 12-14: Realistic with dynamic poses or abstract elements
Ages 14-18: Realism, surrealism, and conceptual art for deeper thinking

We need to consider how well the visuals match both the cognitive level and the interests of the students.
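The age-group summary above could back the proposed "age_group" field directly. In this sketch the band boundaries overlap in the original list (6, 9, 12, 14), so it assumes each boundary age belongs to the older band; the names LOWER_BOUNDS, AGE_STYLES, and style_for_age are hypothetical.

```python
import bisect

# Lower bound of each age band from the summary above (ages 3-18).
LOWER_BOUNDS = [3, 6, 9, 12, 14]
AGE_STYLES = [
    "cartoonish, bold, simple illustration",
    "whimsical, soft, imaginative illustration",
    "semi-realistic illustration with playful details",
    "realistic, with dynamic poses or abstract elements",
    "realism, surrealism or conceptual art",
]


def style_for_age(age: int) -> str:
    """Pick the style for an age; boundary ages go to the older band."""
    if not 3 <= age <= 18:
        raise ValueError(f"no style defined for age {age}")
    # bisect_right counts how many lower bounds are <= age.
    return AGE_STYLES[bisect.bisect_right(LOWER_BOUNDS, age) - 1]


if __name__ == "__main__":
    for age in (4, 6, 13, 18):
        print(age, style_for_age(age))
```

Using bisect keeps the boundaries in one list, so adjusting a band later means editing a single number.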
