Open Peter96K opened 2 weeks ago
This is a joint task that will include collaboration with @Peter96K.
Idea : vet concepts first via Google Images to get a feel for how frequently each concept is represented in general image data.
Just a few notes:
The meaning of "styles" in image generation:
Explore which styles work best with the model. Make note of most popular styles and how they affect output.
Key questions:
Had a short discussion today with Sam about styles and which ones seem to be working better so far with TF.
Collection of useful styles in DALL.E 3:
https://www.dallestyle.com/2024/05/dalle-3s-masterpieces-50-styles-you.html
A few notes about the styles used so far for:
3-yle_dataset1
5-int_dataset1
The next step seems to be designing an appropriate data structure to wrap these prompts in, so that all of our colleagues can reference it and improve their image generation during analysis.
Consider adding the "age_group" to the template (to indicate the intended age group for context).
This sounds like quite a necessary feature. Perhaps in the future it would be interesting to use ms-interest-token-v1 data to understand user needs / visual preferences and understand more parameters such as these.
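As a starting point for discussion, here is a minimal sketch of what one entry in such a template could look like. The class name `PromptTemplate` and the fields `style`, `subject`, and `notes` are assumptions made for illustration; only `age_group` comes from the suggestion above, and the real schema of dalle3_prompt_template.json is still to be agreed.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical schema: every field name here is an assumption, not an
# agreed format for dalle3_prompt_template.json.
@dataclass
class PromptTemplate:
    style: str       # visual style, e.g. "lego" or "realistic"
    age_group: str   # intended audience, kept as context metadata
    subject: str     # what the image should show
    notes: list[str] = field(default_factory=list)  # free-form findings

    def render(self) -> str:
        # Build the text sent to the image model from style and subject only.
        return f"Create a {self.style} style image of {self.subject}."

template = PromptTemplate(
    style="lego",
    age_group="YLE",
    subject="four children in a classroom",
    notes=["Some faces were slightly distorted but still usable."],
)

# Serialized form that could be stored in the shared JSON file.
print(json.dumps(asdict(template), indent=2))
print(template.render())
```

Keeping `age_group` as metadata rather than inserting it into the rendered prompt is one possible choice; it could also be used to select a style per age group once those findings are collected.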
dalle3, text, complexity
Need to consider the point raised by Sam during today's Sprint about "text in images". Is there a way to make sure that the text in images will be correct? What can be done about dealing with this regarding DALL.E 3 generated images?
Not sure how useful this is for our purposes, but one option is to open the image in Adobe Acrobat Pro and run OCR (Optical Character Recognition) on it. The recognized text can then be edited manually to correct any errors.
Findings related to images generated in DALL.E 3 Playground:
Even when the prompt does not specify any type of head covering, some generated images of girls show them wearing head coverings.
e.g. Prompt: "Create a realistic style image of a beach, with a young woman standing and peering out towards the ocean, where there is a faint glow of dawn visible on the horizon. The sky is tinged with gold."
Prompt: "Create an image of four children in a classroom. One child has green eyes and brown hair. Another child has blue eyes and blonde hair."
Findings:
The "lego" style images work really well with YLE (e.g. G_BUSTERS WK 6 T/F). Some faces in the images were a little distorted, but the images were still usable.
A summary of useful styles for different age groups:
Need to consider how the visuals will match both the cognitive level and the interests of the students.
Improve Prompt Structure for Image Generation with Grammar Instructions
Skewed results were identified for prompts containing grammar instructions. Research OpenAI's knowledge base and support forum threads, then propose a solution that improves the quality of the generated images. To do this, refactor the ./backend/image_generator.py module, in particular the craft_promp() function.
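One option to evaluate, not a confirmed fix: if the skew comes from the grammar instruction text being passed to the image model, the prompt builder could keep that text separate from the visual description. The sketch below is hypothetical; `craft_prompt_sketch`, its parameters, and the returned keys are illustrative names and do not reflect the current code in ./backend/image_generator.py.

```python
from typing import Optional

def craft_prompt_sketch(scene: str, style: str,
                        grammar_note: Optional[str] = None) -> dict:
    """Hypothetical helper: build the image prompt from visual content only.

    The grammar instruction is returned alongside the prompt so it stays
    available to the exercise, but it is never sent to the image model.
    """
    image_prompt = f"Create a {style} style image of {scene.strip()}."
    return {"image_prompt": image_prompt, "grammar_note": grammar_note}

result = craft_prompt_sketch(
    scene="a beach at dawn with a young woman looking out towards the ocean",
    style="realistic",
    grammar_note="Target structure: present continuous",
)
print(result["image_prompt"])
```

Whether this separation actually removes the skew would need to be checked in the playground against the current prompts.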
Resources :
DALL-E3 playground :
https://dalle3-playground.pages.dev/
Note : paste the API key manually in order to send requests. Use the kedtechLa playground to analyze and compare different prompt settings and ideal formats (gpt4-o1, etc.).
Suggestion : it may make sense to include prompts in the target data structure and subsequently tune models to generate image prompts too.
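A rough sketch of how prompts could be stored next to the source content so that they can later serve as tuning data. The record layout (`sentence`, `style`, `image_prompt`) and the function name `to_training_record` are assumptions for illustration, not an existing format in the project.

```python
import json

def to_training_record(sentence: str, style: str, image_prompt: str) -> str:
    """Serialize one (source sentence, image prompt) pair as a JSON line."""
    record = {"sentence": sentence, "style": style, "image_prompt": image_prompt}
    return json.dumps(record, ensure_ascii=False)

line = to_training_record(
    sentence="The sky is tinged with gold.",
    style="realistic",
    image_prompt="Create a realistic style image of a beach at dawn. "
                 "The sky is tinged with gold.",
)
print(line)
```

One JSON object per line keeps the file easy to append to and to filter by style later.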
Output :
dalle3_prompt_template.json