lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

[Bug]: Negative-only styles: first line from prompt ignored sometimes #3367

Closed IPv6 closed 3 months ago

IPv6 commented 3 months ago

What happened?

Some Fooocus styles can "remove" the first line of a multi-line prompt when used exclusively, for example "Fooocus Semi Realistic". If the user disables all styles except this one, the first line of the user prompt is completely ignored in rendering: it is not encoded at all. If the user enables some other style in addition to it (like 'Fooocus V2'), everything is back to normal and the first line of the prompt affects the generation again. This can be seen in the Raw prompt of the log.

The reason: such a style has no "positive" prompt key, only a negative one, so the positive part of the style defaults to an empty string. As a result, there is no "{prompt}" fragment anywhere in the "apply_style" method, and the user prompt is essentially ignored when a single negative-only style is enabled.
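
The mechanism can be reproduced in isolation. The following is a minimal sketch, assuming a simplified `styles` mapping from style name to a (positive, negative) template pair. The two style entries are made up for illustration and are not the real Fooocus style definitions; `apply_style` mirrors the original version quoted later in this thread.

```python
# Hypothetical style table: name -> (positive template, negative template).
# These entries are illustrative only, not the real Fooocus style data.
styles = {
    "with_placeholder": ("cinematic photo of {prompt}", "blurry"),
    "negative_only": ("", "worst quality, low quality"),
}


def apply_style(style, positive):
    p, n = styles[style]
    # With an empty positive template there is no '{prompt}' to replace,
    # so the user's text never makes it into the result.
    return p.replace("{prompt}", positive).splitlines(), n.splitlines()


print(apply_style("with_placeholder", "lion"))
print(apply_style("negative_only", "lion"))
```

For the negative-only entry the positive list comes back empty, because `"".splitlines()` returns an empty list; the word "lion" is dropped.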

Steps to reproduce the problem

  1. Write a multi-line prompt.
  2. Disable all styles, then enable "Fooocus Semi Realistic" ONLY.
  3. Generate an image and observe the raw prompt in the logs. The first line is missing, and changing it does not affect generation (even with a fixed seed).

What should have happened?

The first line should be used for generation in any case.

What browsers do you use to access Fooocus?

No response

Where are you running Fooocus?

None

What operating system are you using?

No response

Console logs

Prompt:
lion,
moon

Encoded text:

[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Prompt] clip_encode positive: ['moon']
[Fooocus] Encoding positive #1 ...
[Prompt] clip_encode positive: ['moon']
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Prompt] clip_encode negative: ['(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)']
[Fooocus] Encoding negative #2 ...
[Prompt] clip_encode negative: ['(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)']
[Fooocus] Image processing ...


Additional information

No response

mashb1t commented 3 months ago

Oh really? Will check this out, thank you for reporting!

mashb1t commented 3 months ago

2 solutions, open for suggestions:

  1. replace this code

    def apply_style(style, positive):
      p, n = styles[style]
      return p.replace('{prompt}', positive).splitlines(), n.splitlines()

    with

    def apply_style(style, positive):
      p, n = styles[style]
      p = p.replace('{prompt}', positive) if '{prompt}' in p else positive
      return p.splitlines(), n.splitlines()

    => this adds the positive prompt whenever a style doesn't contain the placeholder {prompt}, potentially multiplying the subject

  2. add the positive prompt to the beginning of positive_basic_workloads when none of the styles in this code contains the placeholder {prompt}, which is probably less of a breaking change for users trying to reproduce existing images

  placeholder_replaced = False

  for j, s in enumerate(task_styles):
      if s == random_style_name:
          s = get_random_style(task_rng)
          task_styles[j] = s
      p, n, style_has_placeholder = apply_style(s, positive=task_prompt)
      if style_has_placeholder:
          placeholder_replaced = True
      positive_basic_workloads = positive_basic_workloads + p
      negative_basic_workloads = negative_basic_workloads + n

  if not placeholder_replaced:
      positive_basic_workloads = [task_prompt] + positive_basic_workloads

I heavily lean towards option 2, but let me know your opinion. Thanks!
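
To make option 2 easier to try out, here is a self-contained sketch. It assumes a three-value `apply_style` that also reports whether the style contained `{prompt}` (the snippet above relies on such a return value but does not show it), a made-up `styles` table, and a hypothetical `build_workloads` wrapper around the loop; random-style handling is omitted.

```python
# Hypothetical style table for illustration; not the real Fooocus style data.
styles = {
    "with_placeholder": ("cinematic photo of {prompt}", "blurry"),
    "negative_only": ("", "worst quality, low quality"),
}


def apply_style(style, positive):
    # Assumed three-value variant: also report whether '{prompt}' was present.
    p, n = styles[style]
    has_placeholder = "{prompt}" in p
    return p.replace("{prompt}", positive).splitlines(), n.splitlines(), has_placeholder


def build_workloads(task_styles, task_prompt):
    # Hypothetical wrapper around the loop from option 2.
    positive, negative = [], []
    placeholder_replaced = False
    for s in task_styles:
        p, n, has_placeholder = apply_style(s, positive=task_prompt)
        placeholder_replaced = placeholder_replaced or has_placeholder
        positive += p
        negative += n
    if not placeholder_replaced:
        # No style consumed the prompt, so add it back exactly once.
        positive = [task_prompt] + positive
    return positive, negative


print(build_workloads(["negative_only"], "lion"))
print(build_workloads(["negative_only", "with_placeholder"], "lion"))
```

In this sketch the prompt is prepended once when only negative-only styles are enabled, and is left alone as soon as any style has substituted the placeholder.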

IPv6 commented 3 months ago

Thanks, with this patch it's fixed!

Minor suggestion though: now there is the opposite edge case. If the user enables several styles (not sure how frequent this is, but just in case), they will end up with a duplicated first line in the prompt. As far as I know, this affects the weighting of CLIP guidance without the user knowing it. But at least nothing is lost from the prompt.

A possible solution is to implement (1) or (2) with additional removal of duplicates down the line, for example in remove_empty_str:

def remove_empty_str(items, default=None):
    # Drop empty strings, then remove duplicates while keeping first-seen order.
    items = list(dict.fromkeys(x for x in items if x != ""))

    if len(items) == 0 and default is not None:
        return [default]
    return items

remove_empty_str could then be renamed to str_list_normalization. Judging from the code, that is what it is meant to do: "normalize" the list of prompt strings.

mashb1t commented 3 months ago

This is why I'd propose implementing option 2, combined with the removal of empty strings (array items that get appended when multiple negative-only styles are used).

I would avoid forced deduplication under all circumstances, both to let users reproduce already created images and to keep the resulting prompt in the metadata comprehensible.
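
The difference between the two positions can be shown on a small list. This is only a sketch: `remove_empty_str` here is assumed to be the variant that strips empty strings and nothing else, and `dedupe_keep_order` is a hypothetical helper standing in for the deduplication suggested earlier in the thread.

```python
def remove_empty_str(items, default=None):
    # Drop empty strings only; duplicates are kept on purpose.
    items = [x for x in items if x != ""]
    if len(items) == 0 and default is not None:
        return [default]
    return items


def dedupe_keep_order(items):
    # Hypothetical helper: order-preserving deduplication.
    return list(dict.fromkeys(items))


workload = ["lion", "", "lion", "moon", ""]
print(remove_empty_str(workload))
print(dedupe_keep_order(remove_empty_str(workload)))
print(remove_empty_str(["", ""], default=""))
```

Without deduplication, a line the user repeated deliberately survives, so the recorded prompt matches what was actually encoded. The deduplicating variant silently collapses it.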

IPv6 commented 3 months ago

I understand, thanks for the quick fix! Updated locally, will wait for the Fooocus release then 👍

mashb1t commented 3 months ago

Merged to main, see https://github.com/lllyasviel/Fooocus/commit/1be3c504ed0b15662131a9e16573e5e2995620bd / https://github.com/lllyasviel/Fooocus/pull/3372