huchenlei / ComfyUI_omost

ComfyUI implementation of Omost
Apache License 2.0
338 stars 21 forks source link

Is ConditioningSetMask used right for attention mask? #12

Closed wuutiing closed 4 weeks ago

wuutiing commented 4 weeks ago

Thank you a lot for your work implementing Omost's ComfyUI nodes. but i think ConditioningSetMask set a mask for condition, the mask take effect in condition combining stage, not cross-attention stage, which differs with Omost, what do you think about it, looking forward your reply. Thank you much. @huchenlei @lllyasviel

        for o in range(batch_chunks):
            cond_index = cond_or_uncond[o]
            out_conds[cond_index][:,:,area[o][2]:area[o][0] + area[o][2],area[o][3]:area[o][1] + area[o][3]] += output[o] * mult[o] # mutl is masks
            out_counts[cond_index][:,:,area[o][2]:area[o][0] + area[o][2],area[o][3]:area[o][1] + area[o][3]] += mult[o]

code from https://github.com/comfyanonymous/ComfyUI/blob/b1fd26fe9e55163f780bf9e5f56bf9bf5f035c93/comfy/samplers.py#L220C9-L224C1

and Omost's impl

        mask_bool = masks > 0.5
        ....
        sim = query @ key.transpose(-2, -1) * attn.scale
        sim = sim * mask_scale.to(sim)
        sim.masked_fill_(mask_bool.logical_not(), float("-inf"))

code from https://github.com/lllyasviel/Omost/blob/731e74922fc6be91171688574d07624f93d3b658/lib_omost/pipeline.py#L165C9-L165C65

huchenlei commented 4 weeks ago

I think I have made it clear in README that the ComfyUI's area cond differs from original omost's impl.

Region condition part converts the JSON condition to ComfyUI's area format. Under the hood, it is calling ConditioningSetMask node to set non-overlap area for each cond. According to https://github.com/lllyasviel/Omost#regional-prompter, original Omost repo is using method 3, while ComfyUI's built-in method is method 2. So expect there to be some difference on results. I will implement ComfyUI version of densediffusion soon.