Closed by zjysteven 8 months ago
Hi, this argument affects the value of last_idx, which in turn determines which cross-attention maps are normalized (see the code below). In this repository, we do not take the cross-attention of the 'eot' token into account. However, in other versions of SD, you can choose whether to include it, depending on which option yields better results.
attention_for_text = attention_maps[:, :, 1:last_idx]
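To make the effect of last_idx concrete, here is a minimal pure-Python sketch of the slicing for a single spatial location. It is not the repository's code. The token layout, the attention values, the helper name normalized_text_attention, and the scale factor are all hypothetical. It also assumes that last_idx is either -1 or the position of the 'eot' token in the unpadded token ids.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_text_attention(attn_row, last_idx, scale=100.0):
    # Hypothetical helper: keep token positions 1 .. last_idx-1
    # (position 0 is the start token) and renormalize with a softmax.
    # The scale factor is an assumption for illustration only.
    sliced = attn_row[1:last_idx]
    return softmax([scale * a for a in sliced])

# Assumed layout for a 3-word prompt padded to 8 positions:
# [BOS, w1, w2, w3, EOT, PAD, PAD, PAD]
attn_row = [0.60, 0.10, 0.08, 0.07, 0.05, 0.04, 0.03, 0.03]
num_prompt_ids = 5  # BOS + 3 words + EOT, i.e. the unpadded length

# last_idx = -1 keeps everything after BOS except the final position,
# so the EOT token and most padding positions enter the softmax.
with_eot = normalized_text_attention(attn_row, last_idx=-1)

# last_idx = index of EOT keeps only the prompt words w1..w3.
without_eot = normalized_text_attention(attn_row, last_idx=num_prompt_ids - 1)

print(len(with_eot), len(without_eot))
```

Because the softmax runs only over the kept positions, excluding 'eot' and padding changes how much normalized weight each prompt word receives, which is why the setting can matter differently across SD versions.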
I see. Another quick question: do you recommend using iterative refinement or not? The code comment says it is "not necessary", but one of the example commands in the README uses refinement. I would imagine that it slows down generation. How much improvement in image quality can it bring?
I haven't conducted quantitative experiments regarding the refinement, but based on my experience, refinement can make the image more realistic in some cases.
I see. Thank you for sharing the experience!
Hi,
I've recently been working on adapting BoxDiff to the latest diffusers library, including integration for both SD and SDXL. I came across the argument normalize_eot here: https://github.com/showlab/BoxDiff/blob/9e90000921be244468bcba4779e3c8b2c4dfb086/pipeline/sd_pipeline_boxdiff.py#L194-L198. It is set to True for SD2.1 and False for SD1.5. I'm not very familiar with the details of the different versions, so would you mind clarifying the purpose of this argument? Thank you in advance.