LifuWang-66 closed this issue 4 months ago
We convert the tokenizer output into the text encoder output at https://github.com/tianweiy/DMD2/blob/0f8a481716539af7b2795740c9763a7d0d05b83b/main/sd_unified_model.py#L176
Thanks for your swift response!
I think that is specific to SDXL, though; the other models still seem to use the tokenizer output. Maybe it would be better to use the text encoder output for the other models as well?
For SD, it also uses the text embedding: https://github.com/tianweiy/DMD2/blob/0f8a481716539af7b2795740c9763a7d0d05b83b/main/sd_unified_model.py#L222
I mean, there is no way for the UNet to take any other conditioning, right?
I found that the input to the UNet is the output of the CLIP tokenizer, but in both the SD and SDXL pipelines the inputs are the output of the CLIP text encoder. Is there a specific reason for this choice?
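For anyone landing here later, a toy sketch of the distinction discussed in this thread may help. It is not the DMD2 code: `toy_tokenize`, `toy_text_encoder`, and `toy_unet_conditioning_check` are hypothetical stand-ins written in plain Python, and the vocabulary size and hidden width are made up. The only point is that a tokenizer yields integer IDs, a text encoder turns those IDs into float vectors, and the UNet's cross-attention (the `encoder_hidden_states` argument in diffusers) can only consume the latter.

```python
import random

MAX_LENGTH = 77   # CLIP context length used by the SD / SDXL pipelines
HIDDEN_SIZE = 8   # toy width (assumption); SD 1.x's CLIP text encoder uses 768
VOCAB_SIZE = 100  # toy vocabulary size (assumption)


def toy_tokenize(prompt):
    """Hypothetical stand-in for a CLIP tokenizer: prompt -> padded integer IDs."""
    ids = [(sum(map(ord, word)) % (VOCAB_SIZE - 1)) + 1 for word in prompt.split()]
    ids = ids[:MAX_LENGTH]
    return ids + [0] * (MAX_LENGTH - len(ids))  # pad with ID 0 up to MAX_LENGTH


# A fixed random lookup table plays the role of the text encoder weights.
_rng = random.Random(0)
EMBEDDING_TABLE = [
    [_rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)] for _ in range(VOCAB_SIZE)
]


def toy_text_encoder(input_ids):
    """Hypothetical stand-in for a text encoder: IDs -> one float vector per token.

    A real CLIP text encoder also runs transformer layers; this only does the lookup.
    """
    return [list(EMBEDDING_TABLE[i]) for i in input_ids]


def toy_unet_conditioning_check(encoder_hidden_states):
    """Accept only float vectors, mimicking what cross-attention requires."""
    for row in encoder_hidden_states:
        if not isinstance(row, list) or not all(isinstance(v, float) for v in row):
            raise TypeError("UNet conditioning must be float embeddings, not token IDs")
    return (len(encoder_hidden_states), len(encoder_hidden_states[0]))


input_ids = toy_tokenize("a photo of a cat")      # integers, length 77
text_embedding = toy_text_encoder(input_ids)      # 77 vectors of HIDDEN_SIZE floats
shape = toy_unet_conditioning_check(text_embedding)
```

Passing `input_ids` straight into `toy_unet_conditioning_check` raises a `TypeError`, which mirrors the point made in the replies: the token IDs have to go through the text encoder somewhere before they reach the UNet, even if a dataloader or a variable name only mentions the tokenizer output.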