huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Cache text encoder embeds in pipelines #10078

Open hlky opened 2 days ago

hlky commented 2 days ago

Is your feature request related to a problem? Please describe.

When a prompt is reused, the text encoder embeds are recomputed. This can be time-consuming for something like T5-XXL with offloading or on CPU.

Text encoder embeds are relatively small, so keeping them in memory is feasible.

>>> import torch
>>> clip_l = torch.randn([1, 77, 768])  # CLIP-L embeds
>>> t5_xxl = torch.randn([1, 512, 4096])  # T5-XXL embeds
>>> clip_l.numel() * clip_l.dtype.itemsize  # bytes at float32 (~231 KiB)
236544
>>> t5_xxl.numel() * t5_xxl.dtype.itemsize  # bytes at float32 (8 MiB)
8388608

Describe the solution you'd like.

An MVP would reuse the last text encoder embeds if the prompt hasn't changed; this behaviour is supported in community UIs. Ideally it would support multiple prompts, and potentially be serializable.
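A minimal sketch of what the MVP could look like, assuming a generic `encode_fn` callable that stands in for the pipeline's text-encoding step (`EmbedCache` and its names are hypothetical, not existing diffusers API):

```python
from collections import OrderedDict


class EmbedCache:
    """Tiny LRU cache mapping prompt strings to precomputed embeds.

    Hypothetical sketch: `encode_fn` stands in for whatever actually
    runs the text encoder; with max_entries=1 this degrades to the
    "reuse the last embeds if the prompt hasn't changed" MVP.
    """

    def __init__(self, encode_fn, max_entries=16):
        self.encode_fn = encode_fn
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def __call__(self, prompt):
        if prompt in self._cache:
            self._cache.move_to_end(prompt)  # mark as most recently used
            return self._cache[prompt]
        embeds = self.encode_fn(prompt)  # cache miss: run the text encoder
        self._cache[prompt] = embeds
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return embeds
```

Since the cache is keyed on the prompt string and the values are plain tensors, serialization could be as simple as `torch.save` on the underlying dict.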

yiyixuxu commented 2 days ago

Users can pre-compute prompt embeds themselves and reuse them, no? Maybe we can add more doc examples.
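A doc example could look roughly like the following sketch: encode each prompt once, keep the results on CPU, and pass them back via `prompt_embeds` on later calls. It assumes an SD-style pipeline whose `encode_prompt` returns `(prompt_embeds, negative_prompt_embeds)` (SDXL-style pipelines return more tensors, so the unpacking varies); `precompute_embeds` itself is a hypothetical helper, not diffusers API:

```python
def precompute_embeds(pipe, prompts, device="cuda"):
    """Encode each prompt once and return a reusable {prompt: embeds} map.

    Keeping the tensors on CPU means the (possibly offloaded) text
    encoder only runs once per unique prompt.
    """
    bank = {}
    for prompt in prompts:
        prompt_embeds, negative_embeds = pipe.encode_prompt(
            prompt,
            device=device,
            num_images_per_prompt=1,
            do_classifier_free_guidance=True,
        )
        bank[prompt] = (prompt_embeds.cpu(), negative_embeds.cpu())
    return bank
```

On later generations the cached tensors are passed instead of the prompt string, e.g. `pipe(prompt_embeds=bank[p][0].to(device), negative_prompt_embeds=bank[p][1].to(device))`, and the bank could be persisted with `torch.save`.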

yiyixuxu commented 2 days ago

> would be reusing the last text encoder embeds if the prompt hasn't changed, this behaviour is supported in community UIs

This really should be supported in the UI, not in the diffusers library. Our responsibility is to design our software so that features like this can be quickly built on top of us.