mindspore-lab / mindone

one for all, Optimal generator with No Exception
Apache License 2.0
330 stars 63 forks

SDXL: transpose `text_projection` in converting #399

Closed townwish4git closed 3 months ago

townwish4git commented 3 months ago

What does this PR do?

The parameter `conditioner.embedders.1.model.text_projection` is applied differently in MindONE and Diffusers, so it has to be transposed when the pre-trained weights are converted.
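For illustration, the conversion step could look roughly like the sketch below. It is not the code from this PR: `convert_weights` and `transpose_2d` are hypothetical helpers, the checkpoint is a toy dict of plain Python lists rather than real tensors, and the parameter names are assumed to be already renamed to the MindONE convention. Only the parameter key itself comes from the description above.

```python
# Hypothetical helpers for illustration; the real converter may be structured differently.
TEXT_PROJECTION_KEY = "conditioner.embedders.1.model.text_projection"


def transpose_2d(matrix):
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def convert_weights(weights):
    """Return a new dict in which only the text projection is transposed.

    `weights` maps parameter names (assumed to be already in MindONE style)
    to 2-D lists. All other entries are passed through unchanged.
    """
    converted = {}
    for name, value in weights.items():
        if name == TEXT_PROJECTION_KEY:
            converted[name] = transpose_2d(value)
        else:
            converted[name] = value
    return converted


# Toy checkpoint: a 2x3 projection plus one unrelated parameter.
toy = {
    TEXT_PROJECTION_KEY: [[1, 2, 3], [4, 5, 6]],
    "some.other.weight": [[7, 8], [9, 10]],
}
result = convert_weights(toy)
print(result[TEXT_PROJECTION_KEY])  # [[1, 4], [2, 5], [3, 6]]
```

Matching on the exact key keeps every other parameter untouched, which is the point of the change: only this one matrix uses a different layout.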

DETAIL:

Diffusers: pooled_text_embedding = self.text_projection(input), where self.text_projection is a torch.nn.Linear without bias, so pooled_text_embedding = input @ self.text_projection.weight.T

MindONE (and Stability-AI): pooled_text_embedding = input @ self.text_projection.weight, with no transpose op.

So we transpose this parameter manually when converting the weights.
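A small numeric check of the two conventions, as a sketch only: plain Python lists stand in for tensors so that it runs without torch or MindSpore, and `matmul` and `transpose` are hypothetical helpers written for this example. The weight is deliberately square, so a missing transpose changes the values instead of raising a shape error.

```python
def matmul(a, b):
    """Multiply two list-of-lists matrices (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Weight as stored by a bias-free torch.nn.Linear: (out_features, in_features).
linear_weight = [[1, 2], [3, 4]]
x = [[1, 1]]  # one input row

# Diffusers convention: input @ weight.T
diffusers_out = matmul(x, transpose(linear_weight))

# MindONE convention: input @ weight, with no transpose op.
mindone_unconverted = matmul(x, linear_weight)            # weight copied as-is
mindone_converted = matmul(x, transpose(linear_weight))   # weight transposed at conversion

print(diffusers_out)        # [[3, 7]]
print(mindone_unconverted)  # [[4, 6]]
print(mindone_converted)    # [[3, 7]]
```

Copying the weight as-is gives a different embedding, while transposing it once at conversion time reproduces the Diffusers result without adding a transpose op to the forward pass.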

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

