What does this PR do?

The parameter conditioner.embedders.1.model.text_projection is applied differently in MindONE and Diffusers, so it has to be transposed when the pre-trained weights are converted.

DETAIL:

Diffusers: pooled_text_embedding = self.text_projection(input), where self.text_projection is a torch.nn.Linear without bias. This is equivalent to pooled_text_embedding = input @ self.text_projection.weight.T

MindONE (and Stability-AI): pooled_text_embedding = input @ self.text_projection.weight, with no transpose applied.

So we transpose this parameter manually when converting the weights.
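The equivalence can be checked with a small NumPy sketch. It is only an illustration of the math above, not the conversion script from this PR: the function names are hypothetical, and the tiny square matrix stands in for the real projection weight. A square weight is used on purpose, since in that case a missing transpose does not change any shape and so would not be caught by a shape check.

```python
import numpy as np


def linear_no_bias(x, weight):
    """Mimic torch.nn.Linear(bias=False): weight is (out_features, in_features)."""
    return x @ weight.T


def direct_matmul(x, weight):
    """Mimic the MindONE / Stability-AI style projection: no transpose."""
    return x @ weight


def convert_text_projection(diffusers_weight):
    """Hypothetical conversion step: transpose the Diffusers weight."""
    return np.ascontiguousarray(diffusers_weight.T)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))            # batch of 2 pooled inputs
w_diffusers = rng.standard_normal((4, 4))  # square stand-in for the real weight

w_mindone = convert_text_projection(w_diffusers)

# With the transpose, both conventions give the same embedding.
assert np.allclose(linear_no_bias(x, w_diffusers), direct_matmul(x, w_mindone))

# Without it, the shapes still match but the values do not.
assert not np.allclose(linear_no_bias(x, w_diffusers), direct_matmul(x, w_diffusers))
```

Transposing once at conversion time keeps the MindONE forward pass unchanged, which matches the approach described above.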

Before submitting

[x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Did you read the contributor guideline?
[x] Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines
[x] Did you build and run the code without any errors?
[x] Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@xxx

mindspore-lab / mindone

SDXL: transpose `text_projection` in converting #399
