Add the `CLIPEmbedder` interface, which aligns with the API in PKU Open Sora Plan.
Reuse the existing modules in `mindone.transformers.clip` and add `CLIPVisionEmbeddings`, `CLIPVisionTransformer`, and `CLIPModel`, with reference to open PR #436.
For T5 text encoder support, please refer to #440.
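For reviewers, the wrapper pattern behind the embedder can be sketched as below. This is a minimal illustration, not the code in this PR: the constructor arguments, the `tokenize`/`encode` method names, and the toy tokenizer and text model are all assumptions, used so the snippet runs without MindSpore. Only the class name `CLIPEmbedder` comes from this PR.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in types. In the PR, the tokenizer and text model
# come from the CLIP modules; the signatures below are assumed.
Tokenizer = Callable[[str], List[int]]
TextModel = Callable[[List[List[int]]], List[List[List[float]]]]


class CLIPEmbedder:
    """Illustrative wrapper: tokenize, pad or truncate, then encode."""

    def __init__(self, tokenizer: Tokenizer, text_model: TextModel,
                 max_length: int = 77, pad_id: int = 0):
        self.tokenizer = tokenizer
        self.text_model = text_model
        self.max_length = max_length  # 77 is CLIP's usual context length
        self.pad_id = pad_id

    def tokenize(self, prompts: Sequence[str]) -> List[List[int]]:
        batch = []
        for prompt in prompts:
            ids = self.tokenizer(prompt)[: self.max_length]  # truncate
            ids = ids + [self.pad_id] * (self.max_length - len(ids))  # pad
            batch.append(ids)
        return batch

    def encode(self, prompts: Sequence[str]) -> List[List[List[float]]]:
        # Returns one hidden-state vector per token position.
        return self.text_model(self.tokenize(prompts))


def toy_tokenizer(text: str) -> List[int]:
    # Toy rule for the demo: each word maps to its length.
    return [len(word) for word in text.split()]


def toy_text_model(batch: List[List[int]], dim: int = 4) -> List[List[List[float]]]:
    # Toy "encoder": repeat each token id `dim` times as its embedding.
    return [[[float(tok)] * dim for tok in ids] for ids in batch]


embedder = CLIPEmbedder(toy_tokenizer, toy_text_model, max_length=8)
hidden = embedder.encode(["a photo of a cat", "a dog"])
print(len(hidden), len(hidden[0]), len(hidden[0][0]))
```

Keeping tokenization and encoding behind one object means callers pass raw prompts and receive embeddings, regardless of which text encoder is plugged in.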
Before submitting
[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
[ ] Did you build and run the code without any errors?
[ ] Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
[ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
What does this PR do?
Fixes # (issue)
Adds # (feature)
@SamitHuang @geniuspatrick