What does this PR do?

Fixes # (issue)

Adds # (feature)

Adds a CLIPEmbedder interface that aligns with the API in PKU Open Sora Plan.
Reuses the existing modules in mindone.transformers.clip and adds CLIPVisionEmbeddings, CLIPVisionTransformer, and CLIPModel, with reference to open PR #436.
For T5 text encoder support, please refer to #440.

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Did you read the contributor guideline?
[ ] Did you make sure to update the documentation with your changes? For example, record bug fixes or new features in What's New. Here are the documentation guidelines.
[ ] Did you build and run the code without any errors?
[ ] Did you report the running environment (NPU type/MS version) and performance in the doc? (Preferably record this for data loading, model inference, or training tasks.)
[ ] Did you write any new necessary tests?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@SamitHuang @geniuspatrick