SutirthaChakraborty opened 2 months ago

How long can the generated videos be? What would be the best model combinations?
Hi @SutirthaChakraborty, with opensora_hpcai we support generating 720P videos of 16 seconds (408 frames).

For model combination, do you mean combining a text-to-image model with an image-to-video model? If so, I would suggest SD3 or Flux.1 (in PR) for T2I generation, followed by DynamiCrafter for I2V generation, for the best visual quality. If you prefer long videos, you can use opensora_hpcai for I2V.

Thanks for your attention to our AIGC kit.
Hi @SamitHuang, thanks for your detailed reply. Is there any code reference for using the SD3 + DynamiCrafter pre-trained weights and saving the mp4, like the one you provided for image generation in the README?
Sure.
```python
>>> import mindspore
>>> from mindone.diffusers import StableDiffusion3Pipeline
>>> pipe = StableDiffusion3Pipeline.from_pretrained(
...     "stabilityai/stable-diffusion-3-medium-diffusers",
...     mindspore_dtype=mindspore.float16,
... )
>>> prompt = "A cat holding a sign that says hello world"
>>> image = pipe(prompt)[0][0]
>>> image.save("sd3.png")
```
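This covers only the T2I step and saves a PNG. For saving an mp4 from an I2V model's output frames, one common option is imageio; below is a minimal sketch under the assumption that the frames come back as uint8 numpy arrays (DynamiCrafter's own inference script already saves the video itself, so this is only needed if you build a custom pipeline):

```python
import numpy as np
import imageio

# `frames` stands in for the I2V model's output: a list of HxWx3 uint8 arrays.
# Random noise is used here only so the snippet runs end to end.
frames = [np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8) for _ in range(16)]

# Write the frames to an mp4 at 8 fps (requires the imageio-ffmpeg backend).
imageio.mimsave("output.mp4", frames, fps=8)
```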
Is there no direct way to run DynamiCrafter?
Sorry, currently it can only be run with script/inference.py. @HaoyangLee may consider wrapping the inference script into a class or function that is easier to integrate into a high-level pipeline.
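Until such a wrapper exists, one way to chain the two stages is to call the script from Python. A rough sketch follows; the `--image_path` and `--prompt` flags are placeholders I made up for illustration, so please check the argparse definitions in script/inference.py for the actual argument names:

```python
import subprocess

import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Stage 1: text-to-image with SD3 (same as the README example above).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says hello world"
image = pipe(prompt)[0][0]
image.save("sd3.png")

# Stage 2: image-to-video with DynamiCrafter via its inference script.
# NOTE: the flags below are placeholders; replace them with the real
# argument names defined in script/inference.py.
subprocess.run(
    [
        "python", "script/inference.py",
        "--image_path", "sd3.png",
        "--prompt", prompt,
    ],
    check=True,
)
```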