hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
20.23k stars 1.91k forks

Generated video quality differs from the samples shown in the gallery #504

Open runzeer opened 2 weeks ago

runzeer commented 2 weeks ago

The quality of my generated video differs from the samples shown in the gallery. Could you share the inference parameters used for the gallery samples?

https://github.com/hpcaitech/Open-Sora/assets/14292495/3b48b0e5-f175-4de2-98f3-04b96414eb65

zhengzangw commented 2 weeks ago

Could @Sze-qq take a look at this?

Sze-qq commented 2 weeks ago

Sure. Here are the parameters we used to generate the video:

Resolution: 720p
Sampling Steps: 100
Aesthetic Score: 7
Refine Prompt with GPT-4: Unchecked
Other settings: Default
Prompt: A Japanese tram glides through the snowy streets of a city, its sleek design cutting through the falling snowflakes with grace. The tram's illuminated windows cast a warm glow onto the snowy surroundings, creating a cozy atmosphere inside. Snowflakes dance in the air, swirling around the tram as it moves along its tracks. Outside, the city is blanketed in a layer of snow, transforming familiar streets into a winter wonderland. Cherry blossom trees, now bare, stand quietly along the tram tracks, their branches dusted with snow. People hurry along the sidewalks, bundled up against the cold, while the tram's bell rings softly, announcing its arrival at each stop.

runzeer commented 2 weeks ago

Here is my inference configuration. I modified it according to what you said.

[2024-06-21 02:58:22] Inference configuration:
{'aes': 7.0,
 'align': 5,
 'aspect_ratio': '9:16',
 'batch_size': 1,
 'condition_frame_length': 5,
 'config': 'configs/opensora-v1-2/inference/sample.py',
 'dtype': 'bf16',
 'flow': None,
 'fps': 24,
 'frame_interval': 1,
 'model': {'enable_flash_attn': True,
           'enable_layernorm_kernel': True,
           'from_pretrained': 'hpcai-tech/OpenSora-STDiT-v3',
           'qk_norm': True,
           'type': 'STDiT3-XL/2'},
 'multi_resolution': 'STDiT2',
 'num_frames': '102',
 'prompt': ['A Japanese tram glides through the snowy streets of a city, its '
            'sleek design cutting through the falling snowflakes with grace. '
            "The tram's illuminated windows cast a warm glow onto the snowy "
            'surroundings, creating a cozy atmosphere inside. Snowflakes dance '
            'in the air, swirling around the tram as it moves along its '
            'tracks. Outside, the city is blanketed in a layer of snow, '
            'transforming familiar streets into a winter wonderland. Cherry '
            'blossom trees, now bare, stand quietly along the tram tracks, '
            'their branches dusted with snow. People hurry along the '
            "sidewalks, bundled up against the cold, while the tram's bell "
            'rings softly, announcing its arrival at each stop.'],
 'prompt_as_path': False,
 'resolution': '720p',
 'save_dir': './samples/debug',
 'save_fps': 24,
 'scheduler': {'cfg_scale': 7.0,
               'num_sampling_steps': 100,
               'type': 'rflow',
               'use_timestep_transform': True},
 'seed': 1024,
 'text_encoder': {'from_pretrained': 'DeepFloyd/t5-v1_1-xxl',
                  'model_max_length': 300,
                  'type': 't5'},
 'vae': {'from_pretrained': 'hpcai-tech/OpenSora-VAE-v1.2',
         'micro_batch_size': 4,
         'micro_frame_size': 17,
         'type': 'OpenSoraVAE_V1_2'}}

But the generated video is still blurry, as shown below. Could you help me check whether any parameters are still incorrect? Thanks a lot!

https://github.com/hpcaitech/Open-Sora/assets/14292495/baae4b5e-75d7-4492-984b-233720848b47
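
One way to check the logged configuration against the parameters listed for the gallery sample is to compare them key by key. The following is a minimal sketch and not part of Open-Sora: the helper names `lookup` and `find_mismatches` are hypothetical, and the mapping from the Gradio labels ("Sampling Steps", "Aesthetic Score") to the config keys `scheduler.num_sampling_steps` and `aes` is an assumption based on the key names in the log.

```python
# Parameters listed for the gallery sample, keyed by the path into the config.
# ASSUMPTION: these UI labels correspond to these config keys.
expected = {
    ("resolution",): "720p",
    ("scheduler", "num_sampling_steps"): 100,
    ("aes",): 7.0,
}

# Subset of the logged inference configuration shown above.
logged = {
    "aes": 7.0,
    "resolution": "720p",
    "num_frames": "102",
    "scheduler": {"cfg_scale": 7.0, "num_sampling_steps": 100, "type": "rflow"},
}


def lookup(config, path):
    """Follow a tuple of keys into a nested dict; return None if a key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_mismatches(config, expected):
    """Return {dotted_key: (expected, actual)} for every value that differs."""
    mismatches = {}
    for path, want in expected.items():
        got = lookup(config, path)
        if got != want:
            mismatches[".".join(path)] = (want, got)
    return mismatches


mismatches = find_mismatches(logged, expected)
print(mismatches or "all listed parameters match")
```

Under these assumptions, the three listed parameters agree with the log, so a remaining quality gap would have to come from something the parameter list does not cover.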

runzeer commented 2 weeks ago

@Sze-qq Could you help me with the issue above? Thanks a lot!

Sze-qq commented 2 weeks ago

Hi @runzeer. I generated the image first and then clicked "Generate Video" in Gradio. Hope this helps.

runzeer commented 2 weeks ago

@Sze-qq Thanks a lot for your explanation. However, the style of the generated image (see the image below) differs from the gallery videos. Did you use the hpcai-tech/OpenSora-STDiT-v3 checkpoint or the original Pixart-sigma checkpoint for the initial text-to-image step? Or should I add style keywords to the prompt?

(image: 2_480p_cond)

zhengzangw commented 1 week ago

We use the hpcai-tech/OpenSora-STDiT-v3 checkpoint. I tested it and can also reproduce @Sze-qq's result.

runzeer commented 1 week ago

@zhengzangw Thanks a lot. Could you share the seed you used to reproduce @Sze-qq's result? I want to reproduce the video exactly.

crystallee-ai commented 1 week ago

> Hi @runzeer. I generated the image first and then clicked "Generate Video" in Gradio. Hope this helps.

Thanks for your reply, but the gallery page labels these samples as "text to video".

github-actions[bot] commented 8 hours ago

This issue is stale because it has been open for 7 days with no activity.