Open runzeer opened 2 weeks ago
Could @Sze-qq have a look at this?
Sure. Here are the parameters we used to generate the video:
Resolution: 720p
Sampling Steps: 100
Aesthetic Score: 7
Refine Prompt with GPT-4: Unchecked
Other settings: Default
Prompt: A Japanese tram glides through the snowy streets of a city, its sleek design cutting through the falling snowflakes with grace. The tram's illuminated windows cast a warm glow onto the snowy surroundings, creating a cozy atmosphere inside. Snowflakes dance in the air, swirling around the tram as it moves along its tracks. Outside, the city is blanketed in a layer of snow, transforming familiar streets into a winter wonderland. Cherry blossom trees, now bare, stand quietly along the tram tracks, their branches dusted with snow. People hurry along the sidewalks, bundled up against the cold, while the tram's bell rings softly, announcing its arrival at each stop.
Here is my inference configuration, modified as you suggested:
[2024-06-21 02:58:22] Inference configuration: {'aes': 7.0, 'align': 5, 'aspect_ratio': '9:16', 'batch_size': 1, 'condition_frame_length': 5, 'config': 'configs/opensora-v1-2/inference/sample.py', 'dtype': 'bf16', 'flow': None, 'fps': 24, 'frame_interval': 1, 'model': {'enable_flash_attn': True, 'enable_layernorm_kernel': True, 'from_pretrained': 'hpcai-tech/OpenSora-STDiT-v3', 'qk_norm': True, 'type': 'STDiT3-XL/2'}, 'multi_resolution': 'STDiT2', 'num_frames': '102', 'prompt': ['A Japanese tram glides through the snowy streets of a city, its ' 'sleek design cutting through the falling snowflakes with grace. ' "The tram's illuminated windows cast a warm glow onto the snowy " 'surroundings, creating a cozy atmosphere inside. Snowflakes dance ' 'in the air, swirling around the tram as it moves along its ' 'tracks. Outside, the city is blanketed in a layer of snow, ' 'transforming familiar streets into a winter wonderland. Cherry ' 'blossom trees, now bare, stand quietly along the tram tracks, ' 'their branches dusted with snow. People hurry along the ' "sidewalks, bundled up against the cold, while the tram's bell " 'rings softly, announcing its arrival at each stop.'], 'prompt_as_path': False, 'resolution': '720p', 'save_dir': './samples/debug', 'save_fps': 24, 'scheduler': {'cfg_scale': 7.0, 'num_sampling_steps': 100, 'type': 'rflow', 'use_timestep_transform': True}, 'seed': 1024, 'text_encoder': {'from_pretrained': 'DeepFloyd/t5-v1_1-xxl', 'model_max_length': 300, 'type': 't5'}, 'vae': {'from_pretrained': 'hpcai-tech/OpenSora-VAE-v1.2', 'micro_batch_size': 4, 'micro_frame_size': 17, 'type': 'OpenSoraVAE_V1_2'}}
But the generated video is still blurry, as shown below. Could you help me check whether any parameters are still incorrect? Thanks a lot!
https://github.com/hpcaitech/Open-Sora/assets/14292495/baae4b5e-75d7-4492-984b-233720848b47
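One way to check the settings mechanically is to compare the values reported for the gallery sample against the logged configuration. The sketch below is only an illustration: the mapping from the Gradio labels (Resolution, Sampling Steps, Aesthetic Score) to the config keys `resolution`, `num_sampling_steps`, and `aes` is an assumption, and the helper names `flatten` and `find_mismatches` are hypothetical, not part of Open-Sora.

```python
# Settings reported for the gallery sample. Mapping these UI labels to
# config keys is an assumption made for this sketch.
reported = {"resolution": "720p", "num_sampling_steps": 100, "aes": 7}

# Relevant subset of the logged inference configuration from this thread.
logged = {
    "resolution": "720p",
    "aes": 7.0,
    "scheduler": {"num_sampling_steps": 100, "cfg_scale": 7.0, "type": "rflow"},
}


def flatten(cfg):
    """Flatten one level of nesting so scheduler keys sit beside top-level keys."""
    flat = {}
    for key, value in cfg.items():
        if isinstance(value, dict):
            flat.update(value)
        else:
            flat[key] = value
    return flat


def find_mismatches(reported, logged):
    """Return {key: (reported_value, logged_value)} for every key that differs."""
    flat = flatten(logged)
    return {k: (v, flat.get(k)) for k, v in reported.items() if flat.get(k) != v}


mismatches = find_mismatches(reported, logged)
print(mismatches or "no mismatches in the compared keys")
```

A clean result on these keys only covers the three compared settings; it would not rule out other differences, such as the generation workflow or the seed.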
@Sze-qq Could you help me with the issues above? Thanks a lot!
Hi @runzeer . I generated the image first and then clicked "Generate Video" on Gradio. Hope this helps.
@Sze-qq Thanks a lot for your explanation. However, the style of the generated image, like the image below, is different from that of the gallery videos. Did you use the hpcai-tech/OpenSora-STDiT-v3 checkpoint or the original Pixart-sigma checkpoint for the initial text-to-image step? Or should I add something to the prompt to control the style?
We use the hpcai-tech/OpenSora-STDiT-v3 checkpoint. I tested it and can also reproduce @Sze-qq's result.
@zhengzangw Thanks a lot. Could you share the seed you used to generate @Sze-qq's result? I want to reproduce the video exactly.
Thanks for your reply, but on the gallery page it is labeled "text to video".
The quality of the generated video is different from the samples shown in the gallery. Could you share the relevant inference parameters for the gallery sample results?
https://github.com/hpcaitech/Open-Sora/assets/14292495/3b48b0e5-f175-4de2-98f3-04b96414eb65