hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
21.76k stars 2.11k forks

Question: for v1.1 inference, why does only the 240 426 resolution work, while every other resolution produces noise? #434

Closed: FDInSky closed this issue 3 months ago

TYang92677626 commented 3 months ago

I encountered the same problem when I used this inference command:

python scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt "A beautiful sunset over the city" --num-frames 32 --image-size 480 854

The video I got is attachment 1718097184945.

JThh commented 3 months ago

This is presumably because we trained only on such sizes, so the model cannot generalize to other sizes for now.
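One way to act on this before launching a long run is to check the requested size against a list of sizes already known to work. The snippet below is only a sketch: `KNOWN_GOOD_SIZES` and `is_known_good_size` are hypothetical names, not part of Open-Sora, and the single entry in the list is simply the size this issue reports as working for v1.1.

```python
# Hypothetical allow-list: the only size this thread reports as working for v1.1.
KNOWN_GOOD_SIZES = {(240, 426)}


def is_known_good_size(height, width, known_good=KNOWN_GOOD_SIZES):
    """Return True if (height, width) is in the caller-supplied allow-list."""
    return (height, width) in known_good


# Compare the two sizes discussed in this thread against the allow-list.
for size in [(240, 426), (480, 854)]:
    status = "ok" if is_known_good_size(*size) else "untested, may produce noise"
    print(size, status)
```

Extend the set with any size you have verified yourself; the check says nothing about sizes that are merely absent from it.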

TYang92677626 commented 3 months ago

(1) Today I retested it by running this command:

python scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt "A beautiful sunset over the city" --num-frames 32 --image-size 480 854

The video I got is attachment 1718260517779.

(2) However, when I tested this command:

python scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --num-frames 64 --image-size 480 848 --sample-name image-cond --prompt 'A car driving on the ocean.'

The video I got was a pretty bad scene (attachment 1718260736878).

(3) What I don't understand is this fragment in the sample script from the Open-Sora v1.1 report: {"reference_path": "https://cdn.openai.com/tmp/s/interp/d0.mp4", "mask_strategy": "0,0,0,0,16"}. What does it mean? Is it a source video that must be referenced when generating the video? I ran this command:

python scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --num-frames 32 --image-size 240 426 --loop 16 --condition-frame-length 8 --sample-name long --prompt '|0|a white jeep equipped with a roof rack driving on a dirt road in a coniferous forest.|2|a white jeep equipped with a roof rack driving on a dirt road in the desert.|4|a white jeep equipped with a roof rack driving on a dirt road in a mountain.|6|A white jeep equipped with a roof rack driving on a dirt road in a city.|8|a white jeep equipped with a roof rack driving on a dirt road on the surface of a river.|10|a white jeep equipped with a roof rack driving on a dirt road under the lake.|12|a white jeep equipped with a roof rack flying into the sky.|14|a white jeep equipped with a roof rack driving in the universe. Earth is the background.{"reference_path": "https://cdn.openai.com/tmp/s/interp/d0.mp4", "mask_strategy": "0,0,0,0,16"}'

The video I got is attachment 1718260981777.

(4) Without the reference_path parameter, I get something like the second video. What does {"reference_path": "https://cdn.openai.com/tmp/s/interp/d0.mp4", "mask_strategy": "0,0,0,0,16"} mean?
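To make the structure of the prompt string in (3) easier to see, here is a minimal sketch that splits it into its `|index|text` segments and the trailing JSON object. It is not Open-Sora's own parser: `split_prompt` and `mask_fields` are hypothetical helpers, and the sketch deliberately does not interpret what the numeric markers or the five `mask_strategy` values mean. The Open-Sora v1.1 report and usage documentation are the reference for that.

```python
import json
import re


def split_prompt(raw):
    """Split '|index|text|index|text...{json}' into (segments, extras)."""
    extras = {}
    brace = raw.find("{")
    # Treat a trailing '{...}' as a JSON object of extra options.
    if brace != -1 and raw.rstrip().endswith("}"):
        extras = json.loads(raw[brace:])
        raw = raw[:brace]
    # Each segment is a '|<digits>|' marker followed by text up to the next '|'.
    found = re.findall(r"\|(\d+)\|([^|]*)", raw)
    segments = [(int(index), text.strip()) for index, text in found]
    if not segments:
        # A plain prompt with no markers becomes a single segment at index 0.
        segments = [(0, raw.strip())]
    return segments, extras


def mask_fields(extras):
    """Return the comma-separated mask_strategy values as a list of ints."""
    return [int(value) for value in extras["mask_strategy"].split(",")]


example = (
    "|0|a white jeep in a coniferous forest."
    "|2|a white jeep in the desert."
    '{"reference_path": "https://cdn.openai.com/tmp/s/interp/d0.mp4", '
    '"mask_strategy": "0,0,0,0,16"}'
)
segments, extras = split_prompt(example)
print(segments)
print(extras["reference_path"], mask_fields(extras))
```

The example string is shortened from the command above. The sketch assumes prompt text contains no `|` or `{` characters of its own.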

FDInSky commented 3 months ago

@TYang92677626 Hello, was there any difference between your first and second inference runs?

TYang92677626 commented 3 months ago

> @TYang92677626 Hello, was there any difference between your first and second inference runs?

Last week I just updated the code, nothing else changed. This week the 1.2 version was released, but I haven't tested it yet.

cd Open-Sora
git pull
pip install -e .

zhengzangw commented 3 months ago

(2) The model's output may simply be poor in this case.
(3) Please check the Open-Sora v1.1 report and usage documentation.
(4) Something seems to be broken in v1.1. Please try Open-Sora 1.2.