hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

Some questions about the long video generation related command #443

Closed reich208github closed 2 weeks ago

reich208github commented 3 weeks ago

Hi, guys

I have some questions about the command below:

    # long video generation
    python scripts/inference-long.py configs/opensora-v1-1/inference/sample.py --ckpt-path CKPT_PATH \
      --num-frames 32 --image-size 240 426 --loop 16 --condition-frame-length 8 --sample-name long \
      --prompt '|0|a white jeep equipped with a roof rack driving on a dirt road in a coniferous forest.|2|a white jeep equipped with a roof rack driving on a dirt road in the desert.|4|a white jeep equipped with a roof rack driving on a dirt road in a mountain.|6|A white jeep equipped with a roof rack driving on a dirt road in a city.|8|a white jeep equipped with a roof rack driving on a dirt road on the surface of a river.|10|a white jeep equipped with a roof rack driving on a dirt road under the lake.|12|a white jeep equipped with a roof rack flying into the sky.|14|a white jeep equipped with a roof rack driving in the universe. Earth is the background.{"reference_path": "https://cdn.openai.com/tmp/s/interp/d0.mp4", "mask_strategy": "0,0,0,0,16"}'

Question 1:

What is the meaning of the parameters below?

--num-frames, --loop, --condition-frame-length

Question 2:

Are |0|, |2|, |4|, ... the IDs of the texts? If so, why are they not numbered consecutively, like |0|, |1|, |2|, ...?

Question 3:

What is the function of the snippet:

{"reference_path": "https://cdn.openai.com/tmp/s/interp/d0.mp4", "mask_strategy": "0,0,0,0,16"}

Thanks a lot!

zhengzangw commented 2 weeks ago

Thank you for your question.

  1. For each argument:
     1.1 --num-frames sets the length of each generated clip. In OpenSora 1.2, we provide a more user-friendly way: you can type 2s, 4s, or 8s to specify the length. The conversion is: at 24 fps, 51 frames last 51/24 = 2.125 s.
     1.2 --loop is a strategy we introduced to generate very long videos. It uses part of the last generated video as the condition for generating the next video. So if --loop is 10 and --num-frames is 2s, the final output is approximately 20 s long.
     1.3 --condition-frame-length is the number of frames used as the condition for generating the next video. It only takes effect if loop > 1.

  2. Yes, but the id refers to the loop iteration. |x| means the text is used for the x-th generation in the loop, and if the next marker is |x+i|, every loop iteration from x up to (but not including) x+i uses the prompt tagged |x|. This is designed for ease of use.

  3. The snippet is a trick to pass info in text. It is equivalent to --reference-path https://cdn.openai.com/tmp/s/interp/d0.mp4 and --mask-strategy "0,0,0,0,16". The meaning is well-documented here
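
The length arithmetic in point 1 can be sketched in a few lines of Python. This is only an illustration, not Open-Sora code: the function names are hypothetical, and the overlap formula in `total_frames_with_overlap` rests on the assumption (not confirmed in this thread) that the conditioning frames are reused from the previous clip rather than appended, which would be one reason the total is only "approximately" loop times the clip length.

```python
def clip_seconds(num_frames: int, fps: float = 24.0) -> float:
    """Duration of one generated clip: frames divided by frames per second."""
    return num_frames / fps


def approx_total_seconds(loop: int, seconds_per_clip: float) -> float:
    """Rough upper estimate from the thread: loop iterations times clip length."""
    return loop * seconds_per_clip


def total_frames_with_overlap(num_frames: int, loop: int,
                              condition_frame_length: int) -> int:
    """ASSUMPTION: each iteration after the first reuses `condition_frame_length`
    frames of the previous clip, so it contributes only the remaining new frames."""
    if loop <= 1:
        # Conditioning has no effect with a single iteration.
        return num_frames
    return num_frames + (loop - 1) * (num_frames - condition_frame_length)


if __name__ == "__main__":
    print(clip_seconds(51))                    # 51 frames at 24 fps
    print(approx_total_seconds(10, 2.0))       # --loop 10 with 2s clips
    print(total_frames_with_overlap(32, 16, 8))  # values from the command in question
```

With the values from the original command (--num-frames 32, --loop 16, --condition-frame-length 8), the overlap assumption gives fewer frames than the simple product 16 × 32, so treat both numbers as estimates.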

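Points 2 and 3 describe how a single --prompt string carries both a per-loop prompt schedule and extra options. A minimal sketch of how such a string could be decoded is shown here; it is not the parser Open-Sora actually uses, and the helper names (`split_prompt_and_options`, `parse_loop_prompts`, `prompt_for_loop`) are hypothetical.

```python
import bisect
import json
import re


def split_prompt_and_options(text: str) -> tuple[str, dict]:
    """Separate a trailing JSON object (e.g. reference_path, mask_strategy)
    from the prompt text. Returns the text unchanged and {} if none is found."""
    text = text.rstrip()
    if text.endswith("}"):
        start = text.find("{")
        while start != -1:
            try:
                # Succeeds only if everything from `start` to the end is one JSON object.
                return text[:start].rstrip(), json.loads(text[start:])
            except json.JSONDecodeError:
                start = text.find("{", start + 1)
    return text, {}


def parse_loop_prompts(text: str) -> list[tuple[int, str]]:
    """Turn '|0|first|2|second' into [(0, 'first'), (2, 'second')]."""
    # With a capture group, re.split yields [prefix, id, text, id, text, ...].
    parts = re.split(r"\|(\d+)\|", text)
    pairs = [(int(parts[i]), parts[i + 1].strip())
             for i in range(1, len(parts) - 1, 2)]
    return sorted(pairs)


def prompt_for_loop(pairs: list[tuple[int, str]], loop_index: int) -> str:
    """Pick the prompt whose id is the largest one not exceeding loop_index,
    so iterations between |x| and |x+i| all use the |x| prompt."""
    starts = [start for start, _ in pairs]
    pos = bisect.bisect_right(starts, loop_index) - 1
    if pos < 0:
        raise ValueError(f"no prompt defined for loop index {loop_index}")
    return pairs[pos][1]


if __name__ == "__main__":
    raw = '|0|a jeep in a forest.|2|a jeep in the desert.{"mask_strategy": "0,0,0,0,16"}'
    prompt_text, options = split_prompt_and_options(raw)
    schedule = parse_loop_prompts(prompt_text)
    for i in range(4):
        print(i, prompt_for_loop(schedule, i))
    print(options)
```

Under this reading, the command in the question with --loop 16 and ids 0, 2, 4, ..., 14 would use each prompt for two consecutive loop iterations.
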
reich208github commented 2 weeks ago

Ok, I got it, thanks!