princeton-vl / infinigen

Infinite Photorealistic Worlds using Procedural Generation
https://infinigen.org
BSD 3-Clause "New" or "Revised" License
5.14k stars 430 forks

ConfiguringInfinigen Setting Questions #232

Closed TomTomTommi closed 1 month ago

TomTomTommi commented 1 month ago

Hi, I want to generate a large set of diverse stereo videos on separate A100 (80GB) machines. My current command is:

python -m infinigen.datagen.manage_jobs --output_folder outputs/my_videos --num_scenes 500 \
    --pipeline_config local_256GB.gin stereo_video cuda_terrain blender_gt \
    --cleanup big_files --warmup_sec 60000 --config video high_quality_terrain \
    --overrides compose_scene.rain_particles_chance=0.2 compose_scene.leaf_particles_chance=0.1 compose_scene.camera_based_lighting_chance=0.2 \
    --pipeline_override LocalScheduleHandler.jobs_per_gpu=3

1) How can I ensure that the same data is not generated twice? The guideline in ConfiguringInfinigen.md says:

> "--specific_seed 0 forces the system to use a random seed of your choice, rather than choosing one at random. Change this seed to get a different random variation, or remove it to have the program choose a seed at random."
> "--num_scenes decides how many unique scenes the program will attempt to generate before terminating. Once you have removed --specific_seed, you can increase this to generate many scenes in sequence or in parallel."

It seems that I don't need to modify it. But if I run this command on multiple machines, can they avoid generating the same seed?
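Whether independently running machines can collide on random seeds is essentially a birthday problem. A rough sketch, assuming seeds are drawn uniformly from a 32-bit space (an assumption; the actual seed width used by Infinigen may differ):

```python
import math

def seed_collision_probability(n_scenes, seed_space=2**32):
    # Birthday-bound approximation: P(at least one duplicate seed)
    # among n_scenes independent uniform draws from seed_space values.
    return 1.0 - math.exp(-n_scenes * (n_scenes - 1) / (2.0 * seed_space))

# e.g. 4 machines x 500 scenes each:
p = seed_collision_probability(4 * 500)
```

With a few thousand scenes the collision probability stays well below 0.1%, so duplicate seeds across machines are unlikely but not impossible; if it matters, deduplicate the output folders by seed afterwards.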

2) I also want to generate longer videos, as in issue #211. To use 30 fps, is this configuration correct in the .gin file?

iterate_scene_tasks.frame_range = [1, 300]
iterate_scene_tasks.view_block_size = 300
iterate_scene_tasks.cam_block_size = 10
iterate_scene_tasks.cam_id_ranges = [1, 2]

And how can I slow down the camera motion to avoid crashes?

3) Is it possible to ensure each scene has non-rigid objects such as animals? If so, how do I set this?

araistrick commented 1 month ago

RE question 1 - "How can I ensure different machines generate different scenes?"

RE question 2 - "How can I configure the system to run at 30 fps?"

RE question 2.5 - "How can I slow down the camera motion to avoid crashes?"

RE question 3 - "How can I increase the prevalence of non-rigid objects?"

TomTomTommi commented 1 month ago

Thanks so much for the reply! Just to confirm, iterate_scene_tasks.frame_range = [1, 300] would generate frames within this range, right? If I want full 300-frame videos, can I set iterate_scene_tasks.frame_range = [299, 300]? I previously tried iterate_scene_tasks.frame_range = [1, 192] but never obtained videos of 192 frames. It often produces 24/48 frames.

araistrick commented 1 month ago

The frame_range specifies the start and end frame of the video. If you set iterate_scene_tasks.frame_range = [299, 300], the system will only render frames 299-300, so only 2 frames total. If you want 300-frame videos, you would technically need to set [1, 301].
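A quick way to sanity-check the off-by-one, assuming the end-exclusive convention that the [1, 301] suggestion implies (worth verifying against your Infinigen version):

```python
def n_frames(frame_range):
    # Hypothetical helper: with an end-exclusive frame_range,
    # the number of rendered frames is end - start.
    start, end = frame_range
    return end - start
```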

If you requested 192 frames and only got 24/48, then either the script is still running or something has crashed. What is the output of manage_jobs in this case? Also consider sending the scenes_db.csv and crash_summaries.txt.

TomTomTommi commented 1 month ago

Hi, when I run this command on an RTX 6000 machine with 128 GB RAM:

python -m infinigen.datagen.manage_jobs --output_folder /rds/general/ephemeral/user/jj323/ephemeral/output/stereo_videos --num_scenes 10 \
    --pipeline_config local_128GB.gin stereo_video cuda_terrain blender_gt \
    --cleanup big_files --warmup_sec 60000 --config video high_quality_terrain \
    --overrides compose_scene.rain_particles_chance=0.2 compose_scene.leaf_particles_chance=0.1 compose_scene.camera_based_lighting_chance=0.2

The output is quite weird, and I fail to locate the error:

/rds/general/ephemeral/user/jj323/ephemeral/output/stereo_videos 04/29 05:27PM -> 05/02 09:39AM
Restricting to gpus_uuids=set() due to toplevel CUDA_VISIBLE_DEVICES setting
============================================================
control_state/curr_concurrent_max : 8
control_state/disk_usage       : 0.91
control_state/n_in_flight      : 1
control_state/try_to_launch    : 7
control_state/will_launch      : 0
queued/fineterrain             : 1
queued/total                   : 1
succeeded/coarse               : 1

The log.out file is attached: 64484032748_0_log_out.txt. The error message suggests that there are memory blocks that haven't been freed properly, leading to memory leaks.

According to issue #31, this seems normal. But I only get the /coarse folder, without the fine and frames folders. The message also indicates that the CUDA_VISIBLE_DEVICES environment variable is set at the top level, restricting GPU access to an empty set.

araistrick commented 1 month ago

queued/fineterrain means your system wants to launch a fineterrain job but is unable to proceed. This is likely because of your CUDA_VISIBLE_DEVICES setting, which seems to be set to an empty string. Either set CUDA_VISIBLE_DEVICES=0 or unset the variable altogether. I did mess with CUDA_VISIBLE_DEVICES recently, so it's possible this is my bug instead. Or, if you don't want CUDA fineterrain, remove cuda_terrain from your command.
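For reference, the shell-side fix looks like this (CUDA_VISIBLE_DEVICES is CUDA's standard device-masking variable; an empty string hides every GPU):

```shell
# Pin jobs to a specific device:
export CUDA_VISIBLE_DEVICES=0

# ...or clear the restriction entirely so all GPUs are visible:
# unset CUDA_VISIBLE_DEVICES
```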

TomTomTommi commented 1 month ago

I think the problem is the same as in this issue. Removing the int(..) from int(s.strip()) fails to solve the problem, so I have kept it commented out.
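For illustration, a defensive way to parse the variable so that an empty string yields an empty list instead of crashing on int("") (a sketch, not the repo's actual code; the function name is made up):

```python
import os

def parse_visible_gpus(value):
    # "" or unset -> no entries; "0,1" -> [0, 1].
    # Filtering out empty fragments avoids ValueError from int("").
    return [int(s) for s in (t.strip() for t in value.split(",")) if s]

gpus = parse_visible_gpus(os.environ.get("CUDA_VISIBLE_DEVICES", ""))
```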

TomTomTommi commented 1 month ago

Also, I see that "iterate_scene_tasks.cam_block_size controls how many frames will be grouped into each fine_terrain and render / ground-truth task." What happens if I increase this? Would the CPU or GPU cost improve significantly?

araistrick commented 1 month ago

I see, apologies. I will try to patch the CUDA_VISIBLE_DEVICES issue over the weekend.

cam_block_size only affects grouping for render/ground_truth; view_block_size controls fineterrain and is usually a larger block size, or just the whole video.

Using a small block size means that one video can be processed simultaneously by many GPUs, which decreases the latency to get back one video. However, if you use a large block size (potentially just block_size=video_len), the latency will be much higher (since one GPU has to render the whole video in series), but technically the overall throughput will be higher as well (provided all your GPUs are still well-utilized). This is because there is some constant startup cost to load the assets from disk onto the GPU and build a raycasting data structure, so the more you render on a single GPU, the more you amortize this cost.

Another practical point: you should never set cam_block_size larger than (video_length // n_gpus_available). I.e., if you only have one GPU, splitting rendering into multiple blocks doesn't really do anything, since the blocks will all be processed in serial anyway.

Here's a projection I made of the cam_block_size vs. throughput tradeoff. It assumes the time-per-frame and the startup cost are constant, then amortizes the startup cost over a variable number of frames.
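The amortization argument can be sketched numerically; the startup and per-frame costs below are made-up placeholders, not measured Infinigen numbers:

```python
def render_time(block_size, startup_s=120.0, per_frame_s=30.0):
    # One GPU job: fixed startup (load assets, build raycasting
    # structures) plus a constant cost per frame.
    return startup_s + block_size * per_frame_s

def throughput(block_size, **kw):
    # Frames per GPU-second: larger blocks amortize the startup cost.
    return block_size / render_time(block_size, **kw)
```

With these placeholder costs, going from a block of 8 to rendering the whole video on one GPU gains well under 2x throughput, which is why a small block size is often an acceptable latency/throughput compromise.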

(image: projected throughput vs. cam_block_size)

In practice I still use cam_block_size=8, since it's mostly as fast and I value the reduced latency (I can see results sooner, and use less disk space storing scenes for many in-flight render jobs).

TomTomTommi commented 1 month ago

Thanks a lot for your detailed reply. I observed a little bit of flickering in the pre-generated videos. I wonder if this is an effect of the cam_block_size setting.

araistrick commented 1 month ago

The pre-generated videos were made before we integrated OcMesher, which fixes all mesh-flickering issues. The current codebase won't generate any mesh flickering in videos. I have lots of videos rendered with the new version, but I haven't had a chance to package them for release.

TomTomTommi commented 1 month ago

Got it! Following all the advice above, I generated some demo videos. I found that the tigers in the videos are static, while some flies are moving. Is it possible to make the animals move?

araistrick commented 1 month ago

You can try setting animation_mode='idle' or animation_mode='run' on the various CreatureFactory classes that support it. 'idle' works quite well, but 'run' is goofy: the creatures run in place and don't traverse the scene. You'd have to modify both generate_nature and execute_tasks, I believe. Also be aware that hair + animation at the same time is currently broken.
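For example, a gin override along these lines might work; the factory class name here is illustrative, so check which CreatureFactory classes in your checkout actually accept animation_mode:

```
# Hypothetical gin override; substitute a factory class that exists
# in your version of the codebase and supports animation_mode.
CarnivoreFactory.animation_mode = 'idle'
```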

TomTomTommi commented 1 month ago

Thanks. I will try some demos to see the results. BTW, if I want to set the baseline of the stereo cameras within a certain range, is camera.py the file I should modify?

araistrick commented 1 month ago

There's a setting in base.gin to set the translation.
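A sketch of what such a rig override could look like, expressing the stereo baseline as the second camera's translation; the binding name and dict keys are assumptions and should be checked against base.gin in your checkout:

```
# Hypothetical: two cameras, right camera offset 7.5 cm along x
camera.spawn_camera_rigs.camera_rig_config = [
    {'loc': (0, 0, 0), 'rot': (0, 0, 0)},
    {'loc': (0.075, 0, 0), 'rot': (0, 0, 0)},
]
```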