princeton-vl / infinigen

Infinite Photorealistic Worlds using Procedural Generation
https://infinigen.org
BSD 3-Clause "New" or "Revised" License

Crash When Generating Multiple Rooms #333

Open GavinZhengOI opened 1 week ago

GavinZhengOI commented 1 week ago

Describe the bug

As I mentioned in #326, Infinigen crashes while generating an indoor scene with multiple rooms containing objects.

Steps to Reproduce

python -m infinigen.datagen.manage_jobs --output_folder outputs/huge_dataset_nooverride \
--num_scenes 5 --pipeline_configs local_256GB.gin monocular.gin blender_gt.gin indoor_background_configs.gin \
--pipeline_overrides get_cmd.driver_script='infinigen_examples.generate_indoors' manage_datagen_jobs.num_concurrent=16

This runs for tens of hours and then crashes.

What version of the code were you using?

v1.8.1

commit 88fb49cde0bbca401601d05d672b31d28e9b45cb (HEAD -> main, origin/main, origin/HEAD)
Merge: 126a41eb ef50641f
Author: Alex Raistrick <araistrick@princeton.edu>
Date:   Fri Aug 23 13:01:23 2024 -0400

    Merge pull request #252 from princeton-vl/develop

    v1.8.1

What command did you run?

python -m infinigen.datagen.manage_jobs --output_folder outputs/huge_dataset_nooverride \
--num_scenes 5 --pipeline_configs local_256GB.gin monocular.gin blender_gt.gin indoor_background_configs.gin \
--pipeline_overrides get_cmd.driver_script='infinigen_examples.generate_indoors' manage_datagen_jobs.num_concurrent=16

What are your FULL output logs?

crash_summaries.txt

First Sample

92139964677_0_log.out.txt 92139964677_0_log.err.txt

Second Sample

80530831448_0_log.out.txt 80530831448_0_log.err.txt

If this is your first time running Infinigen, what are the full install logs?

This is not my first time running Infinigen. If I turn on --configs singleroom.gin and --overrides compose_indoors.restrict_single_supported_roomtype=True, the crash does not occur.

Platform

Additional context

I suspect this might be a memory leak issue. I noticed that swap usage was over 85%, while memory utilization was only around 50%. I think it's possible that memory is leaking during execution and eventually consuming all available memory. I've just installed Prometheus on my system and started another execution with the same configuration to monitor memory usage. I'll update you once I have the results.
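For anyone who wants to watch the same two numbers without setting up Prometheus, here is a minimal sketch that computes RAM and swap utilization from /proc/meminfo. It assumes a Linux host. The helper names (parse_meminfo, utilization, sample_host) are made up for this illustration and are not part of Infinigen.

```python
from __future__ import annotations

from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def utilization(fields: dict[str, int]) -> dict[str, float]:
    """Return RAM and swap utilization as percentages of their totals."""
    mem_total = fields["MemTotal"]
    mem_used = mem_total - fields["MemAvailable"]
    swap_total = fields.get("SwapTotal", 0)
    swap_used = swap_total - fields.get("SwapFree", 0)
    return {
        "mem_percent": 100.0 * mem_used / mem_total,
        # A host without swap reports SwapTotal = 0; avoid dividing by zero.
        "swap_percent": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }


def sample_host(path: str = "/proc/meminfo") -> dict[str, float]:
    """Read the live counters once (Linux only)."""
    return utilization(parse_meminfo(Path(path).read_text()))
```

Calling sample_host() once a minute and appending the result to a file should be enough to tell whether swap keeps filling while RAM utilization stays flat.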

GavinZhengOI commented 1 week ago

[Screenshot: CleanShot 2024-09-23 at 09 25 42@2x]

It seems there is some kind of memory leak. I ran a single job with:

python -m infinigen.datagen.manage_jobs --output_folder outputs/single_data_test --num_scenes 1 --pipeline_configs local_64GB.gin monocular.gin blender_gt.gin indoor_background_configs.gin --pipeline_overrides get_cmd.driver_script='infinigen_examples.generate_indoors' manage_datagen_jobs.num_concurrent=16

Memory usage grew gradually to more than 55 GB of RAM, and then the job crashed.
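To put a number on "gradually", one option is to fit a straight line to periodic memory samples and read off the slope. The sketch below is only an illustration: growth_rate is a hypothetical helper, and the sample values in the usage note are invented, not measurements from this run.

```python
from __future__ import annotations


def growth_rate(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (elapsed seconds, memory in GiB) samples.

    Returns the fitted growth in GiB per second. A slope that stays
    clearly positive over a long run is consistent with a leak, while a
    slope near zero suggests usage has plateaued.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

For example, with invented samples of 1, 3, and 5 GiB taken at 0, 1, and 2 hours, growth_rate([(0, 1.0), (3600, 3.0), (7200, 5.0)]) * 3600 gives a fitted growth of about 2 GiB per hour.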