OOM issue

JunhyeongDoyle commented 1 year ago

Hi! Thanks for sharing this awesome work.

I'm trying to train a model on the DyNeRF data, but I keep running into an out-of-memory (OOM) issue.

(I adjusted down_sample and num_steps in the config for initial DyNeRF training.)

'save_outputs': True,
 'scene_bbox': [[-3.0, -1.8, -1.2], [3.0, 1.8, 1.2]],
 'scheduler_type': 'warmup_cosine',
 'single_jitter': False,
 'time_smoothness_weight': 0.001,
 'time_smoothness_weight_proposal_net': 1e-05,
 'train_fp16': True,
 'use_proposal_weight_anneal': True,
 'use_same_proposal_network': False,
 'valid_every': 30000}
2023-06-23 04:43:45,251|    INFO| Loading Video360Dataset with downsample=4.0
Loading train data: 100%|███████████████████████████████████████████████████████████████| 19/19 [00:41<00:00,  2.20s/it]
2023-06-23 04:44:53,937|    INFO| Computed 1953572400 ISG weights in 24.48s.
killed
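The ISG weight count in the log can be reproduced with some quick arithmetic, which also shows why downsampling matters so much. The sketch below assumes the DyNeRF videos are 2704x2028 pixels with 300 frames each (assumptions about the dataset, not values from the log) and that the downsampled frame size is the full size divided by the factor. With the 19 training cameras from the progress bar and downsample=4 this gives exactly 1953572400, the count in the log. The names `num_rays` and `gib` are illustrative only.

```python
# Back-of-envelope estimate of the ISG weight count and its memory footprint.
# ASSUMPTIONS: full DyNeRF frames are 2704x2028 and each video has 300 frames.
FULL_W, FULL_H = 2704, 2028
FRAMES_PER_VIDEO = 300
NUM_TRAIN_CAMERAS = 19  # matches "19/19" in the loading progress bar


def num_rays(downsample: float, cameras: int = NUM_TRAIN_CAMERAS,
             frames: int = FRAMES_PER_VIDEO) -> int:
    """Number of per-pixel, per-frame samples (one ISG weight each)."""
    w = int(FULL_W / downsample)  # assumed: size divided by the factor, truncated
    h = int(FULL_H / downsample)
    return cameras * frames * w * h


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# num_rays(4.0) == 1,953,572,400, matching the log line above
for ds in (1.0, 2.0, 4.0):
    n = num_rays(ds)
    # float32 weights take 4 bytes each; uint8 RGB frames take 3 bytes per pixel
    print(f"downsample={ds}: {n:,} weights, "
          f"{gib(4 * n):.1f} GiB as float32, "
          f"{gib(3 * n):.1f} GiB as uint8 RGB frames")
```

The footprint scales with the square of 1/downsample, so halving the factor quadruples it. These figures cover only one copy of the weights or frames; any float copies or temporaries created while computing the weights add to the peak.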

Checking dmesg shows the following:

[2916867.742639] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1004.slice/session-1589.scope,task=python,pid=3555848,uid=1004
[2916867.742721] Out of memory: Killed process 3555848 (python) total-vm:167336276kB, anon-rss:123357620kB, file-rss:4kB, shmem-rss:8kB, UID:1004 pgtables:243752kB oom_score_adj:0
[2916871.967326] oom_reaper: reaped process 3555848 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
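When reading these lines, note that the kernel's "kB" means KiB (1024 bytes), so anon-rss:123357620kB is roughly 117.6 GiB of anonymous memory held by the python process at the moment it was killed. A minimal helper that converts every size field in such a line to GiB (the function name `oom_fields_gib` is illustrative, not part of any tool):

```python
import re

KILL_LINE = ("Out of memory: Killed process 3555848 (python) "
             "total-vm:167336276kB, anon-rss:123357620kB, file-rss:4kB, "
             "shmem-rss:8kB, UID:1004 pgtables:243752kB oom_score_adj:0")


def oom_fields_gib(line: str) -> dict:
    """Extract every 'name:<number>kB' field and convert KiB to GiB."""
    # Fields without a kB suffix (UID, oom_score_adj) are not sizes and are skipped.
    fields = re.findall(r"([\w-]+):(\d+)kB", line)
    return {name: int(kib) / 2**20 for name, kib in fields}


for name, size in oom_fields_gib(KILL_LINE).items():
    print(f"{name:>10}: {size:8.2f} GiB")
```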

sarafridov commented 1 year ago

The preprocessing step for DyNeRF is quite CPU-memory-intensive; I remember hitting similar issues when I ran too many of these jobs in parallel or ran without downsampling. I uploaded my precomputed sampling weights for some of the scenes (the .pt files here for salmon, and in flamesteak_explicit and searsteak_explicit), so you can download those weights into your data folders and then run just the training step to see whether it works. Note that the salmon scene is somewhat more memory-intensive than the others.
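The suggestion above implies the training code reuses weight files it finds in the data folder instead of recomputing them. The following is a generic sketch of that load-or-compute pattern, not K-Planes' actual code: `load_or_compute` and the file name `isg_weights.pt` are hypothetical, and pickle stands in for torch.save/torch.load so the sketch has no dependencies.

```python
import os
import pickle
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_compute(path: str, compute: Callable[[], T]) -> T:
    """Return the cached result at `path` if present; otherwise compute and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()  # the expensive, memory-hungry preprocessing step
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    # Atomic rename: a process killed mid-write never leaves a half-written cache.
    os.replace(tmp_path, path)
    return result


# Usage: the second call reads the cache instead of recomputing.
calls = []


def fake_weights():
    calls.append(1)
    return [0.1, 0.5, 0.4]


with tempfile.TemporaryDirectory() as d:
    cache = os.path.join(d, "isg_weights.pt")  # hypothetical file name
    first = load_or_compute(cache, fake_weights)
    second = load_or_compute(cache, fake_weights)
    print(first == second, len(calls))  # True 1
```

Writing to a temporary file and renaming matters here specifically because the OOM killer can terminate the process at any point; without it, a killed run could leave a truncated cache that later runs would try to load.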

JunhyeongDoyle commented 1 year ago

@sarafridov Thanks for sharing :)

JokerYan commented 1 year ago

Thank you very much for sharing. It is very helpful.

sarafridov / K-Planes

OOM issue #29