
[ CVPR 2023 Award Candidate ] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

Questions about the DataLoader being killed by a signal #14

Open AbrahamYabo opened 1 year ago

AbrahamYabo commented 1 year ago

When training pixel-nerf on OmniObject3D, the dataloader is killed randomly. We have tried several ways to solve this, such as setting num_workers = 0 or reducing the batch size, but neither solved the problem. We have also noticed that there are some bad instances in the dataset, as reported in https://github.com/omniobject3d/OmniObject3D/issues/6. Could those be causing the dataloader to be killed?

The following traceback shows the error:

```
  File "train/train.py", line 351, in <module>
    trainer.start()
  File "/home/ma-user/work/pixel-nerf-omni/train/trainlib/trainer.py", line 213, in start
    test_data, global_step=step_id
  File "train/train.py", line 283, in vis_step
    render_dict = DotMap(render_par(test_rays, want_weights=True))
  File "/home/ma-user/work/Pytorch-pixelnerf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/pixel-nerf-omni/src/render/nerf.py", line 30, in forward
    self.net, rays, want_weights=want_weights and not self.simple_output
  File "/home/ma-user/work/Pytorch-pixelnerf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/pixel-nerf-omni/src/render/nerf.py", line 275, in forward
    model, rays, z_coarse, coarse=True, sb=superbatch_size,
  File "/home/ma-user/work/pixel-nerf-omni/src/render/nerf.py", line 213, in composite
    val_all.append(model(pnts, coarse=coarse, viewdirs=dirs))
  File "/home/ma-user/work/Pytorch-pixelnerf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/pixel-nerf-omni/src/model/models.py", line 183, in forward
    z_feature = self.code(z_feature)
  File "/home/ma-user/work/Pytorch-pixelnerf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/pixel-nerf-omni/src/model/code.py", line 37, in forward
    embed = x.unsqueeze(1).repeat(1, self.num_freqs * 2, 1)
  File "/home/ma-user/work/Pytorch-pixelnerf/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2087545) is killed by signal: Killed.
```
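
For reference, here is a minimal, self-contained sketch of the dataloader settings we tried. The dataset below is a dummy stand-in rather than the actual pixel-nerf OmniObject3D data module, so the snippet runs on its own:

```python
# Sketch of the mitigations we tried; the real training code uses the
# pixel-nerf dataset class, replaced here by a dummy TensorDataset.
import torch
from torch.utils.data import DataLoader, TensorDataset

dummy_dataset = TensorDataset(torch.randn(64, 3, 128, 128))  # stand-in for the rendered views

loader = DataLoader(
    dummy_dataset,
    batch_size=1,      # reduced from the usual training batch size
    num_workers=0,     # load in the main process so no separate worker can be killed
    shuffle=True,
    pin_memory=False,  # keep host-memory pressure low
)

for (batch,) in loader:
    pass  # training step would go here
```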
wutong16 commented 1 year ago

Hi. We re-trained pixel-nerf recently and have not encountered the issue you mentioned so far. However, we did discover an extra file named tar.gz inside the antique directory. We cannot confirm whether this file is the root cause of the problem you experienced, but please delete it and check whether that resolves the issue.
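
For example, something along these lines should list any stray archives under a local copy of the dataset; the root path is a placeholder for wherever the data was unpacked, not a path from the repository:

```python
# List (and optionally delete) stray archive files under the dataset root.
from pathlib import Path

dataset_root = Path("/path/to/OmniObject3D/blender_renders")  # placeholder path

for stray in dataset_root.rglob("*tar.gz"):  # also matches a file literally named "tar.gz"
    print(f"found stray archive: {stray}")
    # stray.unlink()  # uncomment to delete after confirming the file is not needed
```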

The dataloader problem you encountered is not necessarily linked to issue #6: those issues concern the raw scans, which do not affect the data format of the Blender renders, and they have since been fixed. It would be more helpful if you could trace which file directly caused the program to break, and we would appreciate it if you could report it so we can fix that part of the data.
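
One way to narrow this down is to decode every rendered image outside the DataLoader and log any file that fails. The root path and the .png extension below are assumptions about the local data layout, not something taken from the training code:

```python
# Walk the rendered images and try to decode each one in isolation,
# printing the path of any file that cannot be loaded.
from pathlib import Path
from PIL import Image

dataset_root = Path("/path/to/OmniObject3D/blender_renders")  # placeholder path

for img_path in sorted(dataset_root.rglob("*.png")):
    try:
        with Image.open(img_path) as img:
            img.load()  # force a full decode so corrupt files fail here
    except Exception as exc:
        print(f"failed to load {img_path}: {exc}")
```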