zju3dv / EasyMocap

Make human motion capture easier.

Issue when train data for "Novel View Synthesis of Human Interactions From Sparse Multi-view Videos" #288

Closed edwardraymondhe closed 1 year ago

edwardraymondhe commented 1 year ago

I have one GPU (an RTX 3060 Laptop), so I tried the command mentioned in the link below.

Here's the environment: CUDA 11.1, Python 3.9. I followed the instructions at https://chingswy.github.io/easymocap-public-doc/works/multinb.html

Issue: I encountered an error when running this command: python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0, data_share_args.sample_args.nray

usage: This script helps you to motion capture from multiple views.

The following modes are supported:

    neuralbody          : [mv1p] NeuralBody
    aninerf             : [mv1p] Animatable NeRF
    neuralbody+back     : [mvmp] Neuralbody + background on ZJUMoCap datast
    neuralbody+bkgdimg  : [mvmp] Neuralbody with background images
    multineuralbody     : [mvmp] Neuralbody + background on ZJUMoCap datast
    soccer1             : [mvmp] Neuralbody + background in the wild
    soccer1_beijia      : Beijia in soccer scenes
    soccer1_yuang       : Beijia in soccer scenes
    boxing              : boxing dataset
    handstand           : handstand
    basketball          : basketball
    juggle              : juggle
    soccer1_6           : [mvmp] Neuralbody + background in the wild

demo.py: error: unrecognized arguments: data_share_args.sample_args.nray

After encountering the above error, I tried running the command directly: python3 apps/neuralbody/demo.py --mode soccer1_6 ${data}

config/neuralbody_index_mnb.yml
[Config] merge from parent file: config/neuralbody/utils/train_base.yml
[Config] merge from parent file: config/neuralbody/dataset/demo_soccer1_6.yml
[Config] merge from parent file: config/neuralbody/dataset/neuralbody_soccer.yml
[Config] merge from parent file: config/neuralbody/dataset/neuralbody_custom.yml
[Config] merge from parent file: config/neuralbody/network/comp_neuralbody+back+soccer.yml
[Config] merge from parent file: config/neuralbody/utils/trainer_soccer.yml
[Config] merge from parent file: config/neuralbody/utils/trainer_base.yml
[Config] merge from parent file: config/neuralbody/utils/vis_multi.yml
python3 apps/neuralbody/train_pl.py --cfg neuralbody/soccer1_6/config.yml gpus 0, distributed True exp soccer1_6
[run] python3 apps/neuralbody/train_pl.py --cfg neuralbody/soccer1_6/config.yml gpus 0, distributed True exp soccer1_6
Global seed set to 666
[model] background     :  0.6M
[model] ground         :  0.6M
- [Load Network](Neuralbody) use_viewdirs=False, use_canonical_viewdirs=True
[model] human_0        :  4.6M
- [Load Network](Neuralbody) use_viewdirs=False, use_canonical_viewdirs=True
[model] human_1        :  4.6M
- [Load Network](Neuralbody) use_viewdirs=False, use_canonical_viewdirs=True
[model] human_2        :  4.6M
- [Load Network](Neuralbody) use_viewdirs=False, use_canonical_viewdirs=True
[model] human_3        :  4.6M
- [Load Network](Neuralbody) use_viewdirs=False, use_canonical_viewdirs=True
[model] human_4        :  4.6M
- [Load Network](Neuralbody) use_viewdirs=False, use_canonical_viewdirs=True
[model] human_5        :  4.6M
[model] ball_0         :  0.6M
check all the data:   0%|                                                                               | 0/1600 [00:00<?, ?it/s]using mask of background
check all the data: 100%|███████████████████████████████████████████████████████████████████| 1600/1600 [00:10<00:00, 155.19it/s]
the results are saved at none
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Global seed set to 666
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type          | Params
-------------------------------------------------
0 | network        | ComposedModel | 29.6 M
1 | train_renderer | RenderWrapper | 29.6 M
-------------------------------------------------
29.6 M    Trainable params
0         Non-trainable params
29.6 M    Total params
118.422   Total estimated model params size (MB)
Epoch 0:   0%|                                                                                          | 0/1000 [00:00<?, ?it/s]/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/utilities/data.py:56: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
  warning_cache.warn(
Traceback (most recent call last):
  File "/home/edward/Desktop/NeRF/EasyMocap/apps/neuralbody/train_pl.py", line 230, in <module>
    train(cfg)
  File "/home/edward/Desktop/NeRF/EasyMocap/apps/neuralbody/train_pl.py", line 128, in train
    trainer.fit(model)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in fit
    self._call_and_handle_interrupt(
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1193, in _run
    self._dispatch()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1272, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1282, in run_stage
    return self._run_train()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1312, in _run_train
    self.fit_loop.run()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 195, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 215, in advance
    result = self._run_optimization(
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 266, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 378, in _optimizer_step
    lightning_module.optimizer_step(
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1662, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 164, in step
    trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in optimizer_step
    self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 163, in optimizer_step
    optimizer.step(closure=closure, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/optim/adam.py", line 66, in step
    loss = closure()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 148, in _wrap_closure
    closure_result = closure()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 160, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 142, in closure
    step_output = self._step_fn()
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 435, in _training_step
    training_step_output = self.trainer.accelerator.training_step(step_kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 216, in training_step
    return self.training_type_plugin.training_step(*step_kwargs.values())
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 441, in training_step
    return self.model(*args, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 81, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/edward/Desktop/NeRF/EasyMocap/apps/neuralbody/train_pl.py", line 53, in training_step
    output, loss, loss_stats, image_stats = self.train_renderer(batch)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/renderer/render_wrapper.py", line 26, in forward
    ret = self.renderer(batch)
  File "/home/edward/anaconda3/envs/easymocap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/renderer/render_base.py", line 358, in forward
    results = self.forward_multi(batch, rand_bkgd)
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/renderer/render_base.py", line 337, in forward_multi
    ret = self.batch_forward(batch, viewdir, start, end, bkgd)
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/renderer/render_base.py", line 218, in batch_forward
    z_vals, pts, raw_output = model.calculate_density_color_from_ray(
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/model/base.py", line 203, in calculate_density_color_from_ray
    raw_output = self.calculate_density_color(pts, viewdirs_)
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/model/neuralbody.py", line 281, in calculate_density_color
    xyzc_features = interpolate_features(grid_coords, sparse_feature['feature_volume'], self.padding_mode)
  File "/home/edward/Desktop/NeRF/EasyMocap/easymocap/neuralbody/model/neuralbody.py", line 120, in interpolate_features
    features = torch.cat(features, dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 242.00 MiB (GPU 0; 5.80 GiB total capacity; 4.10 GiB already allocated; 187.19 MiB free; 4.15 GiB reserved in total by PyTorch)
Epoch 0:   0%|          | 0/1000 [00:01<?, ?it/s]
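The CUDA OOM at the end of the traceback is the usual failure mode on a 6 GB GPU: in NeRF-style renderers, the activation memory of one training step scales roughly linearly with the number of sampled rays, so halving the ray count roughly halves that buffer. A back-of-envelope sketch (the sample count per ray and feature width below are illustrative assumptions, not values taken from EasyMocap's config):

```python
# Rough, hypothetical estimate of the point-feature tensor produced by one
# forward pass of a NeRF-style ray marcher. Only the linear scaling in
# nrays matters here; the other numbers are assumptions for illustration.
def raymarch_activation_bytes(nrays, samples_per_ray=64, feat_dim=256,
                              bytes_per_float=4):
    """Bytes needed for the (nrays, samples_per_ray, feat_dim) float tensor."""
    return nrays * samples_per_ray * feat_dim * bytes_per_float

full = raymarch_activation_bytes(2048)
half = raymarch_activation_bytes(1024)
print(full // 2**20, "MiB vs", half // 2**20, "MiB")  # prints: 128 MiB vs 64 MiB
```

This is why lowering the nrays override is the first thing to try when the trainer runs out of memory.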
chingswy commented 1 year ago

Append this:

data_share_args.sample_args.nrays 1024

Replace 1024 with a smaller number if OOM occurs.

edwardraymondhe commented 1 year ago

Thanks for the reply :) But I don't think the numbers are the problem... Maybe the command has an error in its arguments?

(easymocap) edward@edward:~/Desktop/NeRF/EasyMocap$ python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0, data_share_args.sample_args.nrays 1024
config/neuralbody_index_mnb.yml
usage: This script helps you to motion capture from multiple views.

    The following modes are supported:
    neuralbody          : [mv1p] NeuralBody
    aninerf             : [mv1p] Animatable NeRF
    neuralbody+back     : [mvmp] Neuralbody + background on ZJUMoCap datast
    neuralbody+bkgdimg  : [mvmp] Neuralbody with background images
    multineuralbody     : [mvmp] Neuralbody + background on ZJUMoCap datast
    soccer1             : [mvmp] Neuralbody + background in the wild
    soccer1_beijia      : Beijia in soccer scenes
    soccer1_yuang       : Beijia in soccer scenes
    boxing              : boxing dataset
    handstand           : handstand
    basketball          : basketball
    juggle              : juggle
    soccer1_6           : [mvmp] Neuralbody + background in the wild
demo.py: error: unrecognized arguments: data_share_args.sample_args.nrays 1024
(easymocap) edward@edward:~/Desktop/NeRF/EasyMocap$ python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0, data_share_args.sample_args.nrays 512
config/neuralbody_index_mnb.yml
usage: This script helps you to motion capture from multiple views.

    The following modes are supported:
    neuralbody          : [mv1p] NeuralBody
    aninerf             : [mv1p] Animatable NeRF
    neuralbody+back     : [mvmp] Neuralbody + background on ZJUMoCap datast
    neuralbody+bkgdimg  : [mvmp] Neuralbody with background images
    multineuralbody     : [mvmp] Neuralbody + background on ZJUMoCap datast
    soccer1             : [mvmp] Neuralbody + background in the wild
    soccer1_beijia      : Beijia in soccer scenes
    soccer1_yuang       : Beijia in soccer scenes
    boxing              : boxing dataset
    handstand           : handstand
    basketball          : basketball
    juggle              : juggle
    soccer1_6           : [mvmp] Neuralbody + background in the wild
demo.py: error: unrecognized arguments: data_share_args.sample_args.nrays 512


chingswy commented 1 year ago

Try this

python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0, --opts data_share_args.sample_args.nrays 1024
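The earlier attempts failed because free-form key/value config overrides have to be collected behind a dedicated flag (here `--opts`); argparse rejects unknown positional tokens outright. A minimal sketch of that common pattern (the parser below is a hypothetical reconstruction, not EasyMocap's actual demo.py):

```python
import argparse

# Hypothetical sketch of the "--opts KEY VALUE ..." pattern used by many
# config-driven training scripts; not EasyMocap's actual parser.
parser = argparse.ArgumentParser()
parser.add_argument('--mode', required=True)
parser.add_argument('path', nargs='?', default=None)
parser.add_argument('--gpus', default='0,')
# Everything after --opts is swallowed verbatim and later merged into the
# config (e.g. dotted keys like data_share_args.sample_args.nrays).
parser.add_argument('--opts', nargs=argparse.REMAINDER, default=[])

args = parser.parse_args(
    ['--mode', 'soccer1_6', '--gpus', '0,',
     '--opts', 'data_share_args.sample_args.nrays', '1024'])
print(args.opts)  # ['data_share_args.sample_args.nrays', '1024']
```

Without the `--opts` flag, the trailing key/value tokens become unmatched positionals, which is exactly the "unrecognized arguments" error shown earlier in the thread.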
edwardraymondhe commented 1 year ago

Thanks! That solved my problem :) The only remaining issue is the "CUDA out of memory" error, but I think I can solve that separately.

Awesome support, thanks!