universome / epigraf

[NeurIPS 2022] Official pytorch implementation of EpiGRAF
https://universome.github.io/epigraf

Zero dimension of c for the mapping network #7

Closed: eldar closed this issue 2 years ago

eldar commented 2 years ago

Thank you for releasing the code for your awesome paper! I am trying to reproduce the results on the Plants dataset and get the following error message:

Traceback (most recent call last):
  File "/users/eldar/src/epigraf/src/train.py", line 291, in main
    launch_training(c=c, outdir=cfg.experiment_dir, dry_run=opts.dry_run)
  File "/users/eldar/src/epigraf/src/train.py", line 109, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/users/eldar/src/epigraf/src/train.py", line 52, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/users/eldar/src/epigraf/src/training/training_loop.py", line 234, in training_loop
    images = torch.cat([G_ema(z=z, c=c, camera_angles=a, noise_mode='const').cpu() for z, c, a in zip(vis.grid_z, vis.grid_c, vis.grid_camera_angles)]).numpy()
  File "/users/eldar/src/epigraf/src/training/training_loop.py", line 234, in <listcomp>
    images = torch.cat([G_ema(z=z, c=c, camera_angles=a, noise_mode='const').cpu() for z, c, a in zip(vis.grid_z, vis.grid_c, vis.grid_camera_angles)]).numpy()
  File "/users/eldar/src/epigraf/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/users/eldar/src/epigraf/src/training/networks_epigraf.py", line 494, in forward
    ws = self.mapping(z, c, camera_angles=camera_angles_cond, truncation_psi=truncation_psi, truncation_cutoff=truncation_cutoff, update_emas=update_emas)
  File "/users/eldar/src/epigraf/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/users/eldar/src/epigraf/src/training/layers.py", line 148, in forward
    misc.assert_shape(c, [None, self.c_dim])
  File "/users/eldar/src/epigraf/src/torch_utils/misc.py", line 95, in assert_shape
    raise AssertionError(f'Wrong size for dimension {idx}: got {size}, expected {ref_size}')
AssertionError: Wrong size for dimension 1: got 0, expected 191

The problem seems to be in training_loop.py, where at test time c is passed to the mapping network with shape (4, 0) instead of, presumably, (4, 191).
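To make the failing check concrete, here is a minimal, dependency-free sketch of the kind of shape assertion that produces the message in the traceback. `check_shape` is a hypothetical stand-in that works on plain shape tuples; it is not the repository's `misc.assert_shape`, and the shapes are the ones reported above.

```python
def check_shape(shape, ref_shape):
    """Hypothetical, simplified shape check; None in ref_shape means 'any size'."""
    if len(shape) != len(ref_shape):
        raise AssertionError(
            f'Wrong number of dimensions: got {len(shape)}, expected {len(ref_shape)}'
        )
    for idx, (size, ref_size) in enumerate(zip(shape, ref_shape)):
        if ref_size is not None and size != ref_size:
            raise AssertionError(
                f'Wrong size for dimension {idx}: got {size}, expected {ref_size}'
            )


c_dim = 191        # label dimension the mapping network expects (from the traceback)
c_shape = (4, 0)   # shape of c actually passed in (from the report above)

try:
    check_shape(c_shape, [None, c_dim])
    message = None
except AssertionError as err:
    message = str(err)

print(message)
```

The batch dimension passes because it is compared against `None`; only dimension 1 (the label dimension) fails.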

I launch training with the following command:

python src/infra/launch.py hydra.run.dir=. exp_suffix=plants dataset=megascans_plants dataset.resolution=256 training.gamma=0.05 num_gpus=1 +ignore_uncommited_changes=true

Update: even if I comment out this part (https://github.com/universome/epigraf/blob/main/src/training/training_loop.py#L232-L236), training still fails further down (https://github.com/universome/epigraf/blob/main/src/training/training_loop.py#L323-L334) with the same error, so the problem is not limited to test time.

Update 2: it looks like the Megascans datasets additionally require training.use_labels=true. I will close the issue once I can run the training!
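For reference, a small sketch that assembles the launch command from the original post with the extra override from Update 2 appended. The override list is copied from the command above; the helper variables are illustrative only, and whether this setting alone resolves the error is not confirmed here.

```python
import shlex

# Overrides copied verbatim from the launch command in the original post.
overrides = [
    "hydra.run.dir=.",
    "exp_suffix=plants",
    "dataset=megascans_plants",
    "dataset.resolution=256",
    "training.gamma=0.05",
    "num_gpus=1",
    "+ignore_uncommited_changes=true",
]

# Extra override reported in Update 2 for the Megascans datasets.
overrides.append("training.use_labels=true")

command = ["python", "src/infra/launch.py", *overrides]
command_line = shlex.join(command)  # requires Python 3.8+
print(command_line)
```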