henzler / neuraltexture

Learning a Neural 3D Texture Space from 2D Exemplars [CVPR 2020]
MIT License
107 stars, 20 forks

Training Fails on Validation Sanity Check #5

Open: madhawav opened this issue 3 years ago

madhawav commented 3 years ago

Hi, when I try to train on a new dataset, training fails with the following error.

[PYTHON_ENV_PATH]/neuraltexture/bin/python -u [PROJECT_ROOT]/code/train_neural_texture.py
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Use pytorch 1.4.0
Load config: configs/neural_texture/config_default.yaml
INFO:lightning:GPU available: True, used: True
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:23: RuntimeWarning: You have defined a `val_dataloader()` and have defined a `validation_step()`, you may also want to define `validation_epoch_end()` for accumulating stats.
  warnings.warn(*args, **kwargs)
[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:23: RuntimeWarning: You have defined a `test_dataloader()` and have defined a `test_step()`, you may also want to define `test_epoch_end()` for accumulating stats.
  warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "[PROJECT_ROOT]/neuraltexture/code/train_neural_texture.py", line 47, in <module>
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 765, in fit
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 492, in single_gpu_train
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 896, in run_pretrain_routine
    eval_results = self._evaluate(model,
  File "[PYTHON_ENV_PATH]/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 322, in _evaluate
    eval_results = model.validation_end(outputs)
  File "[PROJECT_ROOT]/neuraltexture/code/systems/s_core.py", line 33, in validation_end
    for key in outputs[0].keys():
IndexError: list index out of range

Process finished with exit code 1
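The last frames of the traceback show validation_end reading outputs[0] without first checking that outputs is non-empty, and the "0it" in the sanity-check progress bar suggests no validation batch was processed at all. A minimal plain-Python reproduction, assuming the validation loop collected zero step outputs; validation_end_unguarded is a hypothetical stand-in, not the repository's actual code:

```python
def validation_end_unguarded(outputs):
    """Hypothetical stand-in for the aggregation in s_core.py (not the real code)."""
    logs = {}
    # outputs[0] raises IndexError immediately when no validation steps ran.
    for key in outputs[0].keys():
        values = [x[key] for x in outputs]
        logs[key] = sum(values) / len(values)
    return logs


def reproduce():
    """Call the stand-in with an empty list, as happens after zero validation steps."""
    try:
        validation_end_unguarded([])
    except IndexError as err:
        return str(err)
    return None


print(reproduce())  # list index out of range
```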

Additional Information

My "config_default.yaml" is shown below:

version_name: neuraltexture_all_2d_single
device: cuda
n_workers: 8
n_gpus: 1
dim: 2
  octaves: 8
  log_files_every_n_iter: 1000
  log_scalars_every_n_iter: 100
  log_validation_every_n_epochs: 1
  image_res: &image_res 128 # (height, width)
  e: &texture_e 64 # encoding size
  name: datasets.images
  path: '../datasets/all'
  use_single: -1 # -1 = all, 0,1,2 for single
        name: models.neural_texture.encoder
        type: 'ResNet'
        shape_in:  [[3, *image_res, *image_res]]
        bottleneck_size: 8
        name: models.neural_texture.mlp
        type: 'MLP'
        n_max_features: 128
        n_blocks: 4
        dropout_ratio: 0.0
        non_linearity: 'relu'
        bias: True
        encoding: *texture_e
      name: 'adam'
      lr: 0.0001
      weight_decay: 0.0001
      name: 'none'
      style_weight: 1.0
      style_type: 'mse'
  epochs: 3
  bs: 16
  accumulate_grad_batches: 1
  seed: 41127

Your help is much appreciated.
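One way to end up with an empty outputs list is a validation split that yields no batches. With bs: 16, a PyTorch DataLoader built with drop_last=True produces zero batches for any split smaller than 16 images. Whether this repository's loaders actually set drop_last is an assumption here, not something the log shows. The batch-count arithmetic, as a small sketch (num_batches is a hypothetical helper):

```python
import math


def num_batches(n_samples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch for a map-style DataLoader with the default sampler."""
    if drop_last:
        # The incomplete final batch is discarded.
        return n_samples // batch_size
    # The incomplete final batch is kept.
    return math.ceil(n_samples / batch_size)


print(num_batches(10, 16, drop_last=True))   # 0
print(num_batches(10, 16, drop_last=False))  # 1
```

If the first call matches your setup, validation runs zero steps and validation_end receives an empty list.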

PierrickCh commented 1 year ago

I had the same issue and tweaked validation_end in code/systems/s_core.py a bit:

if len(outputs) > 0:  # skip aggregation when no validation batches ran
    for key in outputs[0].keys():
        logs[key] = torch.stack([x[key] for x in outputs]).mean()
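The same guard can be exercised outside Lightning with a plain-Python stand-in, where a sum/len average replaces torch.stack(...).mean(). A sketch; aggregate_outputs is a hypothetical name, not a function in the repository:

```python
from typing import Dict, List


def aggregate_outputs(outputs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each logged value across validation steps."""
    logs: Dict[str, float] = {}
    if len(outputs) > 0:  # same guard as the tweak: nothing to average otherwise
        for key in outputs[0].keys():
            values = [x[key] for x in outputs]
            logs[key] = sum(values) / len(values)
    return logs


print(aggregate_outputs([]))                                     # {}
print(aggregate_outputs([{"val_loss": 0.5}, {"val_loss": 1.5}]))  # {'val_loss': 1.0}
```

The guard only hides the symptom: with an empty split, no validation metrics get logged at all.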

This is very ad hoc; I think the code needs a 'val' folder in addition to the 'train' and 'test' folders.
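Following that suggestion, a quick pre-flight check can count the images in each split folder before training starts. The split names ('train', 'val', 'test'), the <path>/<split>/ layout and the extension list are assumptions based on the comment above, not confirmed against the repository's datasets.images loader; count_split_images is a hypothetical helper:

```python
from pathlib import Path
from typing import Dict, Iterable

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed; adjust to your data


def count_split_images(root: str,
                       splits: Iterable[str] = ("train", "val", "test")) -> Dict[str, int]:
    """Count image files directly inside root/<split> for each assumed split name."""
    counts: Dict[str, int] = {}
    for split in splits:
        folder = Path(root) / split
        if not folder.is_dir():
            counts[split] = 0  # missing folder: a loader would find nothing here
            continue
        counts[split] = sum(
            1 for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    # Path taken from the config above; run from the same working directory.
    for split, n in count_split_images("../datasets/all").items():
        print(f"{split}: {n} images")
```

A split reporting 0 (or fewer images than bs) is a likely candidate for the empty outputs list.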