Weird errors when training and extracting mesh

morsingher commented 2 years ago

Hi, I'm not sure this repo is still maintained, but I'll try. I get two weird errors: one when I train a new model and one when I try to extract the mesh. More specifically:

1) The following command: python train_nerf.py --config ../config/nerf-synthetic-lego.yml produces the following output:

Logger initiated...
Current log dir ../logs/nerf-synthetic-lego/default/version_2
Traceback (most recent call last):
  File "/localhome/c-morsingher/nerfmeshes/src/train_nerf.py", line 108, in <module>
    main()
  File "/localhome/c-morsingher/nerfmeshes/src/train_nerf.py", line 62, in main
    model = getattr(models, cfg.experiment.model)(cfg)
  File "/localhome/c-morsingher/nerfmeshes/src/models/model_nerf.py", line 25, in __init__
    super(NeRFModel, self).__init__(cfg, *args, **kwargs)
  File "/localhome/c-morsingher/nerfmeshes/src/models/model_base.py", line 21, in __init__
    self.hparams = flatten_dict(cfg, sep=".")
  File "/localhome/c-morsingher/anaconda3/envs/nerf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1225, in __setattr__
    object.__setattr__(self, name, value)
AttributeError: can't set attribute

2) The following command: python mesh_nerf.py --log-checkpoint ../pretrained/colab-lego-nerf-high-res/default/version_0/ --checkpoint model_last.ckpt --save-dir ../data/meshes --limit 1.2 --res 480 --iso-level 32 --view-disparity-max-bound 1e0 produces the following output:

Current log dir ../pretrained/colab-lego-nerf-high-res/default/version_0
Loading model from ../pretrained/colab-lego-nerf-high-res/default/version_0/checkpoints/model_last.ckpt
Traceback (most recent call last):
  File "/localhome/c-morsingher/nerfmeshes/src/mesh_nerf.py", line 278, in <module>
    model = getattr(models, cfg.experiment.model).load_from_checkpoint(path_parser.checkpoint_path)
  File "/localhome/c-morsingher/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 156, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/localhome/c-morsingher/anaconda3/envs/nerf/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 198, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 1 required positional argument: 'cfg'

Any idea why this happens and how to solve it? Thank you in advance.

qway commented 2 years ago

1:

Can you run it in a debugger and tell me which name and value raise the error? It seems like one of the cfg values is wrong.

2:

Honestly, I'm a bit stumped, but it looks like a pytorch-lightning related error. Have you made sure it's the correct version?
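
For what it's worth, the first traceback is also consistent with a version mismatch rather than a bad cfg value: "can't set attribute" is what Python 3.9 reports when code assigns to a property that has no setter. The sketch below reproduces that failure mode in plain Python. It assumes that the installed pytorch-lightning exposes hparams as a read-only property; Base and Model are hypothetical stand-ins, not the library's or the repo's actual classes.

```python
class Base:
    """Hypothetical stand-in for a framework base class."""

    def __init__(self):
        self._hparams = {}

    @property
    def hparams(self):
        # No setter is defined, so the attribute is read-only.
        return self._hparams


class Model(Base):
    """Hypothetical stand-in for a model that assigns hparams directly."""

    def __init__(self, cfg):
        super().__init__()
        self.error = None
        try:
            # Mirrors the style of assignment seen in the traceback.
            self.hparams = cfg
        except AttributeError as err:
            # Record the failure instead of crashing, for illustration only.
            self.error = err
            # Mutating the underlying dict still works in this stand-in.
            self._hparams.update(cfg)


model = Model({"experiment.model": "NeRFModel"})
print(type(model.error).__name__)
print(model.hparams)
```

If this is what happened, the second error would likely share the same root cause, since checkpoint loading also depends on how the installed version stores and restores hyperparameters.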

morsingher commented 2 years ago

Hi @qway, thanks for the quick answer. You were right, apparently I messed up my conda environment and a fresh install of requirements.txt solved both issues. The LEGO mesh is really awesome!
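
In case it helps anyone hitting the same thing, here is a rough sketch of how to spot this kind of drift before reinstalling. It assumes requirements.txt uses simple `name==version` pins (I have not checked that every line does); parse_pins and find_mismatches are hypothetical helper names, and anything that is not an exact pin is skipped.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Return {package: pinned_version} for lines shaped like 'name==version'."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # unpinned or otherwise unsupported line, skipped
        name, pinned = line.split("==", 1)
        pins[name.strip().lower()] = pinned.strip()
    return pins


def find_mismatches(pins, lookup=version):
    """Return {package: (pinned, installed)} where the two differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

To use it, read the file with `open("requirements.txt")`, pass the lines to parse_pins, and print whatever find_mismatches returns; an empty dict means the environment matches the pins.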

I'm currently training another model on LEGO with default parameters, just to see if it matches your pretrained one. I have a pretty old GPU (a Tesla K80) and it's currently taking 1.5s per iteration, which works out to several days for 250k iterations. Is this ok or is there something else I'm missing?

DomainFlag commented 2 years ago

@morsingher Yep, that's how it's supposed to be, but you can stop at 200k or so (usually it's fine) and check whether the results suffice; if not, resume training.
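
To put rough numbers on it, here is the arithmetic for the figures mentioned above (1.5s per iteration, 250k versus 200k iterations). It assumes the per-iteration time stays constant and ignores validation and checkpointing overhead; training_days is just a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(iterations, seconds_per_iteration):
    """Estimate wall-clock training time in days at a constant iteration rate."""
    return iterations * seconds_per_iteration / SECONDS_PER_DAY


full_run = training_days(250_000, 1.5)    # 375,000 s, roughly 4.3 days
early_stop = training_days(200_000, 1.5)  # 300,000 s, roughly 3.5 days

print(f"250k iterations: {full_run:.1f} days")
print(f"200k iterations: {early_stop:.1f} days")
```

So stopping at 200k saves a bit under a day on this GPU.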

morsingher commented 2 years ago

Thanks @DomainFlag, closing this as solved :)

qway / nerfmeshes

Weird errors when training and extracting mesh #11