lifuguan / GP-NeRF

[CVPR 2024 Highlight] GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
https://lifuguan.github.io/gpnerf-pages
MIT License

Train on the Replica dataset without using depth loss. #1

Open mbjurca opened 7 months ago

mbjurca commented 7 months ago

Thank you for your contribution. I would like to train the model from scratch on the Replica dataset, but without the depth-guided loss. I ran the following command:

CUDA_VISIBLE_DEVICES=1 python train_gpnerf.py --config configs/gpnerf_replica.txt --expname debug --ckpt_path ./out/gnt_best.pth --no_load_opt --no_load_scheduler

but the following error occurs:

File "/home/mihnea/mihnea/GP-NeRF/train_gpnerf.py", line 507, in <module>
    train(args)
  File "/home/mihnea/mihnea/GP-NeRF/train_gpnerf.py", line 163, in train
    ray_sampler = RaySamplerSingleImage(train_data, device)
  File "/home/mihnea/mihnea/GP-NeRF/gpnerf/sample_ray.py", line 46, in __init__
    self.depth_mask = data["depth_mask"]
KeyError: 'depth_mask'

In ReplicaDataset, I don't see the depth_mask key in __getitem__. I added the field as in the ScanNet dataset, but now I get a new error:

File "/home/mihnea/mihnea/GP-NeRF/train_gpnerf.py", line 507, in <module>
    train(args)
  File "/home/mihnea/mihnea/GP-NeRF/train_gpnerf.py", line 200, in train
    fine_sem_out, loss_distill, loss_depth_guided_sem = model.sem_seg_head(que_deep_semantics, ret['outputs_fine']['feats_out'], selected_inds)
  File "/home/mihnea/anaconda3/envs/semray2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mihnea/anaconda3/envs/semray2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mihnea/mihnea/GP-NeRF/gpnerf/semantic_branch.py", line 61, in forward
    agg_feats_3d = agg_sem_feats['feats_out_3d']
IndexError: too many indices for tensor of dimension 2

Can you please guide me on how to choose the correct configuration and the changes to be made so that I can train the desired model?
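
The second traceback fails at agg_sem_feats['feats_out_3d'], and the error message indicates that agg_sem_feats is a 2-D tensor at that point rather than a dictionary. A minimal sketch of a guard that reports this mismatch directly is shown here. `unpack_agg_feats` is a hypothetical helper written for illustration; it is not part of GP-NeRF, and only the key name `feats_out_3d` comes from the traceback.

```python
from collections.abc import Mapping


def unpack_agg_feats(agg_sem_feats, key="feats_out_3d"):
    """Return agg_sem_feats[key], or raise a readable error if that is impossible.

    Hypothetical helper for illustration only; not part of the repository.
    """
    if not isinstance(agg_sem_feats, Mapping):
        # Tensors and arrays expose .shape; other objects fall back to None.
        shape = getattr(agg_sem_feats, "shape", None)
        raise TypeError(
            f"expected a mapping containing {key!r}, got "
            f"{type(agg_sem_feats).__name__} with shape {shape}"
        )
    if key not in agg_sem_feats:
        raise KeyError(f"{key!r} missing; available keys: {sorted(agg_sem_feats)}")
    return agg_sem_feats[key]
```

Calling such a guard at the top of the semantic head's forward pass would turn the IndexError into a message that names the received type and shape, which makes it easier to see which configuration path produced a plain tensor.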

lifuguan commented 7 months ago

Thanks for your attention. You can follow the format of ScannetDataset to load depth maps:

def __init__(self, ...):
    all_depth_files = []
    for ...
        depth_files = [f.replace("pose", "depth").replace("txt", "png") for f in pose_files]
    self.all_depth_files = np.array(all_depth_files, dtype=object)[index]

def __getitem__(self, idx):
    depth_files = self.all_depth_files[real_idx]
    ....
    img = Image.open(depth_files[id_render])
    depth = np.asarray(img, dtype=np.float32) / 1000.0  # mm -> m
    depth = np.ascontiguousarray(depth, dtype=np.float32)
    depth = cv2.resize(depth, (self.w, self.h), interpolation=cv2.INTER_NEAREST)
    ....
    return {
        ....
        "true_depth": torch.from_numpy(depth),
        ....
    }

Also, training on a single GPU may not produce the best results, because our method contains a perception head.
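
Before starting a long run, it can help to confirm that a dataset sample actually carries the depth fields mentioned in this thread. The sketch below is only an illustration: `missing_keys` and `REQUIRED_DEPTH_KEYS` are hypothetical names, the key names `true_depth` and `depth_mask` are taken from the messages above, and the `rgb` entry is a placeholder.

```python
# Key names taken from this thread; the full set the trainer needs may differ.
REQUIRED_DEPTH_KEYS = ("true_depth", "depth_mask")


def missing_keys(sample, required=REQUIRED_DEPTH_KEYS):
    """Return the required keys that are absent from a dataset sample dict."""
    return [k for k in required if k not in sample]


# Stand-in for the dict returned by a dataset's __getitem__; values are placeholders.
sample = {"rgb": None, "true_depth": None}
print(missing_keys(sample))  # ['depth_mask']
```

Running the check on one sample from the training dataset, before the ray sampler is built, would surface a missing field such as `depth_mask` without waiting for a KeyError deep in the training loop.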

mbjurca commented 7 months ago

Thank you for the reply, but as I said, I have already added the missing elements for the Replica dataset. The problem occurs in model.sem_seg_head: ret['outputs_fine']['feats_out'] is a tensor of shape (400, 512), not a dictionary as expected, which produces the error mentioned above. Also, I see that some depth processing is going on. Does that mean the model uses the depth masks to get a better prediction even if depth_loss_scale in the config file is set to 0? This is my configuration file for training:

### INPUT
expname = gnt_replica
rootdir = ./
render_stride = 2
distributed = False

## dataset
train_dataset = train_replica
dataset_weights = [1]
eval_dataset = val_replica
val_set_list = configs/replica_test_split.txt
original_width = 640
original_height = 480

### TRAINING
N_rand = 400
lrate_feature = 0.005
lrate_semantic = 0.005
lrate_gnt = 0.00001
lrate_decay_factor = 0.6
lrate_decay_steps = 4000
single_net = True
trans_depth = 8

### TESTING
chunk_size = 2000

### RENDERING
N_importance = 16
N_samples = 48
inv_uniform = True
white_bkgd = False

### CONSOLE AND TENSORBOARD
total_step = 16000
i_print = 100
save_interval = 2000

### SEMANTIC SETTING
save_feature = True
semantic_model = fpn

###
render_loss_scale = 0.25
semantic_loss_scale = 0.75
distill_loss_scale = 0.5
depth_loss_scale = 0

num_classes = 46
ignore_label = 47
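
To see at a glance which loss terms a configuration like the one above leaves switched on, the `key = value` lines can be read with a few lines of plain Python. This is a sketch under stated assumptions: `parse_config` and `active_losses` are hypothetical helpers, the repository uses its own argument parser, and treating every `*_loss_scale` entry as a loss weight is an assumption based on the option names.

```python
def parse_config(text):
    """Parse `key = value` lines, skipping blank lines and `#` comment lines."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def active_losses(cfg, suffix="_loss_scale"):
    """Return {loss name: weight} for each *_loss_scale entry with a non-zero weight."""
    return {
        k[: -len(suffix)]: float(v)
        for k, v in cfg.items()
        if k.endswith(suffix) and float(v) != 0.0
    }


example = """
###
render_loss_scale = 0.25
semantic_loss_scale = 0.75
distill_loss_scale = 0.5
depth_loss_scale = 0
"""
print(active_losses(parse_config(example)))
# {'render': 0.25, 'semantic': 0.75, 'distill': 0.5}
```

The output only shows which weights are non-zero. It does not show whether depth is still read elsewhere in the model, which is the question raised above.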