neu-vi / PlanarRecon

Apache License 2.0

Exact train config #10

Closed: nbansal90 closed this issue 2 years ago

nbansal90 commented 2 years ago

Hey @ymingxie!

Thank you for sharing your work! As you mention in the README.md, training is divided into 3 phases, with the phases set at epochs 20, 25, and 50.

Regards, Nitin Bansal

ymingxie commented 2 years ago

Hi Nitin,

Sure, I will create 3 separate config files (maybe today or tomorrow). 499 is not the number of epochs I actually ran; I usually get the final checkpoint before epoch 70.

Best, Yiming

nbansal90 commented 2 years ago

Sure, Yiming! That would be great.

Meanwhile, I split my config into three parts according to the phases you specified. During phase 2 training I get the following missing-key error, which might be because GRU_FUSION is set to True in phase 2 but was not enabled in phase 1:

  File "main.py", line 192, in train
    model.load_state_dict(state_dict['model'])
  File "/home/us000146/anaconda3/envs/selfsupervised/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
    Missing key(s) in state_dict:
    "module.fragment_net.gru_fusion.fusion_nets.0.convz.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.0.convz.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.0.convz.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.0.convr.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.0.convr.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.0.convr.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.0.convq.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.0.convq.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.0.convq.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.1.convz.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.1.convz.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.1.convz.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.1.convr.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.1.convr.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.1.convr.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.1.convq.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.1.convq.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.1.convq.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.2.convz.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.2.convz.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.2.convz.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.2.convr.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.2.convr.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.2.convr.point_transforms.0.bias",
    "module.fragment_net.gru_fusion.fusion_nets.2.convq.net.kernel",
    "module.fragment_net.gru_fusion.fusion_nets.2.convq.point_transforms.0.weight",
    "module.fragment_net.gru_fusion.fusion_nets.2.convq.point_transforms.0.bias".

Nitin
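One way to check the GRU_FUSION hypothesis from the error above is to diff the keys the model expects against the keys the checkpoint provides. The sketch below is a minimal, self-contained illustration, not PlanarRecon code: `ToyFragmentNet` is a hypothetical model whose optional `gru_fusion` layer stands in for the real fusion block, and `missing_keys` is a hypothetical helper. Loading a checkpoint saved without that layer under PyTorch's default `strict=True` raises the same kind of `RuntimeError`.

```python
import torch.nn as nn


class ToyFragmentNet(nn.Module):
    """Hypothetical stand-in for a network whose fusion block is optional."""

    def __init__(self, gru_fusion: bool):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        # The fusion layer only exists when the flag is on, so its
        # parameters are absent from checkpoints saved without it.
        self.gru_fusion = nn.Linear(8, 8) if gru_fusion else None


def missing_keys(model: nn.Module, state_dict: dict) -> list:
    """Return keys the model expects that the checkpoint does not provide."""
    return sorted(set(model.state_dict()) - set(state_dict))


# "Phase 1" checkpoint saved without the fusion layer.
phase1 = ToyFragmentNet(gru_fusion=False)
ckpt = {"model": phase1.state_dict()}

# "Phase 2" model with the fusion layer enabled.
phase2 = ToyFragmentNet(gru_fusion=True)
print(missing_keys(phase2, ckpt["model"]))
# ['gru_fusion.bias', 'gru_fusion.weight']

# A plain load uses strict=True by default and fails on the missing keys.
try:
    phase2.load_state_dict(ckpt["model"])
    strict_error = None
except RuntimeError as err:
    strict_error = str(err)
print("strict load failed:", strict_error is not None)
```

If every reported key falls under the fusion block (here `gru_fusion.`, in the traceback `module.fragment_net.gru_fusion.`), the mismatch comes only from enabling that block between phases.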

ymingxie commented 2 years ago

Hi Nitin,

Check here: https://github.com/neu-vi/PlanarRecon/blob/main/main.py#L182

"RESUME" is set to True by default, and "strict" is set to False when the checkpoint is loaded, so the missing keys should be ignored.
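A minimal PyTorch sketch of the tolerant load described above. With `strict=False`, `load_state_dict` copies every matching tensor, skips absent keys instead of raising, and (in recent PyTorch releases) returns a named tuple with `missing_keys` and `unexpected_keys`. Note that the call shown in the traceback, `model.load_state_dict(state_dict['model'])` at main.py line 192, passes no `strict` argument, so it runs with PyTorch's default `strict=True`; comparing it against the linked line may be worthwhile. The `make_model` and `resume` helpers below are hypothetical, with an optional second layer standing in for the GRU fusion block.

```python
import torch
import torch.nn as nn


def make_model(gru_fusion: bool) -> nn.Sequential:
    """Hypothetical model: an optional second layer stands in for GRU fusion."""
    layers = [nn.Linear(4, 4)]
    if gru_fusion:
        layers.append(nn.Linear(4, 4))
    return nn.Sequential(*layers)


def resume(model: nn.Module, state_dict: dict):
    """Load a checkpoint while tolerating keys on either side that do not match."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys:
        # These parameters keep their fresh initialization.
        print("freshly initialized:", result.missing_keys)
    if result.unexpected_keys:
        # These checkpoint entries have no matching parameter and are dropped.
        print("ignored from checkpoint:", result.unexpected_keys)
    return result


phase1 = make_model(gru_fusion=False)
phase2 = make_model(gru_fusion=True)
result = resume(phase2, phase1.state_dict())
# prints: freshly initialized: ['1.weight', '1.bias']
```

In this sketch the shared first layer resumes from the phase 1 weights while the extra layer starts from its default initialization, which mirrors the intent of enabling fusion only from phase 2 onward.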