zhijian-liu / torchpack

A neural network training interface based on PyTorch, with a focus on flexibility
https://pypi.org/project/torchpack/
MIT License

Warning messages while trying to use ZeroRedundancyOptimizer #18

Closed · sandeepnmenon closed 3 years ago

sandeepnmenon commented 3 years ago

I am training an SPVCNN model built with torchsparse, using the torchpack training wrapper. I create the optimizer with ZeroRedundancyOptimizer as follows:

    import torch
    from torch.distributed.optim import ZeroRedundancyOptimizer

    # `optim=` is the PyTorch 1.8 keyword; later releases rename it to `optimizer_class=`.
    optimizer = ZeroRedundancyOptimizer(params=model.parameters(),
                                        optim=torch.optim.SGD,
                                        lr=configs.optimizer.lr,
                                        momentum=configs.optimizer.momentum,
                                        weight_decay=configs.optimizer.weight_decay,
                                        nesterov=configs.optimizer.nesterov)

and run training with:

    torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml --run-dir runs/test

Right before each checkpoint is saved, I see the following warnings:

    WARNING:root:Optimizer state has not been consolidated. Returning the local state
    WARNING:root:Please call consolidate_state_dict() beforehand if you meant to save the global state
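
For context, here is a minimal sketch of what triggers these warnings, independent of torchpack (the model, sizes, and launch setup are illustrative): PyTorch's ZeroRedundancyOptimizer shards optimizer state across ranks, so state_dict() only knows the local shard until consolidate_state_dict() has gathered everything.

    import torch
    import torch.distributed as dist
    from torch.distributed.optim import ZeroRedundancyOptimizer

    # Assumes a torchrun / dist-run style launch that sets the usual
    # rendezvous environment variables; "gloo" keeps the sketch CPU-only.
    dist.init_process_group("gloo")

    model = torch.nn.Linear(16, 16)
    optimizer = ZeroRedundancyOptimizer(model.parameters(),
                                        optimizer_class=torch.optim.SGD,
                                        lr=0.1)

    # Without consolidation, state_dict() only sees the local shard:
    # PyTorch 1.8 logs the two WARNING:root messages above, while newer
    # releases raise a RuntimeError instead.
    # state = optimizer.state_dict()

    # Gathering the shards first (onto rank 0 by default) avoids this:
    optimizer.consolidate_state_dict()
    state = optimizer.state_dict()  # global optimizer state (on rank 0)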

zhijian-liu commented 3 years ago

I think this issue is not directly related to TorchPack. You might be able to fix this issue by adding self.optimizer.consolidate_state_dict() at the beginning of the _state_dict() function inside the trainer.
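
For reference, a sketch of what that change could look like. The surrounding keys are illustrative (modeled loosely on an SPVNAS-style trainer, not copied from it); the essential addition is the consolidate_state_dict() call:

    def _state_dict(self):
        # Gather the sharded ZeRO optimizer state (onto rank 0 by default)
        # so that optimizer.state_dict() below returns the global state
        # instead of the local shard, which silences the two warnings.
        self.optimizer.consolidate_state_dict()

        state_dict = {}
        state_dict['model'] = self.model.state_dict()
        state_dict['optimizer'] = self.optimizer.state_dict()
        return state_dict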

sandeepnmenon commented 3 years ago

Thank you @zhijian-liu. That worked.