zhijian-liu / torchpack

A neural network training interface based on PyTorch, with a focus on flexibility
https://pypi.org/project/torchpack/
MIT License

Warning messages while trying to use ZeroRedundancyOptimizer #18

Closed · sandeepnmenon closed 3 years ago

sandeepnmenon commented 3 years ago

I am training an SPVCNN model built with torchsparse, using the torchpack training wrapper. I create the optimizer with ZeroRedundancyOptimizer as follows:

    import torch
    from torch.distributed.optim import ZeroRedundancyOptimizer

    # `optim=` is the PyTorch 1.8 keyword; later releases rename it to `optimizer_class=`.
    optimizer = ZeroRedundancyOptimizer(params=model.parameters(),
                                        optim=torch.optim.SGD,
                                        lr=configs.optimizer.lr,
                                        momentum=configs.optimizer.momentum,
                                        weight_decay=configs.optimizer.weight_decay,
                                        nesterov=configs.optimizer.nesterov)

and run training with:

    torchpack dist-run -np 1 python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml --run-dir runs/test

Right before each checkpoint is saved, I see the following warnings:

    WARNING:root:Optimizer state has not been consolidated. Returning the local state
    WARNING:root:Please call consolidate_state_dict() beforehand if you meant to save the global state
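
For context, here is a minimal sketch of what triggers these warnings, independent of torchpack (the model, sizes, and launch setup are illustrative): PyTorch's ZeroRedundancyOptimizer shards optimizer state across ranks, so state_dict() only knows the local shard until consolidate_state_dict() has gathered everything.

    import torch
    import torch.distributed as dist
    from torch.distributed.optim import ZeroRedundancyOptimizer

    # Assumes a torchrun / dist-run style launch that sets the usual
    # rendezvous environment variables; "gloo" keeps the sketch CPU-only.
    dist.init_process_group("gloo")

    model = torch.nn.Linear(16, 16)
    optimizer = ZeroRedundancyOptimizer(model.parameters(),
                                        optimizer_class=torch.optim.SGD,
                                        lr=0.1)

    # Without consolidation, state_dict() only sees the local shard:
    # PyTorch 1.8 logs the two WARNING:root messages above, while newer
    # releases raise a RuntimeError instead.
    # state = optimizer.state_dict()

    # Gathering the shards first (onto rank 0 by default) avoids this:
    optimizer.consolidate_state_dict()
    state = optimizer.state_dict()  # global optimizer state (on rank 0)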

zhijian-liu commented 3 years ago

I think this issue is not directly related to TorchPack. You might be able to fix this issue by adding self.optimizer.consolidate_state_dict() at the beginning of the _state_dict() function inside the trainer.
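
For reference, a sketch of what that change could look like. The surrounding keys are illustrative (modeled loosely on an SPVNAS-style trainer, not copied from it); the essential addition is the consolidate_state_dict() call:

    def _state_dict(self):
        # Gather the sharded ZeRO optimizer state (onto rank 0 by default)
        # so that optimizer.state_dict() below returns the global state
        # instead of the local shard, which silences the two warnings.
        self.optimizer.consolidate_state_dict()

        state_dict = {}
        state_dict['model'] = self.model.state_dict()
        state_dict['optimizer'] = self.optimizer.state_dict()
        return state_dict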

sandeepnmenon commented 3 years ago

Thank you @zhijian-liu. That worked.