What is the correct order to use DistributedDataParallel and QAT Quantizer?

Describe the issue:

Environment:

NNI version: master (3.0?)
Training service (local|remote|pai|aml|etc): local
Client OS: Arch Linux
Server OS (for remote mode only): N/A
Python version: 3.11
PyTorch/TensorFlow version: PyTorch 1.13
Is conda/virtualenv/venv used?: No
Is running in Docker?: No

Configuration:

Experiment config (remember to remove secrets!): N/A
Search space: N/A

Log message:

nnimanager.log:
dispatcher.log:
nnictl stdout and stderr:

How to reproduce it?: I'm trying to do QAT with DDP, but I'm confused about the order in which to initialize the optimizer. According to the official PyTorch code, the optimizer should be defined after the model is wrapped in DDP. But in NNI, the example in https://github.com/microsoft/nni/blob/master/nni/compression/quantization/qat_quantizer.py shows that we should create the optimizer first, pass it into the evaluator, and then let the QAT Quantizer wrap the model. I can't find any example code for DDP+QAT. Could anyone help?
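
For reference, the PyTorch-side ordering described above (wrap the model in DDP, then define the optimizer) can be sketched as follows. This is a minimal single-process CPU sketch and not NNI code: the function name `build_ddp_then_optimizer`, the gloo backend, the file-based rendezvous, and the toy `Linear` model are all assumptions chosen so the snippet runs without a launcher. It does not show where the QAT Quantizer should go, which is the open question here.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def build_ddp_then_optimizer():
    """Hypothetical helper: wrap a toy model in DDP first, then build the optimizer."""
    # Assumption: single process, CPU, gloo backend, file-based rendezvous.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=0, world_size=1
    )
    try:
        model = torch.nn.Linear(4, 2)  # toy stand-in for the real model

        # Step 1: wrap the model in DDP.
        ddp_model = DDP(model)

        # Step 2: define the optimizer from the wrapped model's parameters.
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

        # Check whether DDP exposes the very same Parameter objects as the
        # module it wraps.
        same_params = all(
            p is q for p, q in zip(model.parameters(), ddp_model.parameters())
        )

        # One training step to confirm the pieces fit together.
        loss = ddp_model(torch.randn(8, 4)).sum()
        loss.backward()
        optimizer.step()
        return same_params
    finally:
        dist.destroy_process_group()
```

Calling `build_ddp_then_optimizer()` runs one optimizer step and reports whether the DDP wrapper and the original module share the same parameter objects. A real multi-GPU run would instead be started with a launcher such as `torchrun` and would pass `device_ids` to DDP.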

What is the correct order to use DistributedDataParallel and QAT Quantizer? #5698