Open zhangvia opened 12 months ago
Some parameters are not correctly reduced. Can I see the list of parameters that follow "Reduction failed at followed parameters:"?
RuntimeError: ("ZERO DDP error: the synchronization of gradients doesn't exit properly.", 'The most possible reason is that the model is not compatible with GeminiDDP.\n', 'Reduction failed at followed parameters: \n\tinput_blocks.1.0.cond_emb_layers.1.weight\n\tinput_blocks.1.0.cond_emb_layers.1.bias\n\tinput_blocks.2.0.cond_emb_layers.1.weight\n\tinput_blocks.2.0.cond_emb_layers.1.bias\n\tinput_blocks.3.0.cond_emb_layers.1.weight\n\tinput_blocks.3.0.cond_emb_layers.1.bias\n\tinput_blocks.4.0.cond_emb_layers.1.weight\n\tinput_blocks.4.0.cond_emb_layers.1.bias\n\tinput_blocks.5.0.cond_emb_layers.1.weight\n\tinput_blocks.5.0.cond_emb_layers.1.bias\n\tinput_blocks.6.0.cond_emb_layers.1.weight\n\tinput_blocks.6.0.cond_emb_layers.1.bias\n\tinput_blocks.7.0.cond_emb_layers.1.weight\n\tinput_blocks.7.0.cond_emb_layers.1.bias\n\tinput_blocks.8.0.cond_emb_layers.1.weight\n\tinput_blocks.8.0.cond_emb_layers.1.bias\n\tinput_blocks.8.1.qkv.weight\n\tinput_blocks.8.1.qkv.bias\n\tinput_blocks.8.1.proj_out1.weight\n\tinput_blocks.8.1.proj_out1.bias\n\tinput_blocks.9.0.cond_emb_layers.1.weight\n\tinput_blocks.9.0.cond_emb_layers.1.bias\n\tinput_blocks.10.0.cond_emb_layers.1.weight\n\tinput_blocks.10.0.cond_emb_layers.1.bias\n\tinput_blocks.11.0.cond_emb_layers.1.weight\n\tinput_blocks.11.0.cond_emb_layers.1.bias\n\tinput_blocks.11.1.qkv.weight\n\tinput_blocks.11.1.qkv.bias\n\tinput_blocks.11.1.proj_out1.weight\n\tinput_blocks.11.1.proj_out1.bias\n\tinput_blocks.12.0.cond_emb_layers.1.weight\n\tinput_blocks.12.0.cond_emb_layers.1.bias\n\tinput_blocks.13.0.cond_emb_layers.1.weight\n\tinput_blocks.13.0.cond_emb_layers.1.bias\n\tinput_blocks.14.0.cond_emb_layers.1.weight\n\tinput_blocks.14.0.cond_emb_layers.1.bias\n\tinput_blocks.14.1.qkv.weight\n\tinput_blocks.14.1.qkv.bias\n\tinput_blocks.14.1.proj_out1.weight\n\tinput_blocks.14.1.proj_out1.bias\n\tinput_blocks.15.0.cond_emb_layers.1.weight\n\tinput_blocks.15.0.cond_emb_layers.1.bias\n\tinput_blocks.16.0.cond_emb_layers.1.weight\n\t
input_blocks.16.0.cond_emb_layers.1.bias\n\tinput_blocks.17.0.cond_emb_layers.1.weight\n\tinput_blocks.17.0.cond_emb_layers.1.bias\n\tinput_blocks.17.1.qkv.weight\n\tinput_blocks.17.1.qkv.bias\n\tinput_blocks.17.1.proj_out1.weight\n\tinput_blocks.17.1.proj_out1.bias\n\tmiddle_block.0.cond_emb_layers.1.weight\n\tmiddle_block.0.cond_emb_layers.1.bias\n\tmiddle_block.1.qkv.weight\n\tmiddle_block.1.qkv.bias\n\tmiddle_block.1.proj_out1.weight\n\tmiddle_block.1.proj_out1.bias\n\tmiddle_block.2.cond_emb_layers.1.weight\n\tmiddle_block.2.cond_emb_layers.1.bias\n\toutput_blocks.0.0.cond_emb_layers.1.weight\n\toutput_blocks.0.0.cond_emb_layers.1.bias\n\toutput_blocks.1.0.cond_emb_layers.1.weight\n\toutput_blocks.1.0.cond_emb_layers.1.bias\n\toutput_blocks.1.1.qkv.weight\n\toutput_blocks.1.1.qkv.bias\n\toutput_blocks.1.1.proj_out1.weight\n\toutput_blocks.1.1.proj_out1.bias\n\toutput_blocks.2.0.cond_emb_layers.1.weight\n\toutput_blocks.2.0.cond_emb_layers.1.bias\n\toutput_blocks.2.1.cond_emb_layers.1.weight\n\toutput_blocks.2.1.cond_emb_layers.1.bias\n\toutput_blocks.3.0.cond_emb_layers.1.weight\n\toutput_blocks.3.0.cond_emb_layers.1.bias\n\toutput_blocks.4.0.cond_emb_layers.1.weight\n\toutput_blocks.4.0.cond_emb_layers.1.bias\n\toutput_blocks.4.1.qkv.weight\n\toutput_blocks.4.1.qkv.bias\n\toutput_blocks.4.1.proj_out1.weight\n\toutput_blocks.4.1.proj_out1.bias\n\toutput_blocks.5.0.cond_emb_layers.1.weight\n\toutput_blocks.5.0.cond_emb_layers.1.bias\n\toutput_blocks.5.1.cond_emb_layers.1.weight\n\toutput_blocks.5.1.cond_emb_layers.1.bias\n\toutput_blocks.6.0.cond_emb_layers.1.weight\n\toutput_blocks.6.0.cond_emb_layers.1.bias\n\toutput_blocks.7.0.cond_emb_layers.1.weight\n\toutput_blocks.7.0.cond_emb_layers.1.bias\n\toutput_blocks.7.1.qkv.weight\n\toutput_blocks.7.1.qkv.bias\n\toutput_blocks.7.1.proj_out1.weight\n\toutput_blocks.7.1.proj_out1.bias\n\toutput_blocks.8.0.cond_emb_layers.1.weight\n\toutput_blocks.8.0.cond_emb_layers.1.bias\n\toutput_blocks.8.1.cond_emb_layers.1.weight\n\t
output_blocks.8.1.cond_emb_layers.1.bias\n\toutput_blocks.9.0.cond_emb_layers.1.weight\n\toutput_blocks.9.0.cond_emb_layers.1.bias\n\toutput_blocks.10.0.cond_emb_layers.1.weight\n\toutput_blocks.10.0.cond_emb_layers.1.bias\n\toutput_blocks.10.1.qkv.weight\n\toutput_blocks.10.1.qkv.bias\n\toutput_blocks.10.1.proj_out1.weight\n\toutput_blocks.10.1.proj_out1.bias\n\toutput_blocks.11.0.cond_emb_layers.1.weight\n\toutput_blocks.11.0.cond_emb_layers.1.bias\n\toutput_blocks.11.1.cond_emb_layers.1.weight\n\toutput_blocks.11.1.cond_emb_layers.1.bias\n\toutput_blocks.12.0.cond_emb_layers.1.weight\n\toutput_blocks.12.0.cond_emb_layers.1.bias\n\toutput_blocks.13.0.cond_emb_layers.1.weight\n\toutput_blocks.13.0.cond_emb_layers.1.bias\n\toutput_blocks.14.0.cond_emb_layers.1.weight\n\toutput_blocks.14.0.cond_emb_layers.1.bias\n\toutput_blocks.14.1.cond_emb_layers.1.weight\n\toutput_blocks.14.1.cond_emb_layers.1.bias\n\toutput_blocks.15.0.cond_emb_layers.1.weight\n\toutput_blocks.15.0.cond_emb_layers.1.bias\n\toutput_blocks.16.0.cond_emb_layers.1.weight\n\toutput_blocks.16.0.cond_emb_layers.1.bias\n\toutput_blocks.17.0.cond_emb_layers.1.weight\n\toutput_blocks.17.0.cond_emb_layers.1.bias\n\tencoder.time_embed.0.weight\n\tencoder.time_embed.0.bias\n\tencoder.time_embed.2.weight\n\tencoder.time_embed.2.bias\n\tencoder.input_blocks.1.0.emb_layers.1.weight\n\tencoder.input_blocks.1.0.emb_layers.1.bias\n\tencoder.input_blocks.1.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.1.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.2.0.emb_layers.1.weight\n\tencoder.input_blocks.2.0.emb_layers.1.bias\n\tencoder.input_blocks.2.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.2.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.3.0.emb_layers.1.weight\n\tencoder.input_blocks.3.0.emb_layers.1.bias\n\tencoder.input_blocks.3.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.3.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.4.0.emb_layers.1.weight\n\tencoder.input_blocks.4.0.emb_layers.1.bias\n\t
encoder.input_blocks.4.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.4.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.5.0.emb_layers.1.weight\n\tencoder.input_blocks.5.0.emb_layers.1.bias\n\tencoder.input_blocks.5.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.5.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.6.0.emb_layers.1.weight\n\tencoder.input_blocks.6.0.emb_layers.1.bias\n\tencoder.input_blocks.6.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.6.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.7.0.emb_layers.1.weight\n\tencoder.input_blocks.7.0.emb_layers.1.bias\n\tencoder.input_blocks.7.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.7.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.8.0.emb_layers.1.weight\n\tencoder.input_blocks.8.0.emb_layers.1.bias\n\tencoder.input_blocks.8.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.8.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.9.0.emb_layers.1.weight\n\tencoder.input_blocks.9.0.emb_layers.1.bias\n\tencoder.input_blocks.9.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.9.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.10.0.emb_layers.1.weight\n\tencoder.input_blocks.10.0.emb_layers.1.bias\n\tencoder.input_blocks.10.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.10.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.11.0.emb_layers.1.weight\n\tencoder.input_blocks.11.0.emb_layers.1.bias\n\tencoder.input_blocks.11.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.11.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.12.0.emb_layers.1.weight\n\tencoder.input_blocks.12.0.emb_layers.1.bias\n\tencoder.input_blocks.12.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.12.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.13.0.emb_layers.1.weight\n\tencoder.input_blocks.13.0.emb_layers.1.bias\n\tencoder.input_blocks.13.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.13.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.14.0.emb_layers.1.weight\n\tencoder.input_blocks.14.0.emb_layers.1.bias\n\t
encoder.input_blocks.14.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.14.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.15.0.emb_layers.1.weight\n\tencoder.input_blocks.15.0.emb_layers.1.bias\n\tencoder.input_blocks.15.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.15.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.16.0.emb_layers.1.weight\n\tencoder.input_blocks.16.0.emb_layers.1.bias\n\tencoder.input_blocks.16.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.16.0.cond_emb_layers.1.bias\n\tencoder.input_blocks.17.0.emb_layers.1.weight\n\tencoder.input_blocks.17.0.emb_layers.1.bias\n\tencoder.input_blocks.17.0.cond_emb_layers.1.weight\n\tencoder.input_blocks.17.0.cond_emb_layers.1.bias')
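The parameter list in the message above is hard to read as one escaped string. A minimal sketch for pulling the names out of the pasted error text, assuming the text is the exception's string form (where the separators are the literal characters `\n\t`) and that any real line breaks are copy/paste wraps. The helper names `failed_parameters` and `count_by_layer_kind` are hypothetical and not part of ColossalAI:

```python
import re
from collections import Counter

MARKER = "Reduction failed at followed parameters:"


def failed_parameters(message: str) -> list:
    """Return the parameter names listed after MARKER in the pasted error text."""
    # Real line breaks are assumed to be wrap artifacts, so join across them.
    text = message.replace("\r", "").replace("\n", "")
    # The literal two-character escapes "\n" and "\t" separate the names.
    text = text.replace("\\n", " ").replace("\\t", " ")
    _, found, tail = text.partition(MARKER)
    if not found:
        return []
    # A parameter name is a dotted path such as "middle_block.1.qkv.weight".
    return re.findall(r"[A-Za-z_]\w*(?:\.\w+)+", tail)


def count_by_layer_kind(names) -> Counter:
    """Count names by their owning submodule, ignoring numeric indices."""
    kinds = Counter()
    for name in names:
        parts = [p for p in name.split(".") if not p.isdigit()]
        # parts[-1] is "weight" or "bias"; parts[-2] is the submodule name.
        kinds[parts[-2] if len(parts) >= 2 else parts[0]] += 1
    return kinds


# Shortened sample in the same shape as the message above, including a wrap
# in the middle of "weight".
sample = (
    "'Reduction failed at followed parameters: "
    r"\n\tinput_blocks.1.0.cond_emb_layers.1.weig" "\n"
    r"ht\n\tinput_blocks.1.0.cond_emb_layers.1.bias"
    r"\n\tinput_blocks.8.1.qkv.weight')"
)

names = failed_parameters(sample)
for kind, count in count_by_layer_kind(names).items():
    print(kind, count)
```

Grouping by submodule makes it easier to see which kinds of layers are affected; in the full message these are `cond_emb_layers`, `emb_layers`, `time_embed`, `qkv` and `proj_out1`.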
🐛 Describe the bug
When I use the Booster API with the Gemini plugin to train PIDM, I get the error above.
I'm using the PIDM repo, with the ColossalAI Booster API in train.py. My train.py is:
You can use this code to replace train.py in the PIDM repo. You may also need to change all variables to half instead of float. You will then get the same error that I have.
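One possible cause of this kind of error is parameters that take no part in the forward pass and so never receive a gradient. This is an assumption and is not confirmed anywhere in this issue. A rough way to check for it in plain PyTorch, without Gemini, is to run one backward pass and list the trainable parameters whose `.grad` is still `None`. `ToyBlock` and `params_without_grad` are hypothetical names for illustration; `ToyBlock` is a stand-in and not the PIDM model:

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Hypothetical stand-in: `cond_emb` is only used when `cond` is passed."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.cond_emb = nn.Linear(4, 4)

    def forward(self, x, cond=None):
        h = self.proj(x)
        if cond is not None:
            h = h + self.cond_emb(cond)
        return h


def params_without_grad(model: nn.Module, loss: torch.Tensor) -> list:
    """Run backward once and return trainable parameters that got no gradient."""
    model.zero_grad(set_to_none=True)
    loss.backward()
    return [
        name
        for name, p in model.named_parameters()
        if p.requires_grad and p.grad is None
    ]


model = ToyBlock()
x = torch.randn(2, 4)

# No condition is passed, so cond_emb is skipped in this forward pass.
missing = params_without_grad(model, model(x).sum())
print(missing)
```

If the names printed for the real model match the ones in the error message, the forward pass is probably skipping those layers for the inputs being used.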
Environment
No response