j316chuck closed this 2 weeks ago
Is this an issue?
2024-06-13 20:11:17,911: rank0[1141][MainThread]: INFO: composer.utils.checkpoint: Ignoring the following paths from the loaded checkpoint state_dict: state/model/model.transformer.blocks.4.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.4.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.9.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.28.ffn.down_proj._extra_state, state/model/model.transformer.blocks.19.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.28.ffn.up_proj._extra_state, state/model/model.transformer.blocks.1.ffn.down_proj._extra_state, state/model/model.transformer.blocks.25.ffn.up_proj._extra_state, state/model/model.transformer.blocks.4.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.12.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.6.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.29.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.30.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.22.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.17.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.3.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.0.ffn.up_proj._extra_state, state/model/model.transformer.blocks.8.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.16.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.13.ffn.down_proj._extra_state, state/model/model.transformer.blocks.10.ffn.up_proj._extra_state, state/model/model.transformer.blocks.0.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.10.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.ffn.up_proj._extra_state, 
state/model/model.transformer.blocks.11.ffn.up_proj._extra_state, state/model/model.transformer.blocks.13.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.24.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.7.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.23.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.2.ffn.up_proj._extra_state, state/model/model.transformer.blocks.5.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.3.ffn.down_proj._extra_state, state/model/model.transformer.blocks.20.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.31.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.30.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.2.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.7.ffn.down_proj._extra_state, state/model/model.transformer.blocks.1.ffn.up_proj._extra_state, state/model/model.transformer.blocks.23.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.15.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.22.ffn.down_proj._extra_state, state/model/model.transformer.blocks.2.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.13.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.25.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.4.ffn.up_proj._extra_state, state/model/model.transformer.blocks.13.ffn.up_proj._extra_state, state/model/model.transformer.blocks.26.ffn.up_proj._extra_state, state/model/model.transformer.blocks.27.ffn.down_proj._extra_state, state/model/model.transformer.blocks.31.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.ffn.down_proj._extra_state, state/model/model.transformer.blocks.3.norm_attn_norm.attn.out_proj._extra_state, 
state/model/model.transformer.blocks.3.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.12.ffn.up_proj._extra_state, state/model/model.transformer.blocks.22.ffn.up_proj._extra_state, state/model/model.transformer.blocks.9.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.23.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.12.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.20.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.26.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.28.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.8.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.31.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.0.ffn.down_proj._extra_state, state/model/model.transformer.blocks.9.ffn.up_proj._extra_state, state/model/model.transformer.blocks.17.ffn.up_proj._extra_state, state/model/model.transformer.blocks.5.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.9.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.5.ffn.down_proj._extra_state, state/model/model.transformer.blocks.18.ffn.up_proj._extra_state, state/model/model.transformer.blocks.21.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.21.ffn.up_proj._extra_state, state/model/model.transformer.blocks.9.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.20.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.29.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.19.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.6.ffn.up_proj._extra_state, state/model/model.transformer.blocks.15.ffn.down_proj._extra_state, 
state/model/model.transformer.blocks.15.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.17.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.29.ffn.up_proj._extra_state, state/model/model.transformer.blocks.10.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.14.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.20.ffn.up_proj._extra_state, state/model/model.transformer.blocks.15.ffn.up_proj._extra_state, state/model/model.transformer.blocks.18.ffn.down_proj._extra_state, state/model/model.transformer.blocks.21.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.8.ffn.up_proj._extra_state, state/model/model.transformer.blocks.31.ffn.up_proj._extra_state, state/model/model.transformer.blocks.10.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.8.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.12.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.11.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.7.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.25.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.29.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.18.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.19.ffn.up_proj._extra_state, state/model/model.transformer.blocks.30.ffn.up_proj._extra_state, state/model/model.transformer.blocks.5.ffn.up_proj._extra_state, state/model/model.transformer.blocks.17.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.29.ffn.down_proj._extra_state, state/model/model.transformer.blocks.7.ffn.up_proj._extra_state, state/model/model.transformer.blocks.14.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.14.ffn.down_proj._extra_state, state/model/model.transformer.blocks.25.ffn.gate_proj._extra_state, 
state/model/model.transformer.blocks.22.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.7.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.12.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.19.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.21.ffn.down_proj._extra_state, state/model/model.transformer.blocks.23.ffn.up_proj._extra_state, state/model/model.transformer.blocks.14.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.2.ffn.down_proj._extra_state, state/model/model.transformer.blocks.28.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.0.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.0.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.11.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.27.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.30.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.1.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.11.ffn.down_proj._extra_state, state/model/model.transformer.blocks.31.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.6.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.18.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.19.ffn.down_proj._extra_state, state/model/model.transformer.blocks.10.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.18.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.27.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.17.ffn.down_proj._extra_state, state/model/model.transformer.blocks.24.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.26.ffn.down_proj._extra_state, 
state/model/model.transformer.blocks.3.ffn.up_proj._extra_state, state/model/model.transformer.blocks.25.ffn.down_proj._extra_state, state/model/model.transformer.blocks.6.ffn.down_proj._extra_state, state/model/model.transformer.blocks.2.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.24.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.8.ffn.down_proj._extra_state, state/model/model.transformer.blocks.1.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.20.ffn.down_proj._extra_state, state/model/model.transformer.blocks.1.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.24.ffn.up_proj._extra_state, state/model/model.transformer.blocks.30.ffn.down_proj._extra_state, state/model/model.transformer.blocks.13.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.5.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.4.ffn.down_proj._extra_state, state/model/model.transformer.blocks.24.ffn.down_proj._extra_state, state/model/model.transformer.blocks.22.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.15.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.26.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.27.ffn.up_proj._extra_state, state/model/model.transformer.blocks.11.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.23.ffn.down_proj._extra_state, state/model/model.transformer.blocks.26.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.27.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.14.ffn.up_proj._extra_state, state/model/model.transformer.blocks.6.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.21.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.28.ffn.gate_proj._extra_state
Nope. We ignore the FP8 _extra_state keys at load time via ignore_keys, because they don't exist in the base model checkpoint. They get populated on the first training batch.
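A minimal sketch of glob-based key filtering. It assumes the state is flattened into "/"-joined paths like the ones in the log above, and that one glob pattern covers every FP8 `_extra_state` entry. The function name `drop_ignored_paths`, the flattened dict, and the pattern `state/model/*._extra_state` are all hypothetical illustrations, not Composer's API. Check the Composer docs for the real `ignore_keys` semantics.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, List, Tuple


def drop_ignored_paths(
    flat_state: Dict[str, Any], ignore_globs: Iterable[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: split a flattened state into kept entries and ignored paths.

    Keys are assumed to be '/'-joined paths such as
    'state/model/model.transformer.blocks.4.ffn.gate_proj._extra_state'.
    """
    globs = list(ignore_globs)
    kept: Dict[str, Any] = {}
    ignored: List[str] = []
    for path, value in flat_state.items():
        # fnmatch's '*' also matches '/', so one pattern can span nested levels.
        if any(fnmatchcase(path, g) for g in globs):
            ignored.append(path)
        else:
            kept[path] = value
    return kept, ignored


# Hypothetical flattened state of the FP8 model: one real weight plus FP8
# metadata entries that a base (non-FP8) checkpoint would not contain.
model_state = {
    "state/model/model.transformer.blocks.4.ffn.gate_proj.weight": "tensor",
    "state/model/model.transformer.blocks.4.ffn.gate_proj._extra_state": "fp8 meta",
    "state/model/model.transformer.blocks.4.norm_attn_norm.attn.Wqkv._extra_state": "fp8 meta",
}

# Assumed pattern: drop every FP8 _extra_state entry under state/model.
kept, ignored = drop_ignored_paths(model_state, ["state/model/*._extra_state"])
print("Ignoring the following paths:", ", ".join(sorted(ignored)))
print("Loading:", sorted(kept))
```

With the `_extra_state` entries filtered out, the loader only looks for keys the base checkpoint actually has. The FP8 metadata is then filled in during training rather than read from disk.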
Update Dockerfile to use TE (TransformerEngine) main to resolve torch 2.3.0 build issues
Manual Tests:
bs-12-weight-shard-fp8-llama3-8b-metamath-4ep-ZaOqHP ✅
torch-231-bs-12-weight-shard-fp8-llama3-8b-metamath-4ep-lS8jLE ✅
Docker build: