mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Update Dockerfile with TE main #1273

Closed: j316chuck closed this pull request 2 weeks ago

j316chuck commented 3 weeks ago

Update the Dockerfile to install Transformer Engine (TE) from `main`, resolving build issues with torch 2.3.0.

Manual Tests:

Docker build:

[docker-build (2.3.0_cu121_flash2, mosaicml/pytorch:2.3.0_cu121-python3.11-ubuntu20.04, [gpu-flash2])](https://github.com/mosaicml/llm-foundry/actions/runs/9487638950/job/26144725405?pr=1273#logs)
succeeded 2 hours ago in 24m 8s
dakinggg commented 3 weeks ago

Is this an issue?

2024-06-13 20:11:17,911: rank0[1141][MainThread]: INFO: composer.utils.checkpoint: Ignoring the following paths from the loaded checkpoint state_dict: state/model/model.transformer.blocks.4.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.4.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.9.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.28.ffn.down_proj._extra_state, state/model/model.transformer.blocks.19.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.28.ffn.up_proj._extra_state, state/model/model.transformer.blocks.1.ffn.down_proj._extra_state, state/model/model.transformer.blocks.25.ffn.up_proj._extra_state, state/model/model.transformer.blocks.4.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.12.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.6.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.29.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.30.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.22.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.17.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.3.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.0.ffn.up_proj._extra_state, state/model/model.transformer.blocks.8.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.16.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.13.ffn.down_proj._extra_state, state/model/model.transformer.blocks.10.ffn.up_proj._extra_state, state/model/model.transformer.blocks.0.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.10.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.ffn.up_proj._extra_state, 
state/model/model.transformer.blocks.11.ffn.up_proj._extra_state, state/model/model.transformer.blocks.13.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.24.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.7.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.23.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.2.ffn.up_proj._extra_state, state/model/model.transformer.blocks.5.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.3.ffn.down_proj._extra_state, state/model/model.transformer.blocks.20.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.31.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.30.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.2.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.7.ffn.down_proj._extra_state, state/model/model.transformer.blocks.1.ffn.up_proj._extra_state, state/model/model.transformer.blocks.23.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.15.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.22.ffn.down_proj._extra_state, state/model/model.transformer.blocks.2.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.13.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.25.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.4.ffn.up_proj._extra_state, state/model/model.transformer.blocks.13.ffn.up_proj._extra_state, state/model/model.transformer.blocks.26.ffn.up_proj._extra_state, state/model/model.transformer.blocks.27.ffn.down_proj._extra_state, state/model/model.transformer.blocks.31.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.ffn.down_proj._extra_state, state/model/model.transformer.blocks.3.norm_attn_norm.attn.out_proj._extra_state, 
state/model/model.transformer.blocks.3.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.12.ffn.up_proj._extra_state, state/model/model.transformer.blocks.22.ffn.up_proj._extra_state, state/model/model.transformer.blocks.9.ffn.down_proj._extra_state, state/model/model.transformer.blocks.16.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.23.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.12.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.20.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.26.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.28.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.8.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.31.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.0.ffn.down_proj._extra_state, state/model/model.transformer.blocks.9.ffn.up_proj._extra_state, state/model/model.transformer.blocks.17.ffn.up_proj._extra_state, state/model/model.transformer.blocks.5.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.9.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.5.ffn.down_proj._extra_state, state/model/model.transformer.blocks.18.ffn.up_proj._extra_state, state/model/model.transformer.blocks.21.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.21.ffn.up_proj._extra_state, state/model/model.transformer.blocks.9.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.20.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.29.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.19.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.6.ffn.up_proj._extra_state, state/model/model.transformer.blocks.15.ffn.down_proj._extra_state, 
state/model/model.transformer.blocks.15.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.17.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.29.ffn.up_proj._extra_state, state/model/model.transformer.blocks.10.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.14.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.20.ffn.up_proj._extra_state, state/model/model.transformer.blocks.15.ffn.up_proj._extra_state, state/model/model.transformer.blocks.18.ffn.down_proj._extra_state, state/model/model.transformer.blocks.21.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.8.ffn.up_proj._extra_state, state/model/model.transformer.blocks.31.ffn.up_proj._extra_state, state/model/model.transformer.blocks.10.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.8.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.12.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.11.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.7.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.25.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.29.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.18.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.19.ffn.up_proj._extra_state, state/model/model.transformer.blocks.30.ffn.up_proj._extra_state, state/model/model.transformer.blocks.5.ffn.up_proj._extra_state, state/model/model.transformer.blocks.17.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.29.ffn.down_proj._extra_state, state/model/model.transformer.blocks.7.ffn.up_proj._extra_state, state/model/model.transformer.blocks.14.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.14.ffn.down_proj._extra_state, state/model/model.transformer.blocks.25.ffn.gate_proj._extra_state, 
state/model/model.transformer.blocks.22.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.7.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.12.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.19.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.21.ffn.down_proj._extra_state, state/model/model.transformer.blocks.23.ffn.up_proj._extra_state, state/model/model.transformer.blocks.14.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.2.ffn.down_proj._extra_state, state/model/model.transformer.blocks.28.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.0.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.0.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.11.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.27.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.30.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.1.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.11.ffn.down_proj._extra_state, state/model/model.transformer.blocks.31.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.6.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.18.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.19.ffn.down_proj._extra_state, state/model/model.transformer.blocks.10.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.18.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.27.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.17.ffn.down_proj._extra_state, state/model/model.transformer.blocks.24.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.26.ffn.down_proj._extra_state, 
state/model/model.transformer.blocks.3.ffn.up_proj._extra_state, state/model/model.transformer.blocks.25.ffn.down_proj._extra_state, state/model/model.transformer.blocks.6.ffn.down_proj._extra_state, state/model/model.transformer.blocks.2.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.24.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.8.ffn.down_proj._extra_state, state/model/model.transformer.blocks.1.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.20.ffn.down_proj._extra_state, state/model/model.transformer.blocks.1.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.24.ffn.up_proj._extra_state, state/model/model.transformer.blocks.30.ffn.down_proj._extra_state, state/model/model.transformer.blocks.13.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.5.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.4.ffn.down_proj._extra_state, state/model/model.transformer.blocks.24.ffn.down_proj._extra_state, state/model/model.transformer.blocks.22.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.15.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.26.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.27.ffn.up_proj._extra_state, state/model/model.transformer.blocks.11.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.23.ffn.down_proj._extra_state, state/model/model.transformer.blocks.26.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.27.norm_attn_norm.attn.Wqkv._extra_state, state/model/model.transformer.blocks.14.ffn.up_proj._extra_state, state/model/model.transformer.blocks.6.norm_attn_norm.attn.out_proj._extra_state, state/model/model.transformer.blocks.21.ffn.gate_proj._extra_state, state/model/model.transformer.blocks.28.ffn.gate_proj._extra_state
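One quick way to answer the question is to parse the log line and confirm that every ignored path is an FP8 `_extra_state` entry rather than a real weight. The sketch below assumes the log format shown above (the path list follows the last `": "`), and `summarize_ignored` is a hypothetical helper written for illustration, not part of Composer or llm-foundry:

```python
from collections import Counter

SUFFIX = "._extra_state"


def summarize_ignored(log_line: str) -> tuple[Counter, set[int]]:
    """Group ignored checkpoint paths by submodule and collect block indices.

    Hypothetical helper: assumes the comma-separated path list comes after
    the last ': ' in the log line, as in the Composer message above.
    """
    _, _, paths_str = log_line.rpartition(": ")
    paths = [p.strip() for p in paths_str.split(",") if p.strip()]

    modules: Counter = Counter()
    blocks: set[int] = set()
    for path in paths:
        # Only FP8 extra state is expected; a dropped real weight would be a red flag.
        if not path.endswith(SUFFIX):
            raise ValueError(f"unexpected non-extra-state path: {path}")
        # 'state/model/model.transformer.blocks.4.ffn.gate_proj._extra_state'
        #   -> tail '4.ffn.gate_proj' -> block 4, module 'ffn.gate_proj'
        _, _, tail = path[: -len(SUFFIX)].partition(".blocks.")
        block_idx, _, module = tail.partition(".")
        blocks.add(int(block_idx))
        modules[module] += 1
    return modules, blocks
```

Applied to the log above, the paths name only attention projections (`Wqkv`, `out_proj`) and FFN projections (`up_proj`, `gate_proj`, `down_proj`), so no model weights are being discarded.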
j316chuck commented 3 weeks ago

Nope. We ignore the FP8 `_extra_state` keys at load time via `ignore_keys`, because they don't exist in the base model. They get populated on the first training batch.
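The idea behind ignoring keys at load time can be sketched as glob-filtering a nested checkpoint dict whose leaves are addressed by `/`-joined paths, the same shape as the paths in the log above. This is a minimal illustration under stated assumptions: `drop_ignored_keys` is hypothetical, not Composer's implementation (Composer exposes the feature through the trainer's `load_ignore_keys` option), and the pattern `state/model/*._extra_state` is an illustrative guess, not necessarily the one used here.

```python
import fnmatch
from typing import Any


def _flatten(d: dict[str, Any], prefix: str = "") -> list[str]:
    """List every leaf path in a nested dict, joining keys with '/'."""
    paths = []
    for key, value in d.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(_flatten(value, path))
        else:
            paths.append(path)
    return paths


def drop_ignored_keys(state: dict[str, Any], globs: list[str]) -> list[str]:
    """Hypothetical helper: delete leaves whose path matches any glob.

    Returns the dropped paths, sorted. fnmatchcase is used so matching
    is case-sensitive on every OS.
    """
    dropped = [p for p in _flatten(state)
               if any(fnmatch.fnmatchcase(p, g) for g in globs)]
    for path in dropped:
        *parents, leaf = path.split("/")
        node = state
        for key in parents:  # walk down to the dict that owns the leaf
            node = node[key]
        del node[leaf]
    return sorted(dropped)
```

Dropping these entries is safe only because, as noted above, the FP8 extra state is absent from the base model and is rebuilt during the first training batch; the real weights are still loaded.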