mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Pass FC type along for all FFN types #1196

Closed dakinggg closed 4 months ago

dakinggg commented 4 months ago

Previously, we were accidentally setting ffn_config to the ffn_config_defaults dictionary itself rather than to a copy of it. This meant that the modification of self.ffn_config here (https://github.com/mosaicml/llm-foundry/blob/994209cf646c38ade6719d86402730489409e104/llmfoundry/models/mpt/configuration_mpt.py#L316) could alter the defaults dictionary. This led to unexpected results when the constructor was called multiple times (which it is during the HF construction process, inside Hugging Face code): fc_type was always added to ffn_config. With the recent overhaul of the config system, we changed this behavior by calling the constructor directly instead of from_dict, which removed the extra construction call that had been modifying the defaults, so fc_type is no longer added to ffn_config as a side effect.
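To make the aliasing problem concrete, here is a minimal sketch of how it can arise. The names `FFN_CONFIG_DEFAULTS` and `BuggyConfig`, and the contents of the defaults dictionary, are hypothetical stand-ins for illustration; this is not the actual llm-foundry code.

```python
# Hypothetical stand-in for a module-level defaults dictionary.
FFN_CONFIG_DEFAULTS = {'ffn_type': 'mptmlp'}


class BuggyConfig:
    """Illustrative config class that aliases the shared defaults dict."""

    def __init__(self, ffn_config=None, fc_type='torch'):
        # Bug: when no ffn_config is given, this binds the instance
        # attribute to the shared defaults object itself, not a copy.
        self.ffn_config = (
            ffn_config if ffn_config is not None else FFN_CONFIG_DEFAULTS
        )
        # This in-place write therefore leaks into FFN_CONFIG_DEFAULTS.
        self.ffn_config['fc_type'] = fc_type


first = BuggyConfig(fc_type='torch')

# The module-level defaults now carry a key they never declared.
leaked = 'fc_type' in FFN_CONFIG_DEFAULTS

# A second construction reuses the same (already modified) dict object.
second = BuggyConfig()
shared = first.ffn_config is second.ffn_config
```

In this sketch both `leaked` and `shared` end up true: every instance built without an explicit `ffn_config` shares one dictionary, so a write made during one construction is visible to all later ones.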

This PR therefore does two things: (1) passes fc_type along to all FFN construction functions, and (2) fixes the dangerous use of the defaults dictionary as the default value in the MPT config class.
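The two changes can be sketched as follows, assuming the fix takes the form of copying the defaults and forwarding `fc_type` explicitly. `SafeConfig`, `build_ffn`, and `FFN_CONFIG_DEFAULTS` are hypothetical names used only for illustration, and the builder returns a plain dictionary rather than a real module.

```python
import copy

# Hypothetical stand-in for a module-level defaults dictionary.
FFN_CONFIG_DEFAULTS = {'ffn_type': 'mptmlp'}


class SafeConfig:
    """Illustrative config class that never aliases the shared defaults."""

    def __init__(self, ffn_config=None, fc_type='torch'):
        # Copy the defaults so later in-place edits stay local to this instance.
        self.ffn_config = (
            copy.deepcopy(FFN_CONFIG_DEFAULTS)
            if ffn_config is None else ffn_config
        )
        self.fc_type = fc_type


def build_ffn(config):
    """Hypothetical builder: fc_type is always passed, whatever the ffn_type."""
    return {
        'ffn_type': config.ffn_config['ffn_type'],
        'fc_type': config.fc_type,  # forwarded explicitly, not via side effect
    }


a = SafeConfig(fc_type='torch')
a.ffn_config['extra'] = 1  # local edit only
b = SafeConfig()
built = build_ffn(a)
```

With the copy in place, editing `a.ffn_config` leaves both `FFN_CONFIG_DEFAULTS` and `b.ffn_config` untouched, and the builder receives `fc_type` regardless of what `ffn_config` happens to contain.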