Previously, we were accidentally setting ffn_config to the ffn_config_defaults dictionary itself rather than to a copy of it. This meant that the modification of self.ffn_config here (https://github.com/mosaicml/llm-foundry/blob/994209cf646c38ade6719d86402730489409e104/llmfoundry/models/mpt/configuration_mpt.py#L316) could alter the defaults dictionary. That led to unexpected results when the constructor was called multiple times (which it is during the HF construction process, inside huggingface code): fc_type was always added to ffn_config. With the recent overhaul of the config system, we changed this behavior by calling the constructor directly instead of from_dict, which removed the extra construction call that was modifying the defaults, so ffn_config no longer picks up fc_type as a side effect.
This PR therefore does two things:
(1) Passes fc_type along to all ffn construction functions
(2) Fixes the dangerous default handling in the MPT config class, where ffn_config was set to the defaults dictionary directly
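
For reviewers less familiar with this failure mode, here is a minimal sketch of the aliasing problem and one common way to avoid it. It is not the llm-foundry code: the class names, the class-level defaults attribute, and the placeholder values are all hypothetical, and copy.deepcopy is only one possible fix, assumed here for illustration.

```python
import copy


class AliasedConfig:
    """Buggy pattern: the instance stores the shared defaults dict itself."""

    # Hypothetical defaults standing in for ffn_config_defaults.
    defaults = {"ffn_type": "mptmlp"}

    def __init__(self, ffn_config=None, fc_type="torch"):
        # No copy is made, so self.ffn_config and the defaults are one object.
        self.ffn_config = ffn_config if ffn_config is not None else self.defaults
        # This write therefore lands in the shared defaults as well.
        self.ffn_config["fc_type"] = fc_type


class CopiedConfig:
    """Safer pattern: each instance works on its own copy."""

    # Same hypothetical defaults as above.
    defaults = {"ffn_type": "mptmlp"}

    def __init__(self, ffn_config=None, fc_type="torch"):
        source = ffn_config if ffn_config is not None else self.defaults
        # Deep copy so later edits cannot reach the defaults or the caller's dict.
        self.ffn_config = copy.deepcopy(source)
        self.ffn_config["fc_type"] = fc_type


AliasedConfig(fc_type="te")
leaked = AliasedConfig.defaults  # the defaults now carry "fc_type" too

CopiedConfig(fc_type="te")
clean = CopiedConfig.defaults  # the defaults are unchanged
```

In the aliased version, any later construction that relies on the defaults inherits fc_type from an earlier call, which matches the behavior described above. The sketch also copies a caller-supplied ffn_config; whether that is desirable depends on whether callers expect their dictionary to be mutated.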