This PR introduces a feature that automatically infers the mappings argument for the SmoothQuantModifier based on the model architecture, eliminating the need for manual specification of layer mappings.
Before:
In the prior implementation, users had to manually define layer mappings in the recipe.
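The original example was lost from this description; a minimal sketch of what such a manual configuration might look like is below. The regex mapping strings and field names are illustrative assumptions for a Llama-style decoder, not taken from this PR:

```yaml
# Hypothetical recipe sketch: manually specified SmoothQuant mappings.
# Each entry pairs the layers to be smoothed with the preceding norm layer.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
        - [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"]
        - [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"]
```

Getting these regexes right required knowing the target model's module names, which is exactly the overhead this PR removes.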
Now:
With this update, the SmoothQuantModifier automatically infers the mappings based on the model's architecture, simplifying the configuration.
Key Changes:
- Auto-inference of mappings: The SmoothQuantModifier now automatically detects and applies appropriate layer mappings based on the model's architecture, making the modifier more user-friendly and reducing the risk of manual configuration errors.
- Manual mappings parameter removal: The mappings parameter is no longer required in the configuration, as it is inferred dynamically.
- Backward compatibility: Existing configurations that manually specify mappings will still be supported, ensuring a smooth transition and compatibility with older setups.
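With auto-inference, a recipe can simply omit the mappings entry. A hedged sketch of the simplified configuration (field names are assumptions carried over from the manual example above, not verbatim from this PR):

```yaml
# Hypothetical recipe sketch: no mappings entry needed.
# Appropriate layer mappings are inferred from the model architecture.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
```

Recipes that still set mappings explicitly continue to work, since the inferred defaults are only used when the parameter is absent.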
Motivation:
These changes improve usability by automating configuration setup and reducing user overhead, as outlined in the design document: Link to Design Doc. This also ensures that the quantization recipes are adaptable to various model architectures without manual intervention.
The auto-inference of mappings was tested using a Mixtral model: Isotonic/TinyMixtral-4x248M-MoE