
Auto-Infer `mappings` Argument for `SmoothQuantModifier` Based on Model Architecture #119

Open rahul-tuli opened 2 weeks ago

rahul-tuli commented 2 weeks ago

Description:

This PR adds automatic inference of the `mappings` argument for `SmoothQuantModifier` based on the model architecture, eliminating the need to specify layer mappings manually.

Before:

Previously, users had to define the layer mappings manually, as in the recipe below. Each mapping pairs the layers to be balanced (e.g., the q/k/v projections) with the preceding norm layer whose output activations are smoothed:

```yaml
quantization_stage:
  quantization_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.5
      mappings: [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate"], "re:.*post_attention_layernorm"]
      ]
      ignore: ["lm_head"]
```

Now:

With this update, `SmoothQuantModifier` infers `mappings` from the model architecture automatically, simplifying the recipe:

```yaml
quantization_stage:
  quantization_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.5
      ignore: ["lm_head"]
```
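
For completeness, here is a minimal sketch of applying the same recipe from Python, assuming the `oneshot` entrypoint and the `open_platypus` calibration dataset (exact import paths and argument names may vary across versions):

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# No `mappings` argument: with this change, the modifier resolves
# the layer mappings from the loaded model's architecture.
recipe = SmoothQuantModifier(smoothing_strength=0.5, ignore=["lm_head"])

oneshot(
    model="Isotonic/TinyMixtral-4x248M-MoE",  # model used for testing below
    dataset="open_platypus",                  # assumed calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=64,
    output_dir="./tinymixtral-smoothquant",
)
```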

Key Changes:

- `mappings` is now an optional argument on `SmoothQuantModifier`; when it is omitted, default mappings are resolved from the model architecture (a sketch of the general approach follows).
- Recipes that specify `mappings` explicitly continue to work unchanged.
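
The PR's actual resolution logic lives in the repository; the following is only an illustrative sketch of the general approach, using a hypothetical registry keyed by the model's class name (the registry and function names here are assumptions, not the PR's code):

```python
# Hypothetical sketch: resolving default SmoothQuant mappings by architecture.
# The Mixtral entry mirrors the mappings shown in the "Before" recipe above.
DEFAULT_MAPPINGS = {
    "LlamaForCausalLM": [
        (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
        (["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"),
    ],
    "MixtralForCausalLM": [
        (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
        (["re:.*gate"], "re:.*post_attention_layernorm"),
    ],
}

def infer_mappings(model) -> list:
    """Return default SmoothQuant mappings for the model's architecture."""
    arch = model.__class__.__name__
    if arch not in DEFAULT_MAPPINGS:
        raise ValueError(
            f"No default SmoothQuant mappings for {arch}; "
            "pass `mappings` explicitly in the recipe."
        )
    return DEFAULT_MAPPINGS[arch]
```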

Motivation:

These changes improve usability by automating configuration and reducing user overhead, as outlined in the design document: Link to Design Doc. They also make quantization recipes adaptable to different model architectures without manual intervention.

The auto-inference of mappings was tested using a Mixtral model: Isotonic/TinyMixtral-4x248M-MoE
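
For context, architecture-based inference keys off the model class, which for this checkpoint resolves as follows (standard transformers usage, not test code from the PR):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Isotonic/TinyMixtral-4x248M-MoE")
print(model.__class__.__name__)  # "MixtralForCausalLM"
```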