neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0

[Fix] Fully functional FSDP one-shot process #2305

Closed dbogunowicz closed 4 months ago

dbogunowicz commented 4 months ago

Note: This PR should be landed in unison with: https://github.com/neuralmagic/compressed-tensors/pull/58

Feature Description

A set of small fixes that enable one-shot under FSDP. Most of them correctly undo the module and parameter renaming introduced by the FSDP wrapper.
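
As a rough illustration of the renaming problem (not the code in this PR): PyTorch FSDP nests each wrapped module under a `_fsdp_wrapped_module` attribute, so names such as `lm_head.weight` gain extra path segments. Name-based logic, such as the `ignore: ["lm_head"]` entry in the recipe below, presumably needs the original names. The helpers `strip_fsdp_prefix` and `unwrap_state_dict_keys` are hypothetical names used only for this sketch.

```python
# Assumption: FSDP inserts this attribute name into wrapped module paths.
FSDP_WRAPPER_NAME = "_fsdp_wrapped_module"


def strip_fsdp_prefix(name: str) -> str:
    """Return `name` with every FSDP wrapper segment removed."""
    return ".".join(part for part in name.split(".") if part != FSDP_WRAPPER_NAME)


def unwrap_state_dict_keys(state_dict: dict) -> dict:
    """Map wrapped parameter names back to the names used by the unwrapped model."""
    return {strip_fsdp_prefix(key): value for key, value in state_dict.items()}


# Illustrative keys only; the values stand in for tensors.
wrapped = {
    "_fsdp_wrapped_module.model.layers.0._fsdp_wrapped_module.self_attn.q_proj.weight": 0,
    "_fsdp_wrapped_module.lm_head.weight": 1,
}
print(unwrap_state_dict_keys(wrapped))
```

Filtering on whole path segments, rather than replacing a substring, avoids touching a parameter whose own name merely contains the wrapper string.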

Testing

Note: The FSDP process was run with num_processes: 1, as well as num_processes: 2. Both runs yielded similar perplexities.

Model generation script

import torch

from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            sequential_update: false
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 4
                        type: "int"
                        symmetric: true
                        strategy: "channel"
                    targets: ["Linear"]
"""
model_stub = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = SparseAutoModelForCausalLM.from_pretrained(
    model_stub, 
    torch_dtype=torch.bfloat16, 
)

dataset = "open-platypus"
output_dir = "./model"
splits = {"calibration": "train[:5%]"}
max_seq_length = 512
pad_to_max_length = False
num_calibration_samples = 512
oneshot(
    model=model,
    dataset=dataset,
    recipe=recipe,
    output_dir=output_dir,
    splits=splits,
    max_seq_length=max_seq_length,
    pad_to_max_length=pad_to_max_length,
    num_calibration_samples=num_calibration_samples,
    save_compressed=True,
)

To run the one-shot process with FSDP:

accelerate launch --config_file integrations/huggingface-transformers/finetuning/example_fsdp_config.yaml model_generation_script.py 

Model testing script

from sparseml import evaluate

# Evaluate perplexity for the non-FSDP ("model") and FSDP ("model_fsdp") one-shot outputs.
for model_path in ("model", "model_fsdp"):
    print(
        evaluate(
            model_path,
            limit=100,
            integration="perplexity",
            datasets="garage-bAInd/Open-Platypus",
            text_column_name="instruction",
        )
    )

Result

The model produced by FSDP one-shot has a perplexity comparable to its non-FSDP counterpart (17.04 vs. 17.98) and the same weight sparsity:

# eval for non-fsdp model (compressed=False or True yields the same perplexity)

formatted=[Evaluation(task='text-generation', dataset=Dataset(type='text-generation', name='garage-bAInd/Open-Platypus', config=None, split=None), metrics=[Metric(name='perplexity', value=17.98309205532074)], samples=None)] raw={'mean_perplexity': 17.98309205532074}

# eval for fsdp model (compressed=False)

formatted=[Evaluation(task='text-generation', dataset=Dataset(type='text-generation', name='garage-bAInd/Open-Platypus', config=None, split=None), metrics=[Metric(name='perplexity', value=17.03630661010742)], samples=None)] raw={'mean_perplexity': 17.03630661010742}
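
To put the two numbers side by side, the gap can be computed directly from the logged values. This is only a quick arithmetic check, not part of the PR's test suite. The two perplexities are copied from the output above, and the variable names are made up for the sketch.

```python
# Perplexities copied from the evaluation output above.
ppl_non_fsdp = 17.98309205532074
ppl_fsdp = 17.03630661010742

# Absolute and relative gap, measured against the non-FSDP baseline.
abs_gap = abs(ppl_non_fsdp - ppl_fsdp)
rel_gap = abs_gap / ppl_non_fsdp

print(f"absolute gap: {abs_gap:.3f}")
print(f"relative gap: {rel_gap:.1%}")
```

The gap comes to a little under one perplexity point, or roughly 5% of the baseline, on a 100-sample evaluation.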