Describe the bug
After a model is generated by running big_model_fp8.py, lm_eval does not work unless the .py files from the original base model are copied into the generated model folder. This happens with
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
OSError: test_phi_3_medium_128k_instruct_fp8 does not appear to have a file named configuration_phi3.py. Checkout 'https://huggingface.co/test_phi_3_medium_128k_instruct_fp8/tree/None' for available files.
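The workaround mentioned above (copying the base model's remote-code .py files into the output folder) can be scripted. A minimal sketch, assuming huggingface_hub is available and that configuration_phi3.py and modeling_phi3.py are the files the saved config refers to (the file list is an assumption, not part of the original report):

# sketch of the manual workaround: fetch the remote-code .py files from the
# base repo into the quantized model folder so they can be found at load time
from huggingface_hub import hf_hub_download

output_dir = "./test_phi_3_medium_128k_instruct_fp8"
for fname in ["configuration_phi3.py", "modeling_phi3.py"]:  # assumed file list
    hf_hub_download(
        repo_id="microsoft/Phi-3-medium-128k-instruct",
        filename=fname,
        local_dir=output_dir,
    )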
Expected behavior
lm_eval should run without any errors.
Environment
Include all relevant environment information:
OS [e.g. Ubuntu 20.04]:
Python version [e.g. 3.7]:
LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]:
ML framework version(s) [e.g. torch 2.3.1]:
Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
Other relevant environment information [e.g. hardware, CUDA version]:
To Reproduce
import torch
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import (  # noqa
    calculate_offload_device_map,
    custom_offload_device_map,
)

# define a llmcompressor recipe for FP8 quantization
# this recipe requires no calibration data since inputs are dynamically quantized
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: channel
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: token
                        dynamic: true
                        symmetric: true
                    targets: ["Linear"]
"""

# model_stub = "meta-llama/Meta-Llama-3-70B-Instruct"
model_stub = "microsoft/Phi-3-medium-128k-instruct"

# determine which layers to offload to cpu based on available resources
device_map = calculate_offload_device_map(
    model_stub, reserve_for_hessians=False, num_gpus=1, torch_dtype=torch.float16
)

# alternatively, specify the maximum memory to allocate per GPU directly
# device_map = custom_offload_device_map(
#     model_stub, max_memory_per_gpu="10GB", num_gpus=2, torch_dtype=torch.float16
# )

model = SparseAutoModelForCausalLM.from_pretrained(
    model_stub, torch_dtype=torch.float16, device_map=device_map
)

# output_dir = "./test_output_llama3b_70b_fp8"
output_dir = "./test_" + model_stub.split("/")[-1].replace("-", "_").lower() + "_fp8"

oneshot(
    model=model,
    recipe=recipe,
    output_dir=output_dir,
    save_compressed=True,
    tokenizer=AutoTokenizer.from_pretrained(model_stub),
)
Then run eval_openllm.sh on the generated model folder; a plausible sketch of such a script is given below.
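The body of eval_openllm.sh is not reproduced in the report; the following is only a hedged reconstruction, assuming the hf backend of lm-evaluation-harness. The task group, batch size, dtype, and trust_remote_code flag are all assumptions, not the reporter's actual settings.

#!/bin/bash
# hypothetical sketch of eval_openllm.sh, not the reporter's script
MODEL_DIR="./test_phi_3_medium_128k_instruct_fp8"

lm_eval \
  --model hf \
  --model_args "pretrained=${MODEL_DIR},dtype=float16,trust_remote_code=True" \
  --tasks openllm \
  --batch_size auto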
Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.
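The OSError quoted above suggests that transformers follows the auto_map entries in the saved config.json but cannot find the referenced remote-code file in the output folder. A minimal sketch that would likely trigger the same error without lm_eval, assuming the checkpoint is loaded with trust_remote_code=True:

from transformers import AutoModelForCausalLM

# loading the quantized output folder directly; transformers resolves the
# config's auto_map to configuration_phi3.py, which was not saved alongside
# the checkpoint, so this presumably raises the OSError quoted above
model = AutoModelForCausalLM.from_pretrained(
    "./test_phi_3_medium_128k_instruct_fp8", trust_remote_code=True
)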
Additional context
Add any other context about the problem here. Also include any relevant files.