vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Apache License 2.0

Does `one_shot` save model twice? #936

Open fzyzcjy opened 3 days ago

fzyzcjy commented 3 days ago

Hi, thanks for the lib! While checking https://github.com/vllm-project/llm-compressor/issues/935, I noticed that one_shot seems to automatically save everything to the output folder. That looks great, but if I understand correctly, the example at https://github.com/vllm-project/llm-compressor/blob/a47137d834a2be8f8fcd49458f121b08ba34e2c9/examples/quantization_kv_cache/llama3_fp8_kv_example.py#L99 also saves the model manually. So it seems the example script saves everything twice.

kota-iizuka commented 3 days ago

I can confirm that the same problem occurs. It is probably due to https://github.com/vllm-project/llm-compressor/blob/a47137d834a2be8f8fcd49458f121b08ba34e2c9/src/llmcompressor/transformers/finetune/training_args.py#L61. If you pass output_dir as shown below, the model is saved only once.

- oneshot(model=model, recipe=recipe)
- model.save_pretrained(SAVE_DIR)
+ oneshot(model=model, recipe=recipe, output_dir=SAVE_DIR)
  tokenizer.save_pretrained(SAVE_DIR)

fzyzcjy commented 3 days ago

> it will be saved once

Well, it seems the save operation is still done twice; both saves just write to (and overwrite) the same save folder...

kota-iizuka commented 3 days ago

Did you delete model.save_pretrained(SAVE_DIR)?

fzyzcjy commented 3 days ago

@kota-iizuka I personally want to keep model.save_pretrained (and tokenizer.save_pretrained) so that I have finer control over saving.

kota-iizuka commented 3 days ago

@fzyzcjy I understand your request, and I agree that it would be nice to have an option to not save the model when running oneshot() if you don't specify output_dir (or if you specify any special parameters).

(On the other hand, I'm not personally motivated to fix that, since I just want the resulting model...)

dsikka commented 3 days ago

Hi @fzyzcjy @kota-iizuka, if you pull down the latest main, you can avoid saving twice by not providing an output_dir to the oneshot call. oneshot will only save to output_dir if that kwarg is provided, or if you pass a string as the model argument rather than an actual model instance.
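
As a rough illustration of the rule described above, here is a minimal sketch. It does not call llm-compressor at all: `oneshot_will_save` and `DummyModel` are hypothetical names invented for this example, and the function only mirrors the behavior as stated in this comment, not the library's actual implementation.

```python
from typing import Optional, Union


def oneshot_will_save(model: Union[str, object], output_dir: Optional[str] = None) -> bool:
    """Hypothetical helper, NOT part of llm-compressor.

    Mirrors the rule stated in this thread: oneshot writes to disk only when
    output_dir is provided, or when the model is passed as a string.
    """
    return output_dir is not None or isinstance(model, str)


class DummyModel:
    """Stand-in for an already-loaded model instance (assumption for the demo)."""


# Model instance, no output_dir: oneshot does not save, so a manual
# model.save_pretrained(SAVE_DIR) afterwards is the only save.
print(oneshot_will_save(DummyModel()))

# Model instance plus output_dir: oneshot saves, so the manual save is redundant.
print(oneshot_will_save(DummyModel(), output_dir="out"))

# Model passed as a string (placeholder value here): oneshot saves as well.
print(oneshot_will_save("some/model-id"))
```

So, to keep full control with model.save_pretrained and tokenizer.save_pretrained, the first pattern (model instance, no output_dir) is the one that should avoid the double save.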

fzyzcjy commented 3 days ago

Thanks!