turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Add graceful exit sig handling and status box for quantization to assist estimating completion time and overall accuracy #310

Closed. bgorlick closed this 5 months ago

bgorlick commented 5 months ago

A few user experience enhancements during the quantization and measurement process:

  1. Graceful Exit Signal Handling:

    • Adds signal handling so that the measurement pass of quantization can exit gracefully. The process can be safely paused or stopped, and a user who hits CTRL-C unintentionally gets a second chance instead of losing the run.
  2. Status Box with useful insights:

    • Implements a status box that appears each time the measurement of a module completes and gives insight into the quantization run so far. The most useful figures are probably the overall accuracy of the quantization and measurement process at that moment, and an estimated time to completion for the full quant.
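
To illustrate the graceful-exit idea, here is a minimal sketch and not the code from this PR. It assumes a two-stage policy: the first CTRL-C only sets a flag that the measurement loop checks between modules, and a second CTRL-C aborts immediately. `GracefulExit`, `measure_all`, and `measure_fn` are hypothetical names used only for illustration.

```python
import signal


class GracefulExit:
    """Hypothetical helper: the first SIGINT requests a stop, the second aborts."""

    def __init__(self):
        self.stop_requested = False
        self._previous = None

    def _handle(self, signum, frame):
        if self.stop_requested:
            # Second CTRL-C: the user really means it, so abort now.
            raise KeyboardInterrupt
        self.stop_requested = True
        print("\nInterrupt received; stopping after the current module. "
              "Press CTRL-C again to abort immediately.")

    def __enter__(self):
        # signal.signal() must be called from the main thread.
        self._previous = signal.signal(signal.SIGINT, self._handle)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGINT, self._previous)
        return False


def measure_all(modules, measure_fn, guard):
    """Measure modules one by one, stopping cleanly once a stop is requested."""
    results = []
    for module in modules:
        if guard.stop_requested:
            break
        results.append(measure_fn(module))
    return results
```

A caller would wrap the loop as `with GracefulExit() as guard: measure_all(modules, measure_fn, guard)`, then save any partial results after the loop returns.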

Stats in the status box include:

The idea here is simply to improve the user experience by providing better control and visibility during the quantization process. For those paying for compute to quantize, the time-to-completion estimates can help with calculating compute costs.
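
As a rough illustration of how a time-to-completion estimate can feed a cost estimate, the sketch below assumes every module takes about as long as the average of those already measured. That is a simplification, and the PR may compute its estimate differently. All function names, the status rows, and the hourly rate are hypothetical.

```python
def estimate_remaining_seconds(elapsed_seconds, modules_done, modules_total):
    """Linear extrapolation: assumes remaining modules take the average time so far."""
    if modules_done <= 0:
        return None  # nothing measured yet, so no basis for an estimate
    per_module = elapsed_seconds / modules_done
    return per_module * (modules_total - modules_done)


def estimate_cost(remaining_seconds, hourly_rate):
    """Convert a remaining-time estimate into a compute cost at a given hourly rate."""
    return remaining_seconds / 3600.0 * hourly_rate


def format_status_box(rows):
    """Render (label, value) pairs inside a simple ASCII box."""
    texts = [f"{label}: {value}" for label, value in rows]
    width = max(len(text) for text in texts)
    border = "+" + "-" * (width + 2) + "+"
    body = ["| " + text.ljust(width) + " |" for text in texts]
    return "\n".join([border, *body, border])


if __name__ == "__main__":
    remaining = estimate_remaining_seconds(600.0, 10, 40)
    print(format_status_box([
        ("Modules measured", "10 / 40"),
        ("Estimated remaining (s)", f"{remaining:.0f}"),
        ("Estimated cost", f"{estimate_cost(remaining, 2.0):.2f}"),
    ]))
```

Because per-module time can vary, an estimate like this is best refreshed after every module rather than computed once.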

https://github.com/turboderp/exllamav2/assets/5460972/a010f4f2-f575-49b7-8297-df3c75b9cf33