turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Optimize Checkpoint File Saving and Handling in Quantize.py with Atomic Operation #312

Closed · bgorlick closed this 8 months ago

bgorlick commented 8 months ago

This is a micro-optimization to the checkpoint file handling in the quantization module, specifically in quantize.py. The change uses os.replace to rename the temporary file to the final filename, making the operation atomic.

The idea is risk mitigation: an atomic rename eliminates the brief window during renaming in which the file might not exist, which, while very unlikely, could still theoretically cause data loss or inconsistency. Given that we're dealing with quantization, precision everywhere possible seems prudent.
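A minimal sketch of the write-temp-then-replace pattern being described (illustrative only, not the actual quantize.py code; the function name and signature are hypothetical):

```python
import os
import tempfile

def save_checkpoint_atomically(data: bytes, final_path: str) -> None:
    """Write to a temp file in the same directory, then atomically
    swap it into place so readers never observe a partial file."""
    dir_name = os.path.dirname(os.path.abspath(final_path))
    # mkstemp in the destination directory keeps the rename on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp_path, final_path)  # atomic on both POSIX and Windows
    except BaseException:
        # remove the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Unlike os.rename, os.replace silently overwrites an existing destination on all platforms, which is what makes it the right primitive for updating a checkpoint in place.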

Key Changes:

turboderp commented 8 months ago

Changes covered in #310