vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Apache License 2.0

adding custom arithmetic #41

Closed Ravenwater closed 1 month ago

Ravenwater commented 1 month ago

Is your feature request related to a problem? Please describe. There is a C++ library, https://github.com/stillwater-sc/universal, for mixed-precision algorithm development and optimization. It provides tens of thousands of arithmetic types that could be leveraged in quantization, ranging from custom floating-point and fixed-point formats, to tapered floating-point formats such as posits and takums, to logarithmic and double-base number systems.

How would we go about integrating that capability into llm-compressor?

Describe the solution you'd like: An architecture evaluation to make certain that this is a reasonable engineering effort that would be a win-win for both environments.

Describe alternatives you've considered: We have integrated directly into PyTorch, but that integration kept bit-rotting because PyTorch changes so rapidly. We currently do everything through Intel's floating-point compressor library.

Additional context: Mixed-precision algorithms have been very valuable in the HPC and DSP verticals and are being rediscovered in the new AI space. There is a wealth of knowledge in the HPC and DSP space about custom arithmetic that could rapidly be applied to AI model quantization.

bfineran commented 1 month ago

Hi @Ravenwater, take a look at compressed-tensors. It acts as our backend for quantization and contains the config/definition we use for quantized models. To achieve something like this, we'd have to extend the quantization config so that more data types can be specified (potentially via QuantizationType in the args: https://github.com/neuralmagic/compressed-tensors/blob/c214cbc17ed651eba01f301ffdeab354100e23dc/src/compressed_tensors/quantization/quant_args.py#L74). As part of this, we'd also need to update the observer and Q/DQ implementations to work with more data types.
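To make the shape of that change concrete, here is a minimal, self-contained sketch of what "more data types plus matching Q/DQ" could look like. It does not use the compressed-tensors API: `ExtendedQuantizationType`, its `FIXED` and `LOG` members, and all of the helper functions are hypothetical names invented for illustration, and observers (which would pick parameters such as `frac_bits` from calibration data) are left out entirely.

```python
import math
from enum import Enum


class ExtendedQuantizationType(str, Enum):
    """Hypothetical stand-in for a quantization-type enum with extra members."""

    INT = "int"
    FLOAT = "float"
    FIXED = "fixed"  # hypothetical: signed fixed-point
    LOG = "log"      # hypothetical: sign plus fixed-point log2 magnitude


def quantize_fixed(x: float, num_bits: int, frac_bits: int) -> int:
    """Map x to a signed fixed-point code, saturating at the range limits."""
    q_min = -(1 << (num_bits - 1))
    q_max = (1 << (num_bits - 1)) - 1
    code = round(x * (1 << frac_bits))  # Python rounds halves to even
    return max(q_min, min(q_max, code))


def dequantize_fixed(code: int, frac_bits: int) -> float:
    """Map a fixed-point code back to a real value."""
    return code / (1 << frac_bits)


def quantize_log(x: float, num_bits: int, frac_bits: int):
    """Encode x as (sign, fixed-point code of log2|x|); zero gets a None code."""
    if x == 0.0:
        return (0, None)
    sign = -1 if x < 0 else 1
    # One bit is reserved for the sign, the rest hold the exponent code.
    exp_code = quantize_fixed(math.log2(abs(x)), num_bits - 1, frac_bits)
    return (sign, exp_code)


def dequantize_log(q, frac_bits: int) -> float:
    """Decode a (sign, exponent code) pair back to a real value."""
    sign, exp_code = q
    if exp_code is None:
        return 0.0
    return sign * 2.0 ** dequantize_fixed(exp_code, frac_bits)


def fake_quantize(x: float, qtype: ExtendedQuantizationType,
                  num_bits: int, frac_bits: int) -> float:
    """Quantize then dequantize (Q/DQ), dispatching on the data type."""
    if qtype is ExtendedQuantizationType.FIXED:
        return dequantize_fixed(quantize_fixed(x, num_bits, frac_bits), frac_bits)
    if qtype is ExtendedQuantizationType.LOG:
        return dequantize_log(quantize_log(x, num_bits, frac_bits), frac_bits)
    raise NotImplementedError(f"no Q/DQ sketch for type {qtype.value!r}")
```

With 8 bits and 4 fractional bits, `fake_quantize(0.3, ExtendedQuantizationType.FIXED, 8, 4)` returns 0.3125. The point of the sketch is only that each new enum member needs its own quantize/dequantize pair behind a common dispatch; a real integration would presumably operate on tensors and could delegate the arithmetic to an external library.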