hahnyuan / PB-LLM

PB-LLM: Partially Binarized Large Language Models
MIT License

Compression Ratio #1

Open NicoNico6 opened 10 months ago

NicoNico6 commented 10 months ago

Really solid work!

May I ask what the actual compressed model size is, considering that this is a partial binarization approach and some 8-bit parameters remain inside each weight matrix? Can the model be compressed using techniques like bit packing?

hahnyuan commented 7 months ago

Apologies for the delayed response.

Regarding your question: for the 1-bit weights, we indeed use a packed format. As for the 8-bit parameters within each weight matrix, they are sparse with a low density, so conventional formats like CSR may not be the most suitable. We are currently exploring a modified run-length encoding (RLE) to achieve an efficient compression ratio for this 8-bit sparse data.
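(For illustration only, here is a minimal NumPy sketch of the bit-packing idea for the binarized part; the helper names are hypothetical and this is not the packing code used in the repo.)

```python
import numpy as np

# Hypothetical helpers illustrating a packed 1-bit format:
# 8 binarized weights are stored per uint8 byte.
def pack_binary_weights(w_sign: np.ndarray) -> np.ndarray:
    bits = (w_sign > 0).astype(np.uint8)   # map {-1, +1} -> {0, 1}
    return np.packbits(bits.flatten())     # 8 weights per byte

def unpack_binary_weights(packed: np.ndarray, shape) -> np.ndarray:
    n = int(np.prod(shape))
    bits = np.unpackbits(packed)[:n].reshape(shape)
    return bits.astype(np.int8) * 2 - 1    # map {0, 1} back to {-1, +1}

w = np.where(np.random.randn(4, 8) >= 0, 1, -1).astype(np.int8)
packed = pack_binary_weights(w)            # 32 weights -> 4 bytes
assert np.array_equal(unpack_binary_weights(packed, w.shape), w)
```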

In our modified RLE, each non-zero 8-bit value is represented by a pair: the 8-bit value itself and the count of consecutive zeros preceding it. For example, the original sequence 0 0 0 0 0 0 5 0 0 1 becomes the RLE representation (5, 6) (1, 2).

In terms of storage cost, each (value, count) pair requires about 12 bits: 8 bits for the value and 4 bits for the count.
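As a concrete illustration of this modified RLE, here is a small Python sketch; the helper names are hypothetical, and it assumes every zero run fits in the 4-bit count field (longer runs would need an escape or split mechanism, which is omitted here).

```python
from typing import List, Tuple

def rle_encode(seq: List[int]) -> List[Tuple[int, int]]:
    """Store each non-zero value together with the number of zeros
    preceding it; sketch only, assumes each zero run fits in 4 bits."""
    pairs, zeros = [], 0
    for v in seq:
        if v == 0:
            zeros += 1
        else:
            assert zeros < 16, "4-bit count field would overflow here"
            pairs.append((v, zeros))          # (8-bit value, zero-run count)
            zeros = 0
    return pairs

def rle_decode(pairs: List[Tuple[int, int]], length: int) -> List[int]:
    out = []
    for value, zeros in pairs:
        out.extend([0] * zeros)
        out.append(value)
    out.extend([0] * (length - len(out)))     # trailing zeros need no pair
    return out

seq = [0, 0, 0, 0, 0, 0, 5, 0, 0, 1]
pairs = rle_encode(seq)                       # [(5, 6), (1, 2)]
assert rle_decode(pairs, len(seq)) == seq
print(f"{12 * len(pairs)} bits for the sparse part of {len(seq)} weights")
```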

For a 10% outlier ratio, if we quantize the outlier weights to 8 bits, the average cost per weight is 1 + (8 + 4) × 0.1 = 2.2 bits (compression ratio = 1 - 2.2/16 ≈ 86.3%). If we quantize the outliers to 4 bits, the average cost drops to 1 + (4 + 4) × 0.1 = 1.8 bits (compression ratio = 1 - 1.8/16 ≈ 88.8%).
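As a quick sanity check of that arithmetic (assuming an FP16 baseline, 1 bit per weight for the packed binary part, and a 4-bit count per RLE pair), here is a small helper; the function names are mine, not the repo's:

```python
def avg_bits_per_weight(outlier_ratio: float, outlier_bits: int,
                        count_bits: int = 4, binary_bits: int = 1) -> float:
    # Every weight costs 1 bit in the packed binary matrix; each outlier
    # additionally costs (outlier_bits + count_bits) bits in the RLE stream.
    return binary_bits + (outlier_bits + count_bits) * outlier_ratio

def compression_ratio(avg_bits: float, baseline_bits: int = 16) -> float:
    return 1 - avg_bits / baseline_bits

for b in (8, 4):
    avg = avg_bits_per_weight(0.1, b)
    print(f"{b}-bit outliers: {avg:.1f} bits/weight, "
          f"compression ratio {compression_ratio(avg):.2%}")
# 8-bit outliers: 2.2 bits/weight, compression ratio 86.25%
# 4-bit outliers: 1.8 bits/weight, compression ratio 88.75%
```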