vcskaushik / LLMzip

GNU General Public License v3.0

Questions about arithmetic coding #4

Status: Closed (Jone-Luo closed this issue 3 months ago)

Jone-Luo commented 3 months ago

Thank you for your excellent work and code. I have a few questions.

Regarding the arithmetic coding used, how did you determine the precision? Are you using infinite precision or finite precision? Is it necessary to transmit the probability distribution of the ranks as well? When calculating the compression ratio, did you take the size of the frequency table into account?

Thank you very much for your response.

vcskaushik commented 3 months ago

Hi Jone,

Thank you for your interest in our work.

This work uses finite-precision arithmetic coding, based on the arithmetic coding code published by Project Nayuki.

There are no frequency tables involved. Provided you have an LLM at your disposal (which is increasingly common), you can compress any text, as long as you use the same LLM for decoding.

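The LLM provides the probability distribution of the next token conditioned on the past input. The decoder recomputes the same distribution from the tokens it has already decoded, so no distribution needs to be transmitted. We don't include the size of the LLM in our compression ratio calculations.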
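To illustrate why no frequency table has to be sent, here is a minimal sketch of a 32-bit finite-precision arithmetic coder in which both sides query the same predictive model. It is not the LLMzip code and not Project Nayuki's code: `toy_model` is a hypothetical adaptive byte counter standing in for the LLM, the names `Encoder`, `Decoder`, `compress`, and `decompress` are made up for this example, and the message length is assumed to be known to the decoder instead of being signalled with an end-of-stream symbol.

```python
from bisect import bisect_right
from itertools import accumulate

NUM_BITS = 32                      # finite precision: all state fits in 32 bits
FULL = 1 << NUM_BITS
HALF, QUARTER = FULL >> 1, FULL >> 2

def toy_model(context):
    """Hypothetical stand-in for the LLM: cumulative frequencies of the next
    byte, computed only from the bytes seen so far."""
    freqs = [1] * 256              # every byte keeps a nonzero frequency
    for b in context:
        freqs[b] += 32
    return [0] + list(accumulate(freqs))   # total must stay below QUARTER

class Encoder:
    def __init__(self):
        self.low, self.high, self.pending, self.bits = 0, FULL - 1, 0, []

    def _emit(self, bit):
        # Output a bit followed by any postponed opposite bits (underflow).
        self.bits.append(bit)
        self.bits.extend([bit ^ 1] * self.pending)
        self.pending = 0

    def encode(self, cum, sym):
        rng = self.high - self.low + 1
        self.high = self.low + rng * cum[sym + 1] // cum[-1] - 1
        self.low = self.low + rng * cum[sym] // cum[-1]
        while True:                # renormalize so the range stays wide
            if self.high < HALF:
                self._emit(0)
            elif self.low >= HALF:
                self._emit(1)
                self.low -= HALF; self.high -= HALF
            elif self.low >= QUARTER and self.high < 3 * QUARTER:
                self.pending += 1
                self.low -= QUARTER; self.high -= QUARTER
            else:
                break
            self.low, self.high = self.low * 2, self.high * 2 + 1

    def finish(self):
        # Two more bits pin the value inside the final [low, high] interval.
        self.pending += 1
        self._emit(0 if self.low < QUARTER else 1)
        return self.bits

class Decoder:
    def __init__(self, bits):
        self.bits = iter(bits)
        self.low, self.high, self.code = 0, FULL - 1, 0
        for _ in range(NUM_BITS):
            self.code = self.code * 2 + next(self.bits, 0)

    def decode(self, cum):
        rng = self.high - self.low + 1
        value = ((self.code - self.low + 1) * cum[-1] - 1) // rng
        sym = bisect_right(cum, value) - 1   # cum[sym] <= value < cum[sym+1]
        self.high = self.low + rng * cum[sym + 1] // cum[-1] - 1
        self.low = self.low + rng * cum[sym] // cum[-1]
        while True:                # mirror the encoder's renormalization
            if self.high < HALF:
                pass
            elif self.low >= HALF:
                self.low -= HALF; self.high -= HALF; self.code -= HALF
            elif self.low >= QUARTER and self.high < 3 * QUARTER:
                self.low -= QUARTER; self.high -= QUARTER; self.code -= QUARTER
            else:
                break
            self.low, self.high = self.low * 2, self.high * 2 + 1
            self.code = self.code * 2 + next(self.bits, 0)
        return sym

def compress(data):
    enc = Encoder()
    for i, sym in enumerate(data):
        enc.encode(toy_model(data[:i]), sym)   # model sees only the past
    return enc.finish()

def decompress(bits, n):
    dec, out = Decoder(bits), bytearray()
    for _ in range(n):
        out.append(dec.decode(toy_model(out)))  # same model, same past
    return bytes(out)

if __name__ == "__main__":
    msg = b"abracadabra " * 20
    bits = compress(msg)
    assert decompress(bits, len(msg)) == msg
    print(len(bits), "bits for", 8 * len(msg), "input bits")
```

Only the bit list travels from `compress` to `decompress`. Swapping `toy_model` for an LLM's next-token distribution (quantized to integer frequencies) would keep the same structure, which is why the model's probabilities never appear in the compressed output.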