The MUL unit of APOT

Hi,

Do you have the specific design of the MUL (Multiplication) unit for APOT quantization?

We know that uniform (INT) quantization and POT quantization are both hardware-friendly.

Assume that:

R = real number
S = scale factor
T = quantized number
R1 = S1 * T1
R2 = S2 * T2

Uniform quantization simply adopts the INT MUL unit:

T1 = m
T2 = n

So, we have:

R1 * R2 = (S1 * S2) * (m * n)

For POT:

T1 = 2^m
T2 = 2^n

So, we have:

R1 * R2 = (S1 * S2) * (2^m * 2^n) 
             =  (S1 * S2) * 2^(m + n)

So the POT MUL is similar to a float MUL that has only an exponent: the multiplication reduces to one integer addition of the exponents.
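To make the comparison concrete, here is a minimal Python sketch of the two cases above. The function names `uniform_mul`, `pot_mul` and `pot_value` are my own, chosen for illustration; they are not taken from the APoT_Quantization repository, and the scale product S1 * S2 is left out because it is handled the same way in both cases.

```python
from fractions import Fraction


def uniform_mul(m: int, n: int) -> int:
    """Uniform quantization: T1 = m, T2 = n, so T1 * T2 is one integer MUL."""
    return m * n


def pot_mul(m: int, n: int) -> int:
    """POT: T1 = 2^m, T2 = 2^n, so T1 * T2 = 2^(m + n).

    Only the exponent is stored, so the product is one integer ADD.
    """
    return m + n


def pot_value(exp: int) -> Fraction:
    """Exact real value of a POT number given its exponent."""
    return Fraction(2) ** exp


# 2^-1 * 2^-3 = 2^-4, computed with a single addition of exponents.
exp = pot_mul(-1, -3)
assert pot_value(exp) == pot_value(-1) * pot_value(-3)
```

`Fraction` is used only so that the check is exact; real hardware would of course keep just the integer exponent.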

However, for APOT I have two questions about the MUL design, because each number is a sum of power-of-two terms. Assume a 4-bit APOT. The decoder table for the first two bits:

00	01	10	11
2^0	2^-1	2^-3	2^-5

And the last two bits:

00	01	10	11
2^0	2^-2	2^-4	2^-6

For the first two bits, the exponents in the decoder table are not contiguous: 0, -1, -3, -5.
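For reference, the two decoder tables can be written as a plain lookup. This is only a software sketch of the tables as I listed them, with hypothetical names (`HIGH_EXP`, `LOW_EXP`, `decode_apot4`); it says nothing about how a real MUL unit would implement the decode.

```python
# Exponent tables as listed in this post, indexed by the 2-bit field value.
HIGH_EXP = (0, -1, -3, -5)  # first two bits
LOW_EXP = (0, -2, -4, -6)   # last two bits


def decode_apot4(code: int) -> tuple[int, int]:
    """Split a 4-bit code into two 2-bit fields and look up each exponent.

    Returns (e_high, e_low), meaning the value 2^e_high + 2^e_low.
    """
    if not 0 <= code <= 0b1111:
        raise ValueError("expected a 4-bit code")
    return HIGH_EXP[code >> 2], LOW_EXP[code & 0b11]


# 0101 decodes to 2^-1 + 2^-2, and 1010 decodes to 2^-3 + 2^-4.
assert decode_apot4(0b0101) == (-1, -2)
assert decode_apot4(0b1010) == (-3, -4)
```

Because the exponents are not contiguous, the field value cannot be used directly as the exponent the way it can in POT; some table or mapping logic like the above is needed per field.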

Q1: How do you efficiently decode the binary code to the APOT, especially in the MUL unit?

Assume two numbers in APOT:

0101: T1 = 2^-1 + 2^-2
1010: T2 = 2^-3 + 2^-4
T1 * T2 = (2^-1 + 2^-2) * (2^-3 + 2^-4) 
             = (2^-1 * 2^-3) + (2^-1 * 2^-4) + (2^-2 * 2^-3) + (2^-2 * 2^-4)
             = 2^-4 + 2^-5 + 2^-5 + 2^-6

Clearly, this needs 4x (9x) as many exponent additions as POT in the 4-bit (6-bit) case, plus extra additions to sum the partial products. The result also violates the definition of APOT, in which the same additive term cannot appear twice in one number; here 2^-5 appears twice.
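The expansion above can be reproduced with a short Python sketch. The helper names `apot_mul` and `merge_duplicates` are hypothetical, and the carry step (2^e + 2^e = 2^(e+1)) is just one way I can think of to remove the repeated term; I do not know whether the authors intend anything like it.

```python
from collections import Counter


def apot_mul(t1, t2):
    """Multiply two APOT numbers given as lists of exponents.

    Every pair of terms yields one partial product 2^(a + b), so two
    k-term operands need k * k exponent additions.
    """
    return sorted((a + b for a in t1 for b in t2), reverse=True)


def merge_duplicates(exps):
    """Carry repeated terms upward until all exponents are distinct."""
    counts = Counter(exps)
    while any(c > 1 for c in counts.values()):
        e = max(e for e, c in counts.items() if c > 1)
        counts[e] -= 2      # remove the pair 2^e + 2^e
        counts[e + 1] += 1  # and replace it with 2^(e + 1)
    return sorted((e for e, c in counts.items() if c > 0), reverse=True)


partials = apot_mul([-1, -2], [-3, -4])
assert partials == [-4, -5, -5, -6]           # 4 partial products, 2^-5 twice
assert merge_duplicates(partials) == [-3, -6]  # 2^-3 + 2^-6
```

Even after merging, nothing guarantees in general that the remaining terms fit the two decoder tables, which is part of what I am asking below.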

Q2: How do you deal with the complex computation and the subnormal number for APOT?

One direct solution is to convert the values to float and use fake quantization. But doesn't that violate the principle of quantization?

yhhhli / APoT_Quantization

The MUL unit of APOT #19
