stillwater-sc / universal

Large collection of number systems providing custom arithmetic for mixed-precision algorithm development and optimization for AI, Machine Learning, Computer Vision, Signal Processing, CAE, EDA, control, optimization, estimation, and approximation.
MIT License

Quire capacity and overflow #157

Open gonced8 opened 3 years ago

gonced8 commented 3 years ago

I was reading the Posit Standard and noticed that it says the quire cannot overflow for up to 2^c - 1 additions (with c = nbits-1, which, if I understand correctly, is the capacity parameter).

My question is about the quire implementation, since it mentions that the capacity indicates the power-of-2 number of accumulations of maxpos^2 the quire can support. Does this mean that this implementation supports an additional accumulation, given that the standard has that -1 in the capacity? I also noticed that the upper_range is greater than the half_range by 1 bit, because of the maxpos^2 scale. However, according to the standard, the upper (integer) and half (fraction) parts should have the same size. And if the upper range has 1 more bit, then I believe the quire can actually accumulate 2^(c+1)-1 products of maxpos^2 without overflow.

Moreover, total_bits() returns qbits+1, and qbits is equal to range+capacity. Therefore, the total size would be sign + capacity + range. Looking again at the Standard, the quire structure indicates that the range corresponds to the size of integer+fraction. However, in this implementation, the upper_range is 1 bit larger than the half_range; does that mean that this quire actually has 1 bit more?

Sorry for the questions, but I've been trying to understand and wrap my head around this, with no success. Am I misunderstanding the implementation or the standard? Thank you in advance!

(I'm seeing this standard: https://posithub.org/docs/posit_standard.pdf )

gonced8 commented 3 years ago

I've been trying some changes and came up with a workaround to get the same size as the Standard, which is to use a capacity equal to the number of carry guard bits minus 1. So if the Standard has c = nbits-1, then one may use c-1 = nbits-2 for the capacity of this quire implementation. This way, the extra bit of the upper_range can be seen as the bit that was removed from the capacity.

Ravenwater commented 3 years ago

@gonced8 Goncalo, sorry I did not see this issue till now. These are good questions.

The implementation in Universal supported different experiments on how to create scalable, segmented quires for SIMD-type ISAs. As a consequence, the state space of the quire implementation is much bigger than the posit standard's, as the goal was to understand how to generate traces for fast CSA hardware implementations.

I really appreciate the thought you put into reconciling the two specifications.

My own recollection of how this implementation shaped up is that it started from the observation that maxpos^2 and minpos^2 are asymmetric when represented in fixed-point. The maxpos^2 representation needs that extra bit, which then 'bleeds' into the capacity bits and affects the number of accumulations that the assigned capacity supports.

The basic idea behind the Universal quire is that you need to set the capacity large enough not to overflow during fused dot products. However, as we are using it in algorithm development, we observed that the worst case is way too expensive, particularly when you are working with FPGA implementations. So, we are still looking for more insight into the dynamic behavior of algorithms and the appropriately scaled quire for cloud FPGA implementations.

Love to hear your thoughts on this.

gonced8 commented 3 years ago

@Ravenwater Thank you very much for your response Theodore! And no problem for the delay!

The work I've been doing with posits involves neural networks, particularly low-precision posits and the accumulation in matrix multiplications and convolutions. In my case, I don't have many problems with overflow, since the data usually has zero mean. Nonetheless, I also plan to study the possibility of scaling the posits before an accumulation (e.g. the two matrices before a matrix multiplication), so as to decrease the probability of overflow (while introducing some error due to that initial scaling).

About the Universal implementation of the quire, I think the upper bitblock should be implemented with the same size as the half bitblock. I understand that the extra bit was added to the upper in order to accommodate maxpos^2; otherwise, maxpos^2 would immediately bleed into/occupy the last bit of the capacity bitblock. However, with that extra bit, the quire actually supports 2^(capacity+1)-1 accumulations of maxpos^2, instead of the indicated 2^capacity. If that extra bit of the upper were removed, it would support 2^capacity - 1 accumulations, which would also be in accordance with the carry guard specified in the Standard.

If I'm being confusing, let me know, and I'll try to explain it in more detail. Thank you once more for your time, and I'd also like to hear your thoughts!

Ravenwater commented 3 years ago

@gonced8 Brilliant, thanks for the extra info. There are several groups working on posit DNNs, but you are the first to my knowledge to leverage the quire directly. @petergottschling has been working on linear algebra runtimes (MTL4 and MTL5) that incorporate the quire as part of the BLAS operators, so all the tensor math of the DNN is seamlessly quire-enabled when you use a posit scalar type.

Quire implementation improvements are on the roadmap, in particular, high-performance emulations for 'small' posits. The second design is taking shape in the fixed-point number system, although it is not ready yet for the sizes required for posit quires.

If you are interested in this type of software engineering, you can take a swing at it too. Take a look at the new structures in the integer<> and fixpnt<> classes that bring in storage classes on which the arithmetic is parameterized. These storage classes make the small numbers optimal in terms of storage, and they make type-driven, incremental performance optimization possible.

Let me know how we can help in your DNN work.

gonced8 commented 3 years ago

@Ravenwater Thank you! If you want to take a look, I have a public version of the framework I've been developing with posits and quires: PositNN. A newer version I've been working on exploits mixed precision in different layers. For this, I developed a tensor class (StdTensor) and implemented the operations from scratch. I noticed that I was able to use the posit with MTL4, but I wanted to exploit the quire, so I ended up developing my own functions.

I'll be following your work with interest!