sbp / idris-bi

Idris Binary Integer Arithmetic, porting PArith, NArith, and ZArith from Coq
https://github.com/idris-lang/Idris-dev/issues/3976
36 stars 5 forks source link

Idris Binary Integer Arithmetic

This work-in-progress package implements structurally inductive arbitrary-precision binary integer arithmetic in Idris, a dependently typed pure functional programming language.

Idris already contains types for unary integer arithmetic and fixed-precision binary integer arithmetic, but it does not contain types for arbitrary-precision binary integer arithmetic. This package fills that gap. Once completed, the authors hope for this package to be incorporated into the Idris standard library per feature request #3976.

This package implements arithmetics based on PArith, NArith, and ZArith from the Coq programming language. In Coq the primary types of these arithmetics are called positive, N, and Z respectively, but in this package those types are renamed Bip, Bin, and Biz for consistency.

Because Idris is dependently typed, it is possible to assert arbitrary properties about the code that the type checker will verify at compile time. This means that it is possible to prove characteristics of code, equivalent to running tests for all possible inputs (even an infinite number) simultaneously. We are porting dozens of proofs specifying the behaviour of the three arithmetics from Coq, making it possible to rely on those proven specifications when using the code.

Fixed precision

This package also includes a fixed-precision binary arithmetic BizMod2, based on CompCert's integers. It uses the Biz representation underneath, but normalizes all operations modulo 2^n, where n is an arbitrary Nat index. Idris' standard Data.Bits type, mentioned above, is wrapper for machine words that only tracks the word size; its internal structure is opaque to the type checker. The inductive definition here allows us to prove arbitrary logical-arithmetic properties, e.g., for operations like bit shifts.

This, however, comes at the cost of performance, since the runtime presentation of BizMod2 is still inductive. Finding a way to transparently map this type (as well as the arbitrary precision ones) to optimized machine arithmetic is a goal for the future.

Motivation

Using Nat, the unary arithmetic type, for arithmetic on large numbers in Idris causes significant performance degradation when type checking, compiling, evaluating, and executing code. Consider the example of taking the remainder of dividing the current unixtime by the number of seconds in a day:

main : IO ()
main = printLn (modNatNZ 1501857320 (S 86399) SIsNotZ)

This file takes so long to type check or to compile that the duration would have to be extrapolated from smaller examples. Instead, we can use %freeze after lifting the slow terms to the top level:

n1501857320 : Nat
n1501857320 = 1501857320

n86399 : Nat
n86399 = 86399

%freeze n1501857320
%freeze n86399

main : IO ()
main = printLn (modNatNZ n1501857320 (S n86399) SIsNotZ)

This file is reasonably fast to type check and to compile, but now the runtime takes so long to execute that once again the actual duration could only be extrapolated from smaller examples. The code is also extremely ugly.

If we use the P, N, and Z arithmetics from Coq, this code can be made to work much faster. The tutorial in VFA explains why these arithmetics are necessary and how they are constructed. An element of ZArith is already mentioned several times in the Idris literature:

There's also a different element of it in Data.ZZ.

This is somewhat related to Issue #3516, The cost of computing Nat equality proofs at type check time, but doesn't solve that issue directly since here a new type is proposed to sidestep the issue altogether. Indeed, @edwinb suggested an orthogonal solution there:

You know, I think I'm going to take back that comment about "little point in hard coding things for Nat" because realistically that's the biggest problem we're going to encounter at compile time, and given that we say that Nat is for unbounded unsigned things, we probably ought to be a bit cleverer about it.

Overview

The main types are as follows:

data Bip = U | O Bip | I Bip
data Bin = BinO | BinP Bip
data Biz = BizO | BizP Bip | BizM Bip

We can now rewrite the poorly-performing example above as:

import Bin

main : IO ()
main = printLn (the Integer (cast (binMod 1501857320 86400)))

Which compiles in reasonable time:

$ time idris Performance.idr -o Performance
idris Performance.idr -o Performance
   5.09s user 0.48s system 108% cpu 5.147 total

And runs with blazing speed:

$ time ./Performance
"52520"
./Performance
   0.00s user 0.00s system 40% cpu 0.013 total

Contributors and licensing

See contributors for the definitive list of contributors. Since the intention is for this package to be incorporated into Idris, we intend also to use the same license as Idris itself.