Open sergey-kintsel opened 6 years ago
Possibly related to #3976 (unfortunately closed AFAICS) and #3516.
I see you found the REPL code! It looks to me that the evaluation of the term happens happens where normaliseBlocking
is called.
When we spoke on the Idris Slack channel, I'm pretty sure I was misguided to suggest that it might be a problem of not having the GMP
flag enabled. The evaluation of the [1..1000]
expression in the REPL should be a matter of how it's evaluated by the Haskell code in normaliseBlocking
.
Hopefully one of the other @idris-lang/contributors will be able to give you more guidance about how the evaluator works and why it might be slow in this case. Otherwise, dig into src/Idris/Core/Evaluate.hs
to see what you can find!
I recalled that Agda has a mechanism to optimise datatypes that are like Nat
. Googling, I came up with:
data ℕ : Set where
zero : ℕ
suc : (n : ℕ) → ℕ
{-# BUILTIN NATURAL ℕ #-}
See
Integers have a similar "builtin" mechanism in Agda.
I'm not sure if Idris has a similar mechanism. If not, perhaps that's what's required to resolve these performance issues?
Nat
is unary and therefore a number N is represented by a linked list with N nodes.
The compiler special-cases Nat
and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).
AFAIK, the Agda builtin mainly means that you can write 3
instead of (suc (suc (suc zero)))
. I don't know if Agda compiles Nat
the same way as Idris does.
Either way, if you want big numbers at runtime, you likely need Int
or Integer
. Nat
is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat
of the corresponding size. However, if you don't need the structure, Int
or Integer
will be more efficient.
Perhaps one day we'll have a more principled optimisation that will allow using big Nat
s everywhere, with their useful structure, without their representational overhead.
Hi @ziman, thanks for stopping by!
The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).
That's sad. We should use a similar optimisation for REPL evaluation.
AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda special-cases Nat the same way as Idris does.
No, the use of NATURAL in Agda has 4 effects:
See http://agda.readthedocs.io/en/v2.5.3/language/built-ins.html#natural-numbers
So, for Idris, we'd want to add (2) as (1) and (3) seem already taken care of. Not sure about (4), it looks like a complication 😄.
Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.
I'm not entirely following but I think we want Haskell Integer
.
Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.
@sergey-kintsel seems keen to solve this. I'm happy to help. Rather than "one day", with a little direction, we could hope for "one day next week" 😉! Could you point to the optimisation that the compiler does to replace uses of Nat
with GMP integer operations? I imagine the structure of that operation would be very similar to what's needed for evaluation-time optimisation.
Ah, thank you for the clarification wrt. Agda.
I'm not entirely following but I think we want Haskell Integer.
In my personal opinion (which may be a bit radical), sometimes Nat
is simply not the right type and it is better to use Int
or Integer
in your source -- because the meaning is different. For example, I'd most likely use a pair of Int
s for widget dimensions: an opaque number is sufficient there, and it performs better. I prefer to read Nat
as Count
, and you hardly ever count pixels one by one in your types or elsewhere, as opposed to elements of a list or the depth of a tree.
Either way, the translation currently happens in Idris.DataOpts
, which lives in src/Idris/DataOpts.hs
. Compile-time optimisation is performed on the Raw
representation.
We've been talking about generalising this for a very long time but I can't find anything written up online. You could extend this idea of big ints to Fin
besides Nat
, represent vectors as arrays, binary trees packed in arrays (like in heapsort), etc.
I have some notes and even syntax ideas but they are far from anything complete. For example, changing the representation changes the complexity of operations (e.g. (+1)
becomes more expensive for big integers than for Nat
, the same goes for cons
in arrays vs. vectors) so you probably want to let the programmer choose the implementation somehow...
Another cool optimization would be to substitute Bits
for binary numbers modulo (or bitvectors) .
I was looking DataOpts.hs though and found this:
(Var n, args) <- raw_unapply t -- MAGIC HERE
that's an amazing code comment :joy:
looks like I am not the person to solve this. I have no idea how it works internally tbh 😅
Idris version: 1.2.0 OSs: macOS & Ubuntu
Steps to Reproduce
Open REPL type
[1..1000]
and hit Enter. It takes very long and consumes 100% of 1 CPU coreExpected Behavior
It should be as fast as a bullet
Important notes:
the (List Nat/Integer) [1..1000]
it works slowthe (List Int) [1..1000]
in REPL, it works not so slow (I am able to wait for results)main : IO () main = print $ foldr (+) 1 (the (List Nat) [1..1000])