idris-lang / Idris-dev

A Dependently Typed Functional Programming Language
http://idris-lang.org
Other
3.43k stars 644 forks source link

List construction works very slow in REPL if I don't explicitly define its type as `the (List Int)` #4290

Open sergey-kintsel opened 6 years ago

sergey-kintsel commented 6 years ago

Idris version: 1.2.0 OSs: macOS & Ubuntu

Steps to Reproduce

Open REPL type [1..1000] and hit Enter. It takes very long and consumes 100% of 1 CPU core

Expected Behavior

It should be as fast as a bullet

Important notes:

main : IO () main = print $ foldr (+) 1 (the (List Nat) [1..1000])


it works *fast*.

I tried to investigate the problem myself, but I got only to this point: https://github.com/idris-lang/Idris-dev/blob/master/src/Idris/REPL.hs#L859.
steshaw commented 6 years ago

Possibly related to #3976 (unfortunately closed AFAICS) and #3516.

I see you found the REPL code! It looks to me that the evaluation of the term happens happens where normaliseBlocking is called.

When we spoke on the Idris Slack channel, I'm pretty sure I was misguided to suggest that it might be a problem of not having the GMP flag enabled. The evaluation of the [1..1000] expression in the REPL should be a matter of how it's evaluated by the Haskell code in normaliseBlocking.

Hopefully one of the other @idris-lang/contributors will be able to give you more guidance about how the evaluator works and why it might be slow in this case. Otherwise, dig into src/Idris/Core/Evaluate.hs to see what you can find!

steshaw commented 6 years ago

I recalled that Agda has a mechanism to optimise datatypes that are like Nat. Googling, I came up with:

data ℕ : Set where
  zero : ℕ
  suc  : (n : ℕ) → ℕ

{-# BUILTIN NATURAL ℕ #-}

See

Integers have a similar "builtin" mechanism in Agda.

I'm not sure if Idris has a similar mechanism. If not, perhaps that's what's required to resolve these performance issues?

ziman commented 6 years ago

Nat is unary and therefore a number N is represented by a linked list with N nodes.

The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).

AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda compiles Nat the same way as Idris does.

Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.

Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.

steshaw commented 6 years ago

Hi @ziman, thanks for stopping by!

The compiler special-cases Nat and compiles it to big integers, which are more efficient. There is no such optimisation if you don't compile, just evaluate (REPL).

That's sad. We should use a similar optimisation for REPL evaluation.

AFAIK, the Agda builtin mainly means that you can write 3 instead of (suc (suc (suc zero))). I don't know if Agda special-cases Nat the same way as Idris does.

No, the use of NATURAL in Agda has 4 effects:

  1. The use of natural number literals is enabled. By default the type of a natural number literal will be Nat, but it can be overloaded to include other types as well.
  2. Closed natural numbers are represented as Haskell integers at compile-time.
  3. The compiler backends compile natural numbers to the appropriate number type in the target language.
  4. Enabled binding the built-in natural number functions: plus, minus, times, equals, less-than, div-suc-aux, mod-suc-aux.

See http://agda.readthedocs.io/en/v2.5.3/language/built-ins.html#natural-numbers

So, for Idris, we'd want to add (2) as (1) and (3) seem already taken care of. Not sure about (4), it looks like a complication 😄.

Either way, if you want big numbers at runtime, you likely need Int or Integer. Nat is mostly useful as a measure of size of something because its structure is useful for reasoning, and if you already have that "something", then it's less problematic to have a Nat of the corresponding size. However, if you don't need the structure, Int or Integer will be more efficient.

I'm not entirely following but I think we want Haskell Integer.

Perhaps one day we'll have a more principled optimisation that will allow using big Nats everywhere, with their useful structure, without their representational overhead.

@sergey-kintsel seems keen to solve this. I'm happy to help. Rather than "one day", with a little direction, we could hope for "one day next week" 😉! Could you point to the optimisation that the compiler does to replace uses of Nat with GMP integer operations? I imagine the structure of that operation would be very similar to what's needed for evaluation-time optimisation.

ziman commented 6 years ago

Ah, thank you for the clarification wrt. Agda.

I'm not entirely following but I think we want Haskell Integer.

In my personal opinion (which may be a bit radical), sometimes Nat is simply not the right type and it is better to use Int or Integer in your source -- because the meaning is different. For example, I'd most likely use a pair of Ints for widget dimensions: an opaque number is sufficient there, and it performs better. I prefer to read Nat as Count, and you hardly ever count pixels one by one in your types or elsewhere, as opposed to elements of a list or the depth of a tree.

Either way, the translation currently happens in Idris.DataOpts, which lives in src/Idris/DataOpts.hs. Compile-time optimisation is performed on the Raw representation.

We've been talking about generalising this for a very long time but I can't find anything written up online. You could extend this idea of big ints to Fin besides Nat, represent vectors as arrays, binary trees packed in arrays (like in heapsort), etc.

I have some notes and even syntax ideas but they are far from anything complete. For example, changing the representation changes the complexity of operations (e.g. (+1) becomes more expensive for big integers than for Nat, the same goes for cons in arrays vs. vectors) so you probably want to let the programmer choose the implementation somehow...

clayrat commented 6 years ago

Another cool optimization would be to substitute Bits for binary numbers modulo (or bitvectors) .

sergey-kintsel commented 6 years ago

I was looking DataOpts.hs though and found this:

(Var n, args) <- raw_unapply t -- MAGIC HERE

that's an amazing code comment :joy:

sergey-kintsel commented 6 years ago

looks like I am not the person to solve this. I have no idea how it works internally tbh 😅