plclub / hs-to-coq

Convert Haskell source code to Coq source code.
https://hs-to-coq.readthedocs.io
MIT License
77 stars 8 forks source link

[CLOSED] Optimise pattern incompleteness check #146

Closed lastland closed 3 years ago

lastland commented 3 years ago

Issue by trommler Thursday Feb 20, 2020 at 16:02 GMT Originally opened as https://github.com/antalsz/hs-to-coq/pull/146


Reverse the list of patterns when checking for completeness. In Haskell more general patterns follow specialized patterns and in particular the catch-all clause is always last.

Fixes #112


trommler included the following code: https://github.com/antalsz/hs-to-coq/pull/146/commits

lastland commented 3 years ago

Comment by lastland Thursday Feb 20, 2020 at 18:17 GMT


This looks good to me.

@nomeata Since this is your code, what do you think?

lastland commented 3 years ago

Comment by nomeata Thursday Feb 20, 2020 at 22:18 GMT


Smells a bit like premature optimization, but I assume Antal had a good reason, so that’s fine.

Certainly non-obvious to readers why there is a reverse here, so a comment might be good (“more efficient if consider a possible catch-all clause first”)

lastland commented 3 years ago

Comment by trommler Friday Feb 21, 2020 at 12:13 GMT


Smells a bit like premature optimization, but I assume Antal had a good reason, so that’s fine.

Perhaps, I can explain my reasoning a bit better:

A common pattern in pattern (no pun intended!) matches in Haskell is that we have a set of specialized (sometimes nested) patterns first and then a more general pattern that often overlaps with the specialized patterns. Consider the following example (simplified from compiler/cmm/CmmExpr):

data C = C [Int] | D

fun :: C -> _
fun (C []) = _
fun (C [42]) = _
fun (C [n]) = _
fun (C [n1, n2]) = _
fun D = _
fun _  = _

The third pattern contains the second pattern and the last pattern catches all the missing lists of length three and above.

But also when we replace the last line with

fun (C xs) = _

then after checking that pattern only D will be left and that will be found as the next pattern and we are done.

That means the more general patterns will appear towards the end of the list and the less general patterns p1 replace one required pattern by a potentially large list of patterns that are not covered by p1 only to be later removed by a more general pattern p2. If by reversing the list we looked at the more general pattern p2 first and then at p1 we will not find a matching pattern among the missing patterns because p1 was already covered by p2. If p2 is a catch-all pattern we are done immediately.

The only case where this patch will be cause a linear slowdown is when all patterns are exhaustively enumerated. I believe that this is acceptable given that combinatorial blowup is avoided by this patch. I will try @jwiegley's example and report back.

Certainly non-obvious to readers why there is a reverse here, so a comment might be good (“more efficient if consider a possible catch-all clause first”) Agreed, I will add an explanation similar to the above as a comment.

The optimization allowed me to translate function getRegister in the PowerPC code generator in two seconds whereas before it consumed all 16 GB of memory on my machine and did not finish at all.

lastland commented 3 years ago

Comment by tadeuszlitak Friday Feb 21, 2020 at 15:57 GMT


I can confirm @trommler's experience mentioned in the last line of his reply: I also encountered this massive blow-up on my Mac. Prior to Peter's fix, hs-to-coq consumed close to 20 GB of resources on my machine at some point and I was about to give up on it.

lastland commented 3 years ago

Comment by nomeata Friday Feb 21, 2020 at 17:09 GMT


I wasn’t doubting the “optimization” part – and given that evidence, I take the “premature” back :-)

(The coverage checker is rather naive, so one can probably replace it with a proper algorithm to avoid that blow up. But as a quick fix for a common case it makes sense.)

lastland commented 3 years ago

Comment by lastland Friday Feb 21, 2020 at 19:36 GMT


OK. I have merged this PR and I will try to explain what the optimization does in a comment. Thanks @trommler! And thanks @tadeuszlitak for the comment.