Closed SmiVan closed 2 years ago
Tagging @madman-bob and @stefan-hoeck since they thought about it not too long ago.
Related examples from some languages:
The situation is made more convoluted by %
, which in Perl, Python and Ruby refers to the modulo operation, but in C, D, F#, Java, JavaScript and Rust refers to the remainder operation, but since %
is not an operation in Idris, we can just ignore that whole mess.
There's a more thorough list of different definitions of modulo operations on this wikipedia article, however I didn't hand-check these so I can't guarantee accuracy.
Oh my, at the beginning I was like "oh this is an easy fix" and then I realized that there is an Ouroboros situation going on.
The Int/Integer primitives are compiled by the Idris2 compiler to Integer primitives of the previous release, so this will take a little more than just changing the definitions DX
The RefC backend already behaves "correctly" (using Euclidian div and mod, see #1480). We'd like to have the same behavior for the other backends.
While I am still in the process of cleaning up the situation of integer primitives on JS and Scheme backends, I'm rather busy finishing a complete overhaul of the JS backend right now. Still, I might find some time to look into this over the next couple of days.
In Haskell, mod
and rem
correspond to floored and truncated modulus, respectively. In practise, however, what you almost always want is the Euclidean modulus. Depending on your use case, in Haskell you can usually get away with using one-or-the-other of the available moduluses (moduli?), but sometimes solutions can end up fragile.
The paper I linked in #1480 compares four types of modulus (including ceiling modulus, but no one uses that), and the Euclidean modulus clearly has the nicest properties.
That said, that doesn't mean we need to restrict ourselves to only the Euclidean modulus. However, I would want to see a compelling use-case for including some other modulus - a practical use, which would not be solved as easily using a Euclidean mod.
I'm not particularly knowledgeable on the different modulo types, all I've noticed is that this makes things confusing when trying to adapt algorithms from/to other languages.
I'm used to mod
being floored modulo and rem
being truncated modulo because this seems to be a trend in several languages I know. In fact - if the wikipedia article for the modulo operation is to be trusted, it appears a lot of languages have mod
as floored and rem
as truncated, and the usage of the euclidean modulo at all is somewhat uncommon.
For what it's worth though - I would use the truncated modulo (rem
) in my converter of decimal to balanced ternary, as it closely follows the numeric pattern for balanced ternary, though I'm sure there would be some other way to accomplish the same if it was absent, probably with extra steps to flip the order of numbers for negative dividends and also passing the sign through.
And I've yet to run into a situation in practice where I'd need to distinguish between floored and euclidean modulo, so I have no opinion on this one, although skimming the Daan Leijen paper gives the impression that euclidean provides some optimization in some usage cases.
You're right that changes may mean you need to adapt algorithms, but that's not always a bad thing. With the right abstractions, algorithms can be simplified, making it easier to confirm correctness, and easier to adapt them again in the future.
While there is benefit in familiarity, it's not useful to stick with what everyone's always done just for the sake of it. Otherwise, there would never be any progress. Similarly, Idris doesn't have null
, despite many mainstream languages having something like that.
That many languages include both mod
and rem
suggests to me that they haven't thought deeply about the choice, and are including them just to cover all bases.
Balanced ternary sounds like a good example. I've included a snippet below of how I would do it with Euclidean mod
. I don't think it could be made much simpler with rem
, though the + 1
is suspicious. How would you approach it?
Floored and Euclidean mod differ only when the second argument is negative, which tends to be uncommon, so I expect most people don't have an opinion on that one.
You're also right about the optimizations. As well as the mathematical properties, it plays nicely with the twos-complement representation of integers, allowing for some expressions to be rephrased as bit-ops.
Please note: Snippet requires Euclidean mod, which is currently only in the RefC backend.
data BalancedTernaryDigit = M | Z | P
BalancedTernary : Type
BalancedTernary = List BalancedTernaryDigit -- Least Significant Digit first
fromBTD : BalancedTernaryDigit -> Integer
fromBTD M = -1
fromBTD Z = 0
fromBTD P = 1
fromBT : BalancedTernary -> Integer
fromBT [] = 0
fromBT (x :: xs) = fromBTD x + 3 * fromBT xs
toBTD : Integer -> BalancedTernaryDigit
toBTD x = case x `mod` 3 of
0 => Z
1 => P
_ => M
toBT : Integer -> BalancedTernary
toBT 0 = []
toBT x = toBTD x :: toBT ((x + 1) `div` 3)
Your snippet is actually a much cleaner implementation than my attempt (I can't compile RefC at all for some reason on my machine, so I can't verify for myself, but it appears correct on visual inspection), so I withdraw my comment regarding balanced ternary.
I suppose rem
can be a bit of language bloat if it's not particularly commonly used, so from that standpoint I understand how it may be undesirable to have it among the standard functions. Leaving it and other variants of modulo to optional libraries will make sense then, unless anyone can come up with another practical scenario to use rem
in.
So with this considered, this issue reduces to aligning the other codegens to use Euclidean mod
(and div
) for consistency between backends.
Steps to Reproduce
Modulo:
Remainder:
Expected Behavior
Modulo:
Remainder:
Observed Behavior
Modulo:
Remainder:
Commentary
The function for
mod
seems to be incorrectly implemented as the remainder function. This is in disagreement with a number of languages - Chez Scheme and Haskell all behave as in the expected behavior above. This issue is also present in the original Idris 1.I'm moderately worried that just straight up swapping the definition may result in some problems in the core, but presumably if appropriate care is taken to rename all the usages of
mod
torem
as described above (and as used by Haskell), everything should work fine.