idris-lang / Idris2

A purely functional programming language with first class types
https://idris-lang.org/
Other
2.52k stars 373 forks source link

The modulo function incorrectly behaves as a remainder function. #1571

Closed SmiVan closed 2 years ago

SmiVan commented 3 years ago

Steps to Reproduce

Modulo:

map (\d => d `mod` 3) [-3..3] 

Remainder:

map (\d => d `rem` 3) [-3..3] 

Expected Behavior

Modulo:

[0, 1, 2, 0, 1, 2, 0]

Remainder:

[0, -2, -1, 0, 1, 2, 0]

Observed Behavior

Modulo:

[0, -2, -1, 0, 1, 2, 0]

Remainder:

Error: Undefined name rem. 

Commentary

The function for mod seems to be incorrectly implemented as the remainder function. This is in disagreement with a number of languages - Chez Scheme and Haskell all behave as in the expected behavior above. This issue is also present in the original Idris 1.

I'm moderately worried that just straight up swapping the definition may result in some problems in the core, but presumably if appropriate care is taken to rename all the usages of mod to rem as described above (and as used by Haskell), everything should work fine.

ohad commented 3 years ago

Tagging @madman-bob and @stefan-hoeck since they thought about it not too long ago.

SmiVan commented 3 years ago

Related examples from some languages:

Chez Scheme ### Modulo: ```scheme > (map (lambda (d) (mod d 3)) '(-3 -2 -1 0 1 2 3)) (0 1 2 0 1 2 0) ``` ### Remainder: ```scheme > (map (lambda (d) (remainder d 3)) '(-3 -2 -1 0 1 2 3)) (0 -2 -1 0 1 2 0) ```
Haskell ### Modulo: ```haskell Prelude> map (\d -> d `mod` 3) [-3..3] [0,1,2,0,1,2,0] ``` ### Remainder: ```haskell Prelude> map (\d -> d `rem` 3) [-3..3] [0,-2,-1,0,1,2,0] ```

Zig Run on Zig 0.9.0. My Zig is not the cleanest, but it suffices. ### Modulo: ```zig const std = @import("std"); pub fn main() anyerror!void { const nums = [7]i8{-3,-2,-1, 0, 1, 2, 3}; var nums_mod = [7]i8{ 0, 0, 0, 0, 0, 0, 0}; for (nums) |num, i| { nums_mod[i] = @mod(nums[i], 3); } std.log.info("{any}", .{nums_mod}); } ``` ```zig info: { 0, 1, 2, 0, 1, 2, 0 } ``` ### Remainder: ```zig const std = @import("std"); pub fn main() anyerror!void { const nums = [7]i8{-3,-2,-1, 0, 1, 2, 3}; var nums_rem = [7]i8{ 0, 0, 0, 0, 0, 0, 0}; for (nums) |num, i| { nums_rem[i] = @rem(nums[i], 3); } std.log.info("{any}", .{nums_rem}); } ``` ```zig info: { 0, -2, -1, 0, 1, 2, 0 } ```

The situation is made more convoluted by %, which in Perl, Python and Ruby refers to the modulo operation, but in C, D, F#, Java, JavaScript and Rust refers to the remainder operation, but since % is not an operation in Idris, we can just ignore that whole mess.

There's a more thorough list of different definitions of modulo operations on this wikipedia article, however I didn't hand-check these so I can't guarantee accuracy.

fabianhjr commented 3 years ago

Oh my, at the beginning I was like "oh this is an easy fix" and then I realized that there is an Ouroboros situation going on.

The Int/Integer primitives are compiled by the Idris2 compiler to Integer primitives of the previous release, so this will take a little more than just changing the definitions DX

stefan-hoeck commented 3 years ago

The RefC backend already behaves "correctly" (using Euclidian div and mod, see #1480). We'd like to have the same behavior for the other backends.

While I am still in the process of cleaning up the situation of integer primitives on JS and Scheme backends, I'm rather busy finishing a complete overhaul of the JS backend right now. Still, I might find some time to look into this over the next couple of days.

madman-bob commented 3 years ago

In Haskell, mod and rem correspond to floored and truncated modulus, respectively. In practise, however, what you almost always want is the Euclidean modulus. Depending on your use case, in Haskell you can usually get away with using one-or-the-other of the available moduluses (moduli?), but sometimes solutions can end up fragile.

The paper I linked in #1480 compares four types of modulus (including ceiling modulus, but no one uses that), and the Euclidean modulus clearly has the nicest properties.

That said, that doesn't mean we need to restrict ourselves to only the Euclidean modulus. However, I would want to see a compelling use-case for including some other modulus - a practical use, which would not be solved as easily using a Euclidean mod.

SmiVan commented 3 years ago

I'm not particularly knowledgeable on the different modulo types, all I've noticed is that this makes things confusing when trying to adapt algorithms from/to other languages.

I'm used to mod being floored modulo and rem being truncated modulo because this seems to be a trend in several languages I know. In fact - if the wikipedia article for the modulo operation is to be trusted, it appears a lot of languages have mod as floored and rem as truncated, and the usage of the euclidean modulo at all is somewhat uncommon.

For what it's worth though - I would use the truncated modulo (rem) in my converter of decimal to balanced ternary, as it closely follows the numeric pattern for balanced ternary, though I'm sure there would be some other way to accomplish the same if it was absent, probably with extra steps to flip the order of numbers for negative dividends and also passing the sign through.

And I've yet to run into a situation in practice where I'd need to distinguish between floored and euclidean modulo, so I have no opinion on this one, although skimming the Daan Leijen paper gives the impression that euclidean provides some optimization in some usage cases.

madman-bob commented 3 years ago

You're right that changes may mean you need to adapt algorithms, but that's not always a bad thing. With the right abstractions, algorithms can be simplified, making it easier to confirm correctness, and easier to adapt them again in the future.

While there is benefit in familiarity, it's not useful to stick with what everyone's always done just for the sake of it. Otherwise, there would never be any progress. Similarly, Idris doesn't have null, despite many mainstream languages having something like that.

That many languages include both mod and rem suggests to me that they haven't thought deeply about the choice, and are including them just to cover all bases.

Balanced ternary sounds like a good example. I've included a snippet below of how I would do it with Euclidean mod. I don't think it could be made much simpler with rem, though the + 1 is suspicious. How would you approach it?

Floored and Euclidean mod differ only when the second argument is negative, which tends to be uncommon, so I expect most people don't have an opinion on that one.

You're also right about the optimizations. As well as the mathematical properties, it plays nicely with the twos-complement representation of integers, allowing for some expressions to be rephrased as bit-ops.

Please note: Snippet requires Euclidean mod, which is currently only in the RefC backend.

data BalancedTernaryDigit = M | Z | P

BalancedTernary : Type
BalancedTernary = List BalancedTernaryDigit -- Least Significant Digit first

fromBTD : BalancedTernaryDigit -> Integer
fromBTD M = -1
fromBTD Z = 0
fromBTD P = 1

fromBT : BalancedTernary -> Integer
fromBT [] = 0
fromBT (x :: xs) = fromBTD x + 3 * fromBT xs

toBTD : Integer -> BalancedTernaryDigit
toBTD x = case x `mod` 3 of
    0 => Z
    1 => P
    _ => M

toBT : Integer -> BalancedTernary
toBT 0 = []
toBT x = toBTD x :: toBT ((x + 1) `div` 3)
SmiVan commented 3 years ago

Your snippet is actually a much cleaner implementation than my attempt (I can't compile RefC at all for some reason on my machine, so I can't verify for myself, but it appears correct on visual inspection), so I withdraw my comment regarding balanced ternary.

I suppose rem can be a bit of language bloat if it's not particularly commonly used, so from that standpoint I understand how it may be undesirable to have it among the standard functions. Leaving it and other variants of modulo to optional libraries will make sense then, unless anyone can come up with another practical scenario to use rem in.

So with this considered, this issue reduces to aligning the other codegens to use Euclidean mod (and div) for consistency between backends.