modula3 / cm3

Critical Mass Modula-3
http://modula3.github.io/cm3/
Other
138 stars 25 forks source link

The definition of the FOR loop is prone to integer overflow #1027

Open rillig opened 2 years ago

rillig commented 2 years ago

The Modula-3 FOR statement is defined such that using it with very large or very small integers can result in unexpected overflow errors, which could terminate the program. This makes the FOR loop unnecessarily hard to use in safe and reliable programs.

Details: modula-3-for-loop.pdf

thielema commented 2 years ago

On Sun, 22 May 2022, Roland Illig wrote:

The Modula-3 FOR statement is defined such that using it with very large or very small integers can result in unexpected overflow errors, which could terminate the program. This makes the FOR loop unnecessarily hard to use in safe and reliable programs.

It would be easier if the program could examine the CPU's overflow flag, right? Unfortunately, C does not provide access to it.

rillig commented 2 years ago

It would be easier if the program could examine the CPU's overflow flag, right? Unfortunately, C does not provide access to it.

Yes. If the target machine provides access to the overflow flag, the loop would exit as soon as an overflow happens.

I didn't want to rely on that though, that's why I chose to do the overflow check using as few arithmetic primitives as possible.

norayr commented 2 years ago

i am sorry that i ask without knowing, but does the compiler check (at compile time of course) if the variable used for the looping potentially can overflow?

let's say we have i: INTEGER; l: LONGINT; would this generate a compile time error?

FOR i := 0 to l DO
...
END
RodneyBates commented 2 years ago

On 5/23/22 07:37, Norayr Chilingarian wrote:

i am sorry that i ask without knowing, but does the compiler check (at compile time of course) if the variable used for the looping potentially can overflow?

let's say we have i: INTEGER; l: LONGINT;

This ..............^ declaration of i is irrelevant, because the FOR loop is an inner scope and declares its own meaning of i.

would this generate a compile time error?

|FOR i := 0 to l DO| |Yes, but for a reason possibly different from what you may mean.

This is statically illegal because first and last must have the same base type (which is the type this inner i will be given.) INTEGER and LONGINT are distinct base types and are not mutually assignable, to prevent a cascade of questions about the sizes of intermediate arithmetic and results in expressions.

There are no implicit type conversions in Modula-3, although assignability provides something roughly equivalent in specific cases.

This is unlike CHAR and WIDECHAR, which, though different base types, are mutually assignable without trouble, because there are no arithmetic operations producing results of either type.

But you could write FOR i := 0L TO l DO ...

and the only overflow questions would be when Last is at one of the very ends of LONGINT, as discussed in Roland Illig's article referenced in his recent post.

See 2.3.16, 2nd paragraph, not counting the line of sample code.

|

|... END |

— Reply to this email directly, view it on GitHub https://github.com/modula3/cm3/issues/1027#issuecomment-1134622763, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSVZNBALWPAWALZMDI3DETVLN3YVANCNFSM5WTZAWUA. You are receiving this because you are subscribed to this thread.Message ID: @.***>

norayr commented 2 years ago

thank you for explanation! i got more incentives to build cm3 and read modula-3 books!

RodneyBates commented 2 years ago

I like this idea.  When I can find some time, I plan to implement this rewrite in the front end.

However, I think I will keep the infinite loop behavior when step=0. Although it seems quite far-fetched, there is a risk of undermining some peculiar existing code that depends on this.  Many is the time I thought some coding pattern was too bizarre to ever actually exist in real-world code (not counting test cases, hack attempts, etc.), only to be proven wrong.

I always wrestle with how much optimization to do in the front end. But I plan to do the optimizations of cases where some subset of the loop parameters are constant in the front end.  I imagine llvm folks, at least, would say "leave that to the back end", but there are four back ends, some with many optimization options, some of which sometimes one needs to disable for various unrelated reasons.

In contrast, I see The ends-of-range changes as pretty much pure improvements.

On 5/22/22 14:02, Roland Illig wrote:

The Modula-3 FOR statement is defined such that using it with very large or very small integers can result in unexpected overflow errors, which could terminate the program. This makes the FOR loop unnecessarily hard to use in safe and reliable programs.

Details: modula-3-for-loop.pdf https://github.com/modula3/cm3/files/8750080/modula-3-for-loop.pdf

— Reply to this email directly, view it on GitHub https://github.com/modula3/cm3/issues/1027, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSVZNEIJLKHLL5OQTASQ2TVLKAFRANCNFSM5WTZAWUA. You are receiving this because you are subscribed to this thread.Message ID: @.***>

RodneyBates commented 2 years ago

Time for some language lawyering before implementing stuff?

Here is the text of FOR

[image: Modula3FOR.png]

The specification is a bit vague, isn't it?

I think your behavior of an infinite loop is correct for step = 0. If we read the text above,

"the statement steps id through the values first, first + 0, first + 2*0, ... stopping when the value of id passes last"

ok, it never passes last and there is no under- or overflow. I think the interpretation ought to be pretty unambiguous: the program loops.

However, the case for values "near" FIRST(T) or LAST(T) is more difficult.

Take

FOR id := LAST(INTEGER)-1 TO LAST(INTEGER) BY 2 DO S END

we can use substitution to interpret what this means

"the statement steps id through the values LAST(INTEGER)-1, LAST(INTEGER)-1+2, ... stopping when the value of id passes last."

There are two questions here:

Q1. what is the meaning of LAST(INTEGER)-1+2 ?

Q2. what is the meaning of "id passes last"?

Glancing at later pages in the report, say here:

[image: Modula3Arithmetic.png]

I think we can answer Q1. unambiguously.

A1. "The effect of an operation that overflows... is a checked runtime error"

Yes we know there's an implementation restriction in many if not all extant Modula-3 compilers, which is that there is no such check in the compiler.

But if you will "fix" the semantics of the FOR, I firmly believe you should fix it in line with the language spec. Which MIGHT mean a dynamic runtime error check. If you want to justify something else, read below...

In particular, the following code will certainly NOT have a checked runtime error, or any other kind of strange behavior, because the body of the loop never terminates:

FOR id := LAST(INTEGER)-1 TO LAST(INTEGER) BY 2 DO Process.Exit(...) END

I can answer Q2 as well, even though "passes last" is not specifically defined in the Green Book.

A2. It is clear from the text and the spirit of the Green Book that "INTEGERs" in Modula-3 are intended to behave as a subrange of Z, Peano numbers or whatever you want to call them (grade-school integers). They are specifically NOT intended to be MOD 2^N or 10^N for any N or any other kind of "advanced/machine-oriented" concept. They are just a subrange of integers, implemented by computer.

I think from A1 and A2 we can conclude that a program that loops on

FOR id := LAST(INTEGER)-1 TO LAST(INTEGER) BY 2 DO S END

(where S itself does not loop) has definitely not been compiled correctly. Any implementation that would say that "id never passes last because id is done mod 2^N" has failed the reasonableness test against the Green Book's notion of what an INTEGER is intended to be.

But the definition of FOR is operational. It says that the program will update id to the new value, then check it against last, then quit if it has passed last.

That means that in line with A1, the correct response is to attempt to update id as stated, and take a checked runtime error on the update of same.

At least that is my reading.

But A2 does say that the spirit of the book is to simulate a subrange of Z so if you want to extend the range so that "if we had an extra bit of integer then we could see that id has passed last" it's not TOO offensive. But it does conflict with the operational definition of FOR.

On the third hand I was taught to be very suspicious of operational definitions of anything. So OK, exit the loop without an error if you want to do that...

  Mika

P.S. Bonus question:

What is the intended semantics of

VAR lwm1 := Word.Minus(LAST(Word.T), 1); lw := LAST(Word.T); w2 : Word.T := 2; BEGIN FOR id := lwm1 TO lw BY w2 DO S END END

?

In a sense it's incorrect to try to use Word.T in a FOR loop since there's no order on Word.T, but I think (am pretty sure) the compiler will accept it. Is that something worth "fixing"?

(My own feelings here: 1. using Word.T in FOR ought to be a compile-time error. 2. if it is NOT a compile-time error and intentionally so, then the correct behavior is for the FOR to loop in this case. Which makes it a problem to distinguish from the behavior that started this whole discussion since Word.T is really "just INTEGER in different clothing.")

On Fri, May 27, 2022 at 7:41 AM Rodney Bates @.***> wrote:

I like this idea. When I can find some time, I plan to implement this rewrite in the front end.

However, I think I will keep the infinite loop behavior when step=0. Although it seems quite far-fetched, there is a risk of undermining some peculiar existing code that depends on this. Many is the time I thought some coding pattern was too bizarre to ever actually exist in real-world code (not counting test cases, hack attempts, etc.), only to be proven wrong.

I always wrestle with how much optimization to do in the front end. But I plan to do the optimizations of cases where some subset of the loop parameters are constant in the front end. I imagine llvm folks, at least, would say "leave that to the back end", but there are four back ends, some with many optimization options, some of which sometimes one needs to disable for various unrelated reasons.

In contrast, I see The ends-of-range changes as pretty much pure improvements.

On 5/22/22 14:02, Roland Illig wrote:

The Modula-3 FOR statement is defined such that using it with very large or very small integers can result in unexpected overflow errors, which could terminate the program. This makes the FOR loop unnecessarily hard to use in safe and reliable programs.

Details: modula-3-for-loop.pdf https://github.com/modula3/cm3/files/8750080/modula-3-for-loop.pdf

— Reply to this email directly, view it on GitHub https://github.com/modula3/cm3/issues/1027, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSVZNEIJLKHLL5OQTASQ2TVLKAFRANCNFSM5WTZAWUA . You are receiving this because you are subscribed to this thread.Message ID: @.***>


M3devel mailing list @.*** https://m3lists.elegosoft.com/mailman/listinfo/m3devel

rillig commented 2 years ago

When I re-read the 1989 Modula-3 Report, it surprised me, as it defines the FOR loop for two distinct cases:

  1. If id is an integer, …
  2. The case in which id is an element of an enumeration …

In my original analysis, I had mixed up these two cases, assuming that the code rewriting definition would apply to the integer case as well. Sorry, my bad. As enumerations do not come anywhere near LAST(INTEGER), the code is fine as-is.

So we are left with this definition of the integer FOR loop:

  1. If id is an integer, the statement steps id through the values first, first+step, first+2*step, ..., stopping when the value of id passes last.
  2. S executes once for each value; if the sequence of values is empty, S never executes.
  3. The expressions first, last, and step are evaluated once, before the loop is entered.
  4. If step is negative, the loop iterates downward.

In the first sentence, the meaning of "passes" is unclear to me. When id reaches last, it doesn't yet pass it. And when id is modified afterwards, it also doesn't pass, it departs. To me, passing means to not stop in-between.

A1. "The effect of an operation that overflows... is a checked runtime error"

I didn't find these words in my copy of the language definition.

Neither did I find a definition of the type INTEGER that would restrict the numbers to an implementation-defined range. That's probably implied by the definition of FIRST(INTEGER).

The FOR statement is defined for integers and enumerations. Is the definition for subranges left out intentionally, or is it a sub-case of integer?

Regarding integer overflow: I had assumed that since the programmer does not directly influence the addition and multiplication, it should happen as if it would be performed in arbitrary precision. But that's just my intuition.

RodneyBates commented 2 years ago

On 5/27/22 10:39, Mika Nystrom wrote:

Time for some language lawyering before implementing stuff?

Here is the text of FOR

Modula3FOR.png

The specification is a bit vague, isn't it?

I think your behavior of an infinite loop is correct for step = 0.  If we read the text above,

"the statement steps id through the values first, first + 0, first + 2*0, ... stopping when the value of id passes last"

ok, it never passes last and there is no under- or overflow.  I think the interpretation ought to be pretty unambiguous: the program loops.

However, the case for values "near" FIRST(T) or LAST(T) is more difficult.

I can't think of any satisfactory way to reword the English narrative definition except by using Z arithmetic.  Something like:

"including only values of id that satisfy id + step <= (>=) last", and stating that the plusses (here and in the arithmetic progression above) and relations are in Z arithmetic, not Modula3 INTEGER arithmetic.

Take

FOR id := LAST(INTEGER)-1 TO LAST(INTEGER) BY 2 DO S END

we can use substitution to interpret what this means

"the statement steps id through the values LAST(INTEGER)-1, LAST(INTEGER)-1+2, ... stopping when the value of id passes last."

Roland's rewrite accomplishes the same, less concisely, but without going outside of INTEGER, thus overflows won't happen and are irrelevant.

There are two questions here:

Q1. what is the meaning of LAST(INTEGER)-1+2 ?

Q2. what is the meaning of "id passes last"?

Glancing at later pages in the report, say here:

Modula3Arithmetic.png

I think we can answer Q1. unambiguously.

A1. "The effect of an operation that overflows... is a checked runtime error"

Yes we know there's an implementation restriction in many if not all extant Modula-3 compilers, which is that there is no such check in the compiler.

But if you will "fix" the semantics of the FOR, I firmly believe you should fix it in line with the language spec. Which MIGHT mean a dynamic runtime error check.  If you want to justify something else, read below...

All of the values that id takes on when visible to the programmer are within INTEGER, so the ideal loop semantics should handle them all, without raising any kind of exception or error.

In particular, the following code will certainly NOT have a checked runtime error, or any other kind of strange behavior, because the body of the loop never terminates:

FOR id := LAST(INTEGER)-1 TO LAST(INTEGER) BY 2 DO Process.Exit(...) END

I can answer Q2 as well, even though "passes last" is not specifically defined in the Green Book.

A2. It is clear from the text and the spirit of the Green Book that "INTEGERs" in Modula-3 are intended to behave as a subrange of Z, Peano numbers or whatever you want to call them (grade-school integers).  They are specifically NOT intended to be MOD 2^N or 10^N for any N or any other kind of "advanced/machine-oriented" concept.  They are just a subrange of integers, implemented by computer.

Except ... when you use the operations in Word on them.  It's not a different type, it's different operations on the same type.

I think from A1 and A2 we can conclude that a program that loops on

FOR id := LAST(INTEGER)-1 TO LAST(INTEGER) BY 2 DO S END

(where S itself does not loop) has definitely not been compiled correctly.  Any implementation that would say that "id never passes last because id is done mod 2^N" has failed the reasonableness test against the Green Book's notion of what an INTEGER is intended to be.

But the definition of FOR is operational.  It says that the program will update id to the new value, then check it against last, then quit if it has passed last.

That means that in line with A1, the correct response is to attempt to update id as stated, and take a checked runtime error on the update of same.

At least that is my reading.

But A2 does say that the spirit of the book is to simulate a subrange of Z so if you want to extend the range so that "if we had an extra bit of integer then we could see that id has passed last" it's not TOO offensive.  But it does conflict with the operational definition of FOR.

On the third hand I was taught to be very suspicious of operational definitions of anything.  So OK, exit the loop without an error if you want to do that...

      Mika

P.S. Bonus question:

What is the intended semantics of

VAR   lwm1 := Word.Minus(LAST(Word.T), 1);   lw := LAST(Word.T);   w2 : Word.T := 2; BEGIN   FOR id := lwm1 TO lw BY w2 DO S END END

It causes some confusion that, while there are many operators like Word.Minus that apply unsigned interpretation to values, there are no counterparts to FIRST, LAST, [..], or FOR (did I mess any?) that do so.

Making the reasonable assumption that LAST(INTEGER) >= 2, lwm1, lw, and w2 each have the same value in either signed or unsigned interpretation.  They don't carry anything into the loop that would convey any distinction that would show they were computed using Word.

Then nothing in the FOR loop, by any definition we have discussed, applies unsigned interpretation to anything.  So Word.T is not really used therein, except in the weak sense that it happens to be an unmentioned synonym for INTEGER that is floating around. So this loop is exactly equivalent to the previous examples.

?

In a sense it's incorrect to try to use Word.T in a FOR loop since there's no order on Word.T, but I think (am pretty sure) the compiler will accept it.  Is that something worth "fixing"?

(My own feelings here: 1. using Word.T in FOR ought to be a compile-time error. 2. if it is NOT a compile-time error and intentionally so, then the correct behavior is for the FOR to loop in this case.  Which makes it a problem to distinguish from the behavior that started this whole discussion since Word.T is really "just INTEGER in different clothing.")

On Fri, May 27, 2022 at 7:41 AM Rodney Bates @.***> wrote:

I like this idea.  When I can find some time, I plan to implement this
rewrite in the front end.

However, I think I will keep the infinite loop behavior when step=0.
Although it seems quite far-fetched, there is a risk of undermining
some peculiar existing code that depends on this.  Many is the time
I thought some coding pattern was too bizarre to ever actually exist
in real-world code (not counting test cases, hack attempts, etc.),
only to be proven wrong.

I always wrestle with how much optimization to do in the front end.
But I plan to do the optimizations of cases where some subset of the
loop parameters are constant in the front end.  I imagine llvm folks,
at least, would say "leave that to the back end", but there are four
back ends, some with many optimization options, some of which sometimes
one needs to disable for various unrelated reasons.

In contrast, I see The ends-of-range changes as pretty much pure
improvements.

On 5/22/22 14:02, Roland Illig wrote:
The Modula-3 FOR statement is defined such that using it with very large or very small integers can result in unexpected overflow errors, which could terminate the program. This makes the FOR loop unnecessarily hard to use in safe and reliable programs.

Details: modula-3-for-loop.pdf <https://github.com/modula3/cm3/files/8750080/modula-3-for-loop.pdf>

—
Reply to this email directly, view it on GitHub <https://github.com/modula3/cm3/issues/1027>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABSVZNEIJLKHLL5OQTASQ2TVLKAFRANCNFSM5WTZAWUA>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
_______________________________________________
M3devel mailing list
***@***.***
https://m3lists.elegosoft.com/mailman/listinfo/m3devel