FOR start end bump behavior inconsistent, doesn't make sense

rebolbot commented 11 years ago

Submitted by: BrianH

The behavior of FOR with different start, end and bump values isn't consistent between start > end, start = end, and start < end, and doesn't necessarily make sense in any of those cases. We need a clean and consistent model for its behavior with those parameters, especially when start-vs-end and bump conflict.

The main thing that these 3 parameters decide is the direction of advancement, and for that matter to what extent the loop happens at all. And the main potential conflict is that both the bump and the relative values of start and end could decide this, but we need to decide which gets precedence.

If we decide that the bump is the primary factor for deciding advancement, then a positive bump would mean going forward, a negative bump would mean going backward, and a zero bump (+0.0 and -0.0 too) should trigger an error because it would by definition be undefined behavior. The start and end parameters would only be considered after the bump, with the loop not happening at all if their relative direction is the opposite of the sign of the bump.

If, on the other hand, we consider the relative values or indexes of the start and end positions to be the primary determinant of the direction, start < end would mean going forward, start > end would mean going backward, and start = end would mean not advancing at all (looping once and stopping). After that direction is set, the bump would be considered simply to set the velocity of the loop in that direction. If the direction is forward, bump > 0 would go forward, and bump <= 0 would not loop at all. If the direction is backward, bump < 0 would go backward, and bump >= 0 would not loop at all. For the start = end case, the bump would be ignored. No errors triggered.

Either of those models would make sense, and in practice the only difference between them is that bump=0 would trigger an error for the bump first model, while bump=0 would just do nothing for the start-vs-end first model.
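To make the comparison concrete, here is a minimal Python sketch (not Rebol) of the decision each model makes before the first cycle. It assumes numeric start, end and bump; series positions and the later termination tests are left out, and the function names are hypothetical.

```python
def starts_bump_first(start, end, bump):
    """Bump-first model: the sign of bump sets the direction."""
    if bump == 0:  # also catches -0.0, since -0.0 == 0 in IEEE arithmetic
        raise ValueError("zero bump: direction undefined")
    if bump > 0:
        return start <= end   # forward: skip if end lies behind start
    return start >= end       # backward: skip if end lies ahead of start


def starts_start_vs_end_first(start, end, bump):
    """Start-vs-end-first model: positions set direction, bump the velocity."""
    if start == end:
        return True           # not advancing: bump is ignored
    if start < end:
        return bump > 0       # forward needs a positive bump
    return bump < 0           # backward needs a negative bump
```

For every nonzero bump the two functions agree; they differ only when bump is 0, where one raises an error and the other simply declines to loop (or, for start = end, runs its single cycle).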

Which should we choose?

^{CC - Data [ Version: r3 master Type: Bug Platform: All Category: Native Reproduce: Always Fixed-in:none ]}

rebolbot commented 11 years ago

Submitted by: BrianH

I'm leaning towards the start-vs-end-first model, because it makes just as much sense as the other model and it triggers fewer errors. If a good enough argument can be made that a set of behavior isn't an error, triggering an error seems rude. What do you think?

rebolbot commented 11 years ago

Submitted by: Gregg

Ed's Note: Thread shortened for clarity; see the full archived version at the Internet Archive.

The two models you laid out are so close that I don't have a strong preference. My main concern is that a bump of zero doesn't lead to infinite loops. Between the two, I would choose the latter (start-end model), because of your comparison here. If a bump of zero means an infinite loop instead, I vote for the bump-error model.

rebolbot commented 11 years ago

Submitted by: Ladislav

"But FOR seems just too ugly" - yes, I often felt that way.

rebolbot commented 11 years ago

Submitted by: fork

I will add that the sooner we get started on R3/Backward, the sooner we can worry less about compatibility issues, because if we are to be realistic at all we must realize that R2 code just won't run in R3. I will take the idea of being nervous about improving FOR seriously as soon as SHIFT reverts to the R2 interpretation of sign.

(I don't think it should, I'm just sayin'...point of no return already reached, so let's learn how to manage it elegantly.)

So make an R3/Backward implementation of FOR that does whatever you decide here, and replace FOR with something more like Ladislav's "CFOR" (if not exactly Ladislav's CFOR). How's that?

http://curecode.org/rebol3/ticket.rsp?id=884

rebolbot commented 11 years ago

Submitted by: BrianH

Ladislav, thanks for writing all of that code in AltME; it will help with the tests.

As Fork says, this might be a matter for R3/Backward and rebol-patches, but it still needs doing.

rebolbot commented 11 years ago

Submitted by: Gregg

Ladislav, thanks for writing things so clearly, with examples and explanations for each. It's very helpful.

rebolbot commented 11 years ago

Submitted by: BrianH

Semantically, what is the difference? How are your tests any different from the behavior of the start-vs-end primacy model? If you have to explain in code, go ahead. The important thing is how it will behave differently.

The model is only important in being an explanation of why something behaves the way it does, the meaning of the behavior. Simply saying what it does is not an explanation, it's a description. Your code is a great description of what behavior to expect, in simple and extensive enough terms that it can be translated to unit test vectors, and that is definitely needed. However, an explanation that can't be stated in two sentences or less needs rethinking. I had to actually go through the code and compare test cases to get an understanding of your model, because your explanation didn't pass the doesn't-know-Rebol-newbie-programmer test. If you need to be a CS major to understand a model then that model won't work, even if it accurately results in the behavior we want.

If there is any behavioral difference between your above tests and the behavior that results from the model I stated in the ticket (start-vs-end sets the direction, bump sets the velocity), let us know. Otherwise, we can assume that the behavior in your comment above is what you want.

rebolbot commented 11 years ago

Submitted by: BrianH

I'm sorry, it was an English problem, my bad. "Primacy" didn't mean what you thought it meant. It doesn't mean that start-vs-end would be more important than bump (or vice-versa), it meant that it would be processed first. The second one would be just as important, but it would be more important later on in the process. Your stuff was talking about termination tests, which are just as important, but processed later.

Also, you keep using the term "your proposal" in the singular sense. There were two proposals, both for high-level philosophical models to explain the reason why we would choose one of two different sets of behavior. The set of behavior you are advocating is in keeping with one of those models, the one that explains why we don't trigger an error when bump is 0; it's the other model that explains why bump being 0 would trigger an error. We would pick between the two sets of behavior for practical reasons, and then use the model to pretend that we did so for philosophical reasons.

Once you start getting into termination conditions the loop has already gotten well past the point where the difference between these two models matters at all. That is why your termination conditions description has to be so involved: It is using termination conditions to try to explain choices made before termination is even a factor. Sorry for the confusion.

rebolbot commented 11 years ago

Submitted by: Ladislav

"I'm sorry, it was an English problem, my bad. "Primacy" didn't mean what you thought it meant. It doesn't mean that start-vs-end would be more important than bump (or vice-versa), it meant that it would be processed first. The second one would be just as important, but it would be more important later on in the process. Your stuff was talking about termination tests, which are just as important, but processed later. " - I am sorry, but this still misses big. The point is that I do not mind what words you use. I do mind what the meaning is. Therefore, I do not mind whether you write to give "primacy" (you wrote that) or "precedence" (you wrote that as well) or "process first" (again, you wrote that). What I mind is that you don't handle the factors equally, which is obvious no matter which words you use to describe it.

rebolbot commented 11 years ago

Submitted by: Ladislav

Regarding the proposal: this came out of the discussion on the GG Rebol mailing list, and it does not give precedence to either START-END or BUMP:

First of all, it is useful to terminate before evaluating the cycle body in some cases.

To be able to terminate before evaluating the cycle body we need a test to be applied just before the cycle body is evaluated.

So, we need to have a cycle test to compare VALUE (the value of the cycle variable) with END (the cycle argument) and evaluate the body when the test is TRUE, terminating the cycle otherwise.
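A rough Python sketch of such a pre-tested loop, assuming the body is modelled as a function that returns the (possibly changed) value of the cycle variable. The name `pretest_loop` and the `max_cycles` cap are hypothetical additions, the cap existing only so the sketch itself cannot hang.

```python
def pretest_loop(start, bump, test, body, max_cycles=1000):
    """Apply `test` to the cycle variable just before every body evaluation,
    terminating as soon as it yields False. `body` gets the current value
    and returns the (possibly changed) value; bump is added afterwards.
    Returns the values the body saw."""
    seen = []
    value = start
    while test(value) and len(seen) < max_cycles:
        seen.append(value)
        value = body(value) + bump
    return seen
```

Because the test runs before every evaluation, a START value that already fails it means the body never runs.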

* START and END parameters can be used to determine the cycle test in the following way:

** if START <= END and VALUE is the value of the cycle variable, the test should look as follows:

VALUE <= END

** if START >= END the test should look as follows:

VALUE >= END

** the above two cases combined imply that if START = END the test should look as follows:

VALUE = END

* the BUMP value can be used to determine the cycle test as well

** if BUMP >= 0 the test should look as follows:

VALUE <= END

** if BUMP <= 0 the test should look as follows:

VALUE >= END

** the above two combined imply that if BUMP = 0 the test should look as follows:

VALUE = END

* since we have two ways to determine the cycle test, we need to resolve the conflict

** the best way is to use the conjunction of both tests, putting both "test sources" on equal footing
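The conjunction can be sketched in Python by representing each test as the set of orderings of VALUE relative to END that it accepts, so that combining tests is just set intersection. This illustrates the rule above under that encoding; it is not Ladislav's code, and `cycle_test` is a hypothetical name.

```python
def cycle_test(start, end, bump):
    """Build the cycle test as the conjunction of the START-END test and the
    BUMP test. A test is the set of allowed orderings of VALUE relative to
    END: -1 for VALUE < END, 0 for VALUE = END, 1 for VALUE > END."""
    allowed = {-1, 0, 1}
    # START-END source (both branches apply when START = END)
    if start <= end:
        allowed &= {-1, 0}    # VALUE <= END
    if start >= end:
        allowed &= {0, 1}     # VALUE >= END
    # BUMP source (both branches apply when BUMP = 0)
    if bump >= 0:
        allowed &= {-1, 0}    # VALUE <= END
    if bump <= 0:
        allowed &= {0, 1}     # VALUE >= END

    def test(value):
        order = (value > end) - (value < end)   # -1, 0 or 1
        return order in allowed

    return test
```

Applied to START, this reproduces the examples that follow, including the `for i 1 1 0` case, whose test keeps accepting 1 as long as the value never changes.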

Examples:

for i 1 2 1

- both methods yield the same VALUE <= END test. The conjunction yields the VALUE <= END cycle test.

for i 2 1 -1

- both methods yield the same VALUE >= END test. The conjunction yields the VALUE >= END cycle test.

for i 2 1 1

- the START-END method yields the VALUE >= END test while the BUMP method yields the VALUE <= END test. The conjunction yields the VALUE = END cycle test. This test is already FALSE for START causing the cycle to not evaluate the body.

for i 1 2 -1

- the START-END method yields the VALUE <= END test while the BUMP method yields the VALUE >= END test. The conjunction yields the VALUE = END test. This test is already FALSE for START causing the cycle to not evaluate the body.

for i 1 2 0

- the START-END method yields the VALUE <= END test while the BUMP method yields the VALUE = END test. The conjunction yields the VALUE = END test. The test is already FALSE for START causing the cycle to not evaluate the body.

for i 2 1 0

- the START-END method yields the VALUE >= END test while the BUMP method yields the VALUE = END test. The conjunction yields the VALUE = END test. The test is already FALSE for START causing the cycle to not evaluate the body.

for i 1 1 1

- the START-END method yields the VALUE = END test while the BUMP method yields the VALUE <= END test. The conjunction yields the VALUE = END cycle test.

for i 1 1 -1

- the START-END method yields the VALUE = END test while the BUMP method yields the VALUE >= END test. The conjunction yields the VALUE = END cycle test.

for i 1 1 0

- both the START-END method as well as the BUMP method yield the VALUE = END test. The conjunction yields the VALUE = END test.

Note, though, that in this case the test obtained would make the cycle infinite. If we want the cycle not to become infinite, the only option is to use some other "arbitrary" test. Since we need termination and the value of the cycle variable is assumed never to change, the arbitrary test has to fail at the start, so the cycle does not evaluate the body.

Note: the above observations don't account for possible arithmetic overflow cases. Those issues need a separate consideration.

rebolbot commented 11 years ago

Submitted by: Ladislav

"Semantically, what is the difference?"

* the difference is that the proposal I wrote does not use arbitrary decisions like "primacy", "precedence", "should trigger an error", "looping once and stopping"
* the main "vehicle" of my proposal is the cycle test, which needs to be determined.
* In a manner compatible with your proposal, it is shown how the cycle test could be constructed using just the START-END arguments.
* Again in a manner compatible with your proposal, it is shown how the cycle test could be constructed using just the BUMP argument.
* Having two (possibly incompatible) sources of the cycle test, I put both on equal footing (no "primacy" or "precedence"), stating that the actual cycle test shall be the conjunction of both particular tests. Since conjunction does not give precedence to either factor, this also eliminates any "arbitrariness" or need for "explanations" of why precedence, primacy or "processing priority" was (or should have been) given to one factor.
* My proposal (just) determines the cycle test, which in some specific cases (as demonstrated by the illustrative set of examples) is already FALSE for the START value, explaining why the loop does not evaluate the cycle body at all.
* Also, since only the cycle test is obtained, I can determine when the cycle body isn't evaluated (when the cycle test already yields FALSE for the START value), but I cannot (nor do I want to) state things like "looping once", since that cannot be known in advance: it depends on the value of the cycle variable, which can be changed in the loop body and is tested only after the body has been evaluated.

I hope this makes it clear where the differences are.

rebolbot commented 11 years ago

Submitted by: BrianH

It came out in AltME that when you get past all of the argument over verbiage, there is one actual semantic difference between Ladislav's proposal and my two proposals.

My two proposals were both intended to make absolutely sure that no combination of start, end and bump would by themselves end up with FOR doing an infinite loop. The main purpose of this is to allow the developer to remove expensive conditional code that would be needed to screen start, end and bump in combination to make sure that such loops don't happen by accident. The fact that FOREVER exists makes it unnecessary to use FOR for your infinite loops, and if an infinite loop is your intention then using FOREVER makes that more clear so this would increase code clarity as well. This would allow us to take advantage of the constrained usage model of FOR to add an additional constraint to benefit the developer, since screening for infinite loops in FOR's native code would be much less expensive. And it doesn't really prevent the developer from making infinite loops intentionally using FOREVER or even changing the index word in the body block.

In Ladislav's proposal, he wants to make infinite loops with FOR. He doesn't see the need to screen for them, and AFAICT it is because he genuinely doesn't make the kinds of mistakes that regular developers who would benefit from this kind of screening make. No offence is meant by that, it's kind of amazing to see his code. Nonetheless, that is what "handle the factors equally" meant: he insisted that since the body wasn't screened, the other parameters shouldn't be screened either.

Ed's note: Embedded essay migrated to Rebol's Target Market: Newbies, Experts, or Other. For the verbatim CureCode ticket, see the Internet Archive.

Now in this particular case, I would recommend that we implement the start-vs-end-sets-direction bump-sets-velocity velocity-must-advance model in rebol-patches and R3/Backward, because this is an R2 function and we should therefore aim it at the R2 market - which is admittedly now just R2 fans who never really used this function much and just need it to have the same number of parameters in the same order serving the same roles. Hopefully with a rewrite and some useful constraints against infinite loops they might start using it.

For R3 and R2/Forward, I recommend implementing #884 and calling it FOR. No, really, it's a better fit for R3's target market. R2's FOR has some useful constraints, but it isn't itself as useful as the #884 function. #884 takes 4 code blocks, not non-code values, and has "General loop" right at the beginning of its main doc string, so it's clearly a power user function that isn't meant to be constrained. That's enough of a "Here there be dragons!" warning that we can assume it wouldn't be called by someone without a sword and shield handy.

rebolbot commented 11 years ago

Submitted by: BrianH

I had a tough time reading your arguments, because they seemed to be focusing on stuff that wasn't at all relevant to the actual semantics involved. Treating all of the arguments equally? Do you think the bump parameter cares what I think about it? This is code, not people.

The infinite loop thing was the only actual semantic difference between your proposals and my proposals afaict. I specifically crafted my proposals to exclude infinite loops - the only difference was what to do instead. Excluding infinite loops was a feature. That was a feature you disagreed with, fine.

But going on about having the parameters treated equally even when they aren't actually equal, and then saying that the decision to treat them differently was wrong and arbitrary, that needed an answer. It was a design choice, one which we have been applying to the R3 project for 5 years now. If you didn't understand that design choice, I have helpfully explained the sensible rationale for it above.

rebolbot commented 11 years ago

Submitted by: Ladislav

"The model is only important in being an explanation of why something behaves the way it does, the meaning of the behavior. Simply saying what it does is not an explanation, it's a description. Your code is a great description of what behavior to expect, in simple and extensive enough terms that it can be translated to unit test vectors, and that is definitely needed. However, an explanation that can't be stated in two sentences or less needs rethinking. I had to actually go through the code and compare test cases to get an understanding of your model, because your explanation didn't pass the doesn't-know-Rebol-newbie-programmer test. If you need to be a CS major to understand a model then that model won't work, even if it accurately results in the behavior we want." - this deserves a note. My text describing how FOR should work is not an "explanation" or whatnot. It is:

* a complete specification of the behaviour, i.e., it precisely specifies how the cycle has to behave (in contrast to Brian's proposal, which does not describe the behaviour completely enough to know what happens when the cycle variable changes in the cycle body)
* a proof of concept demonstrating how the behaviour can be derived from the basic principles mentioned

rebolbot commented 11 years ago

Submitted by: Ladislav

"I had a tough time reading your arguments, because they were focusing on stuff that wasn't at all relevant to the actual semantics involved" - LOL. It is the other way around. My specification not only concentrates on the semantics involved, it does so completely, specifying the semantics without needing to add refinements later.

rebolbot commented 11 years ago

Submitted by: Ladislav

"Treating all of the arguments equally? Do you think the bump parameter cares what I think about it? This is code, not people. " - I am not sure this is worth discussing at all, but just to inform the uninformed:

* I never said I treated the arguments equally. What I did treat equally were the sources of cycle tests, which let me avoid arbitrarily giving precedence to one source and suppressing the role of the other when defining the cycle test needed to specify the behaviour completely.
* "Do you think the bump parameter cares what I think about it?" - Do you think that the programmer does not care what the implementer of the loop thinks about the BUMP parameter?

Also, when speaking about the cycle tests:

* specifying/knowing the cycle test used allows me to completely specify/know the behaviour of the cycle
* treating all possible sources of the cycle tests equally allows me to use every available bit of information when constructing the cycle test

rebolbot commented 11 years ago

Submitted by: Ladislav

Also, this looks worth examining: "I specifically crafted my proposals to exclude infinite loops " - hmm, did we already decide what to do in cases like:

for i 1.0 1e30 1.0 [...]

, which looks like an infinite loop to me.

rebolbot commented 11 years ago

Submitted by: BrianH

It looks like we have been arguing past each other rather than from an understanding of the real differences between our proposals, or of the purpose of this ticket. Let me help things by trying to explain my proposal better. I posted tests that implement the main goal of the proposal in AltME and explained them there too, but just in case that isn't enough, here are the 5 steps that matter for the purposes of this discussion:

1) When the function first starts, it checks whether start < end, start > end, or start = end, and then picks one of 3 sets of termination conditions the loop will use. Each set contains starting and post-cycle termination conditions.

2) The starting condition only pays attention to the values start, end and bump, and its main goal is to make sure that bump > 0 when start < end, that bump < 0 when start > end, or that start = end; if the applicable condition does not hold, the loop doesn't start even a single cycle. For the starting condition you don't need to check bump at all if start = end; it only matters when start != end.

3) If you get this far, run one cycle of the loop by doing body. If the result of that is a break unwind, stop and return the associated value (default unset). If it's a continue unwind, go to 4. If it's another kind of unwind, stop and return it. (Replace the "unwind" stuff with the R2 equivalent in R2.)

4) Having finished a cycle (if you get this far), check the post-cycle termination condition chosen in step 1. The post-cycle condition only pays attention to start, end, and the value assigned to word at the end of the cycle (:word); it ignores bump. The :word value might have changed since the start of the cycle, so don't assume it's the same and don't revert it. If start = end then terminate if :word = end. If start < end then terminate if :word >= end. If start > end then terminate if :word <= end. If you terminate, return the result of the body evaluation.

5) If you haven't terminated, add bump to :word and go back to step 3. Note that steps 1 and 2 are irrelevant at this point.
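The five steps can be modelled in Python roughly as follows. This is a sketch under simplifying assumptions: values are numbers, the body is a function that returns the new value of the word, BREAK/CONTINUE unwinds are not modelled, and `model_for` and its `max_cycles` cap are hypothetical, the cap being added only so the sketch halts.

```python
def model_for(start, end, bump, body, max_cycles=1000):
    """Sketch of the five steps. `body` takes the current value of the cycle
    word and returns its (possibly changed) value. Returns the values the
    body saw."""
    # Step 1: choose the post-cycle termination condition once.
    if start == end:
        def done(word): return word == end
    elif start < end:
        def done(word): return word >= end
    else:
        def done(word): return word <= end
    # Step 2: starting condition; looks only at start, end and bump.
    if start < end and not bump > 0:
        return []
    if start > end and not bump < 0:
        return []
    seen = []
    word = start
    while len(seen) < max_cycles:
        seen.append(word)        # step 3: one cycle of the body
        word = body(word)        # the body may change the word
        if done(word):           # step 4: bump is ignored here
            break
        word = word + bump       # step 5: advance, then back to step 3
    return seen
```

With a body that increments the word, `model_for(1, 1, 1, ...)` keeps cycling (1, 3, 5, 7, ... up to the cap) while `model_for(1, 2, 1, ...)` stops after one cycle; Ladislav raises exactly this asymmetry further down the thread.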

Now, as for the actual subject of this ticket, it is step 2 in that list, the starting condition. Of my two proposals, one has it trigger an error when bump = 0 and start != end, and one has the loop just not do anything and return none. I like the second of those choices, because you could plausibly argue that we have decided that a bump of 0 is just out of range when start != end. That will allow the developer to cut down on conditional code to avoid errors, which is the whole reason for defining 0 to be out of range in the first place, so it would be counterproductive to require them to add back the conditional code for another reason.

If you are trying to make an argument that we should skip that starting condition check altogether then you will not convince me, because adding that starting condition is the entire purpose for me writing the ticket in the first place.

If you say that the model is too complex then I will ask you to make a simple model that accomplishes the goal of adding that starting condition, preferably one that also doesn't consider the starting condition again at any point after the cycles start the way the above does. If you say it isn't formulated correctly then it will be on you to reformulate it in a way that explains semantics that meet the same goal of adding that starting condition, because I don't care about form. If there is a problem in anything other than the starting condition, like say the post-cycle condition, let me know.

If you say that the starting condition isn't affected by changes to the value of :word in the body code block then I will point you to step 4, which says that :word can change; and my above message, which says that stuff that happens in a code block is considered to be intentional and the developer's responsibility, not mine; and to where in the process the starting condition is even considered, specifically before those changes could possibly occur because the body block hasn't run yet.

If you say that R2-style FOR sucks, then I will point out that it is off topic for this ticket and redirect you to #884, where I agree with you at length and propose that #884 replace FOR completely for R3 and R2/Forward code; but then I will tell you that we will still need to add the starting condition to the FOR in R3/Backward and rebol-patches.

I hope that is more clear.

rebolbot commented 11 years ago

Submitted by: Ladislav

"For the starting condition you don't need to check bump at all if start = end" - it turns out, then, that I find your proposal of checking BUMP first (to determine whether it is zero) more consistent.

rebolbot commented 11 years ago

Submitted by: BrianH

Cool. The starting condition thing was the main point. And I didn't even consider the existing starting conditions, which I didn't think were a problem, such as the rule that if start and end are both series (with the direction set by their relative index positions), they should be references to the same underlying series data.

If it isn't supported already, and it isn't too confusing, I think that we might consider allowing the case of start referring to a series and end referring to a number, where the number would be interpreted either as an index (one-based, relative to the head of the series) or as an offset (zero-based, relative to the start position); we'd have to choose one of those two and only support that one, so I'd prefer offsets but would go with whichever. MOVE supports both, but it has a /to refinement to make that choice, and it just doesn't make sense to add a refinement to FOR since it never had to process one before.
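A small Python illustration of why one interpretation has to be chosen: with START at one-based index 3 of a series, the number 2 names a position behind START if read as an index, but ahead of it if read as an offset, so the two readings give opposite loop directions. `end_position` is a hypothetical helper, not an existing Rebol function.

```python
def end_position(start_index, n, interpret):
    """Hypothetical: map a numeric END to a one-based series position when
    START is a series at one-based `start_index`."""
    if interpret == "index":    # one-based, counted from the series head
        return n
    if interpret == "offset":   # zero-based, counted from START
        return start_index + n
    raise ValueError("interpret must be 'index' or 'offset'")


start = 3
for mode in ("index", "offset"):
    end = end_position(start, 2, mode)
    direction = "forward" if end > start else "backward" if end < start else "none"
    print(mode, end, direction)
# index 2 backward
# offset 5 forward
```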

The initial proposal pretty much assumed that the post-cycle termination condition would be something that made sense - well, one of a set of 3 termination conditions depending on the direction, but they would all make sense. If the post-cycle condition needs work, feel free to chime in.

rebolbot commented 11 years ago

Submitted by: Ladislav

I would like to demonstrate one bug in how the FOR cycle would behave under the rules you summarized above:

for i 1 1 1 [print i i: i + 1]

would be an infinite cycle printing 1 3 5 7 9 ...

That is inconsistent with

for i 1 2 1 [print i i: i + 1]

, which would print 1 and terminate. The problem is that, comparing the two cycles with the same body, nobody would expect the latter to "terminate sooner".

rebolbot commented 11 years ago

Submitted by: Ladislav

"if start and end are both series (with direction being set by the relative index positions) then they should be references to the same underlying series data" - looks reasonable, and works this way in R2 but not in R3.

rebolbot commented 11 years ago

Submitted by: Ladislav

"we'd have to choose one of those two and only support that one, so I'd prefer offsets but would go with whichever" - I would prefer to use a poll for this; however, there is already a particular way Carl chose, and I am not sure it is good to decline it.

rebolbot commented 11 years ago

Submitted by: BrianH

(In reply to comment 3680)

Should the start=end post-cycle termination condition (in my above model) just be to terminate, regardless of what :word and end are? That would deal with the potentially dangerous inconsistency you mentioned in comment 3680, and would be more consistent with start=end just ignoring bump in the starting condition. That would be in keeping with the theme of trying to help the developer avoid unintended infinite loops, and would make the just-one-cycle definition of start=end more consistent too.
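The proposed tweak only touches the start = end branch of the post-cycle condition. A Python sketch of that condition, assuming numeric values, with a hypothetical flag that switches between the original rule and the proposed one-cycle rule:

```python
def post_cycle_done(start, end, word, start_eq_end_runs_once=False):
    """Post-cycle termination test from the five-step model (bump ignored).
    With start_eq_end_runs_once=True, the start = end case terminates after
    its single cycle whatever the word holds."""
    if start == end:
        return start_eq_end_runs_once or word == end
    if start < end:
        return word >= end
    return word <= end
```

In the `for i 1 1 1 [print i i: i + 1]` case the word holds 2 after the first cycle, so the original rule keeps looping and the tweaked rule stops.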

The relative inequalities of the start != end cases do a better job of protecting developers than the equality test that the start = end case currently has as a post-cycle condition.

(Comment 3682) "I prefer to use a poll for this, but we may even get no answer" - that means it's too iffy to add as a feature to an R2-like function. If there is no obvious answer to that question, any choice we make would be a support nightmare. Let's just skip it.

(Comment 3681) Assuming that by "but not in R3" you don't mean #884, that might need a ticket for that problem. It's a regression.

(Comment 3671) Assume that I don't understand your argument here. Explain how you would tweak the post-cycle condition of my model to deal with this, please?

rebolbot commented 11 years ago

Submitted by: Ladislav

"Explain how you would tweak the post-cycle condition of my model to deal with this" - the issue is as follows: Rebol decimals have limited precision, and some numbers don't increase when we add a positive (but relatively small) BUMP to them. Thus, the value may stop increasing at some point even though we keep adding a positive number. I am not sure FOR should handle such cases in some special way, though, because every exception handled increases the overhead. In this case I would suggest just ignoring the issue; what do you think?
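The effect is easy to reproduce with IEEE 754 doubles, which Python floats are; the sketch assumes Rebol's decimal! behaves as a 64-bit double. From 2.0 ** 53 upward, consecutive doubles are at least 2.0 apart, so once a counter starting at 1.0 reaches 2.0 ** 53, adding 1.0 rounds back to the same value (ties round to even), and a loop like `for i 1.0 1e30 1.0 [...]` stops advancing long before 1e30. `stalls` is a hypothetical helper.

```python
def stalls(value, bump):
    """True when adding a nonzero bump leaves the value unchanged."""
    return bump != 0 and value + bump == value


limit = 2.0 ** 53                  # from here on, doubles are 2.0 apart
print(stalls(limit - 1.0, 1.0))    # False: the counter still advances
print(stalls(limit, 1.0))          # True: 2**53 + 1.0 rounds back (tie to even)
print(stalls(1e30, 1.0))           # True
```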

rebolbot commented 11 years ago

Submitted by: BrianH

Given that it isn't really FOR's fault, I agree: we can probably ignore this whole class of issues. But if you want the starting condition to be something like (bump + end) > end rather than bump > 0 in this case (and (bump + end) < end in the descending case), that could work too, assuming overflow is handled. Or whatever math enforces the starting constraint that bump advances in the appropriate direction when start != end. Only fix it if it's pragmatic to do so.
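A Python sketch of such a progress check, assuming numeric values; it would sit alongside the start-vs-end direction check rather than replace it. It slightly generalizes the suggestion by testing both START and END, since the spacing of doubles is coarsest at whichever endpoint has the larger magnitude. It remains a heuristic rather than a proof of progress, treating overflow to infinity as "does not advance" is an assumed policy, and `bump_advances` is a hypothetical name.

```python
import math


def bump_advances(start, end, bump):
    """Heuristic starting check: adding bump must move both endpoints in
    the bump's direction. A zero bump never advances."""
    if bump == 0:
        return False
    for x in (start, end):
        moved = x + bump
        if math.isinf(moved) and not math.isinf(x):
            return False   # assumed policy: overflow counts as no progress
        if not (moved > x if bump > 0 else moved < x):
            return False   # rounding swallowed the bump at this endpoint
    return True
```

For example, `bump_advances(1.0, 1e30, 1.0)` is False, so `for i 1.0 1e30 1.0 [...]` would not start, while `bump_advances(1.0, 10.0, 1.0)` is True.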

hostilefork commented 6 years ago

Resolved in Ren-C according to BrianH's proposed mechanism.

https://github.com/metaeducation/ren-c/commit/46abbf6fc460269fccb963bf58cebad0b0659c78

metaeducation / rebol-issues

FOR start end bump behavior inconsistent, doesn't make sense #1993