j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee
178 stars 15 forks source link

Conditional Expressions #183

Open certik opened 3 years ago

certik commented 3 years ago

Relevant papers:

Taking the second example from the 18-274 paper:

  IF (PRESENT(D)) THEN
    CALL SUB(A,B,C,D)
  ELSE IF (X<1) THEN
    CALL SUB(A,B,C,EPSILON(X))
  ELSE
    CALL SUB(A,B,C,SPACING(X))
  END IF

One proposed syntax is "keyword syntax":

CALL SUB(A, B, C, IF (PRESENT(D) THEN D ELSE IF (X < 1) THEN EPSILON(X) ELSE SPACING(X) END IF)

The second proposed syntax is "? syntax":

CALL SUB(A, B, C, ? (PRESENT(D) D :? (X < 1) EPSILON(X) : SPACING(X) ?)
certik commented 3 years ago

Which of these do people find the most readable and least readable?

My own feeling (from most readable to least readable):

  1. Original
  2. keyword syntax
  3. ? syntax

We should also consider more use cases. I also asked at https://fortran-lang.discourse.group/t/202x-feature-conditional-expressions/329 to get more feedback on this feature.

dev-zero commented 3 years ago

To be honest I find both variants suboptimal (the first one would make an if/else/endif block return a value when being used inline, and simply run code as usual when not, but would be easier to read) and I wonder whether there is really a need for general conditional expressions or whether a ternary operator would be sufficient. In the latter case I would suggest the Python-based syntax (which one could even nest):

CALL SUB(A, B, C, D IF PRESENT(D) ELSE (EPSILON(X) IF (X < 1) ELSE SPACING(X)))
certik commented 3 years ago

@dev-zero in your opinion, can you rate by readability / preference the three options above plus your proposed Python-based syntax?

The ternary operator (that you presumably like) is the "? syntax" which is one of the proposed ideas, but you also say they are "suboptimal", so I am confused.

milancurcic commented 3 years ago

From most to least readable:

  1. Original
  2. Python/@dev-zero syntax
  3. Keyword syntax
  4. ? syntax

Caveat: I use 2 in Python a lot so I'm used to it, thus my preference.

epagone commented 3 years ago

FWIW my preference from most to least readable is:

  1. Original
  2. keyword syntax
  3. ? syntax

Aside the reduced readability, IMHO I cannot see any practical advantage in the new proposed expressions.

dev-zero commented 3 years ago

My preference in terms of readability:

  1. Original
  2. Python-inspired syntax
  3. Keyword syntax
  4. ? syntax

As for why I dislike the ? syntax and the difference to the Python-inspired operator:

A general start and end marker ? for conditional expression would to me indicate that more complex conditional expressions should be possible at some point, with the only condition that all codepaths inside it have to return a compatible type, but that is with the current proposal not the case (nor should it be). On the other hand does the nesting as shown in the example above actually create (at least visually) an alias for ELSE IF: the :?, since the expression should actually be:

CALL SUB(A, B, C, ? (PRESENT(D)) D : ? (X < 1) EPSILON(X) : SPACING(X) ? ?)

correct? With the second to last ? omitted? Which I would find even less readable.

everythingfunctional commented 3 years ago

I think the readability question might slightly be missing the point. The question isn't about how easy is it to see what the code is doing, but how easy is it to see the code's intent.

To me, an if-else block says the intent of the code is to do different things based on some conditions. The new feature allows one to specify the intent "this variable (or argument) depends on some condition(s)" more clearly.

To use the example from above, if I see

  IF (PRESENT(D)) THEN
    CALL SUB(A,B,C,D)
  ELSE IF (X<1) THEN
    CALL SUB(A,B,C,EPSILON(X))
  ELSE
    CALL SUB(A,B,C,SPACING(X))
  END IF

It's not immediately obvious (especially in codes that are more complicated than this example) that we are definitely going to be calling sub in all cases, and the only difference is the value of the last argument. Whereas with

call SUB(A, B, C, IF (PRESENT(D)) THEN (D) ELSE IF (X < 1) THEN (EPSILON(X)) ELSE (SPACING(X) END IF)

that intent is put front and center without having to compare each branch in an entire if-else block.

So, my preferences would be for

  1. Keyword syntax - most similar to existing Fortran syntax
  2. Python syntax - still quite likely to be easily understood by new programmers
  3. ? syntax - similar to other languages, but not immediately obvious to new programmers
certik commented 3 years ago

If the idea is to call SUB just once, to make the intent obvious, as @everythingfunctional correctly points out, then here is the alternative to the original:

  IF (PRESENT(D)) THEN
    D2 = D
  ELSE IF (X<1) THEN
    D2 = EPSILON(X)
  ELSE
    D2 = SPACING(X)
  END IF
  CALL SUB(A,B,C,D2)

Which I personally find more readable than:

CALL SUB(A, B, C, IF (PRESENT(D) THEN D ELSE IF (X < 1) THEN EPSILON(X) ELSE SPACING(X) END IF)

But it is true you have to declare an extra variable (although the performance of the code should be identical with a good compiler).

milancurcic commented 3 years ago

Is a subset of this a good candidate for a stdlib function if_then_else(condition, expression1, expression2)?

everythingfunctional commented 3 years ago

Is a subset of this a good candidate for a stdlib function if_then_else(condition, expression1, expression2)?

It would be, but merge already would be sufficient, and the idea of the proposal is to not evaluate the unused expression (which merge can't do).

everythingfunctional commented 3 years ago

@certik , while you've removed the difficulty of seeing "do we always call SUB in every case", you've now introduced the ambiguity of "do we always assign to D2 in every case?" You're back to the exact same problem. You're using a multi-statement block to perform a single logical operation. A reader must inspect and evaluate every branch (the block as a whole) to understand it is performing a single logical operation.

certik commented 3 years ago

My understanding is that this feature was approved by WG5 for inclusion into 202X based on a survey, where people expressed a wish to have conditional expressions in Fortran, but I don't think there were concrete proposals how it would look like, just that the feature would be nice to have. @sblionel is that an accurate statement?

@everythingfunctional explained well the main argument for conditional expressions is that they enforce a single logical operation to be assigned somewhere (as the result of the conditional expression).

I agree that at this level of "requirements" it seems like a good idea and I am not against that (I use it sometimes in Python, although rarely; I never use the ? notation in C or C++). But when it comes down to syntax and how it would actually look like in Fortran, it seems less readable in practice to many people, and that is would should count. If a feature looks good in abstract terms, but does not look good in concrete (syntax) terms, in my opinion we should not put it in until it looks good both in abstract and concrete terms.

A prior compiler implementation of this would be very helpful, so that we can play with it more, before putting it into the language.

pbrady commented 3 years ago

Adopting ?: for a conditional expression but using a difference syntax from those languages which already use it for the same purpose (i.e. C, C++, Java, Javascript), is a bad idea and will only lead to confusion. As someone who uses multiple languages, I would appreciate it if Fortran did not do anything weird here and go off the beaten path for no apparent reason.

klausler commented 3 years ago

I forgot: why aren't we just fixing MERGE() to guarantee non-evaluation of the operand that isn't selected?

certik commented 3 years ago

@klausler the arguments that have been put forth against fixing merge are:

I don't personally understand the arguments, but I think @everythingfunctional does? Brad, can you summarize here why we cannot extend merge? If we could just extend merge, that would be the best way forward I think.

sblionel commented 3 years ago

My understanding is that this feature was approved by WG5 for inclusion into 202X based on a survey, where people expressed a wish to have conditional expressions in Fortran, but I don't think there were concrete proposals how it would look like, just that the feature would be nice to have. @sblionel is that an accurate statement?

Yes. The way we work is that WG5 outlines the general idea and J3 develops that into a specific proposal.

everythingfunctional commented 3 years ago

merge is elemental, and as such must evaluate both expressions to determine the shape of the result. The example is, if one argument to merge is a scalar and the other is an array, even if the scalar is the selected value, the array expression must still be evaluated to determine the shape of the result, because the result will still be an array with that shape, just with each value equal to the given scalar. The proposed conditional expressions would not be capable of that, because each expression must have the same rank.

everythingfunctional commented 3 years ago

@klausler , I actually proposed just that on the J3 discussion board, but Malcolm was able to give me convincing reasons against, along the lines of my previous comment.

certik commented 3 years ago

@everythingfunctional cannot the shape in merge be determined at compile time for both arguments?

klausler commented 3 years ago

But MERGE() could guarantee evaluation of at most one of its first two arguments in the case where the third argument is scalar and the first two arguments have the same rank, yes?

klausler commented 3 years ago

@everythingfunctional cannot the shape in merge be determined at compile time for both arguments?

Rank, yes, apart from assumed-rank dummy arguments, but not shape.

everythingfunctional commented 3 years ago

But MERGE() could guarantee evaluation of at most one of its first two arguments in the case where the third argument is scalar and the first two arguments have the same rank, yes?

I suppose that's true. In fact, one could go a small step further and state that the "unused" argument is evaluated only if necessary to determine the resultant shape. I.e., if the array argument is selected, the scalar argument wouldn't need to be evaluated.

However, that still wouldn't satisfy one of the use cases (although it's not one I find particularly compelling); conditionally supplying an optional argument. I think the proposed conditional expressions provide a clean way to provide that, with a convenient place to put the desired deferred evaluation functionality.

certik commented 3 years ago

conditionally supplying an optional argument

This use case should by done by #22, that seems like a cleaner solution anyway.

So extending merge and implementing #22, we might be able to cover this feature.

sgeard commented 3 years ago

I'd prefer to see something like

select
    case (present(d))
        call sub(a,b,c,d)
    case (i<n) then case (a(i)==0)
        call sub(a,b,c,epsilon(x))
    case default
        call sub(a,b,c,spacing(x))
end select

Or possibly use switch instead of select

everythingfunctional commented 3 years ago

22 is about providing a default value if an argument is not provided; it is about the callee. This proposal is about how an argument could be provided or not; it is about the caller.

For example, say if x < 0.1 I don't want to provide an argument to some procedure, how would merge be sufficient? What would you provide as the other argument? I.e.

call sub(a, merge(???, x, x < 0.1))
certik commented 3 years ago

@everythingfunctional how would that be written using conditional expressions? I don't think that's possible either, or am I missing something?

everythingfunctional commented 3 years ago

The proposal specifically allows for the else part to be omitted in cases of passing to optional arguments. So the example would be (in the case of the keyword syntax)

call sub(a, if (x >= 0.1) then (x))
certik commented 3 years ago

I see, I didn't realize that. Btw, I think the syntax is:

call sub(a, if (x >= 0.1) then (x) end if)

or

call sub(a, if (x >= 0.1) then x end if)

But adding parentheses around x actually makes it more readable to me.

Well, we can have some kind of syntax or keyword for merge, such as call sub(a, merge(*, x, x < 0.1)) or something like that, but I am not sure I like it.

klausler commented 3 years ago

The proposal specifically allows for the else part to be omitted in cases of passing to optional arguments. So the example would be (in the case of the keyword syntax)

call sub(a, if (x >= 0.1) then (x))

The syntax (x) is an idiomatic way to turn a variable into an expression. Will (x) above be a variable? If so, would one need ((x)) to pass a copy of x as an expression?

Maybe the concept of a conditionally-passed optional argument and a general conditional expression should be separated.

everythingfunctional commented 3 years ago

Oops, yeah I forgot the end if in my example.

I believe the parentheses are present in all of the examples in the syntax paper. I think they may be necessary to disambiguate in fixed-form syntax.

certik commented 3 years ago

Btw. this:

call sub(a, if (x >= 0.1) then (x) end if)

Doesn't look that bad compared to the original:

if (x >= 0.1) then
    call sub(a, x)
else
    call sub(a)
end if

I still slightly prefer the original in this case, but perhaps for some more complicated case I would sometimes use the conditional expression.

urbanjost commented 3 years ago

Did not have time to read all the comments so this might have been covered but the closest I can think of using current standard Fortran would be

      CALL SUB(A,B,C,MERGE(D, MERGE(EPSILON(X),SPACING(X),X<1.0), PRESENT(D)))

In a hopefully working example that would be

   program whatif
   implicit none
   call testit(0.0)
   call testit(1000.0,100.0)
   call testit(10.0)
   contains

   subroutine testit(x,d)
   real,intent(in)          :: x
   real,intent(in),optional :: d
   real                     :: a,b,c

      CALL SUB(A,B,C,MERGE(D, MERGE(EPSILON(X),SPACING(X),X<1.0), PRESENT(D)))

      write(*,*)a,b,c,merge(d,-99999.0,present(d))
   end subroutine testit

   end program whatif

   subroutine sub(aa,bb,cc,dd)
   real,intent(out) :: aa,bb,cc
   real,intent(in)  :: dd
      write(*,*)'SUB SEES=',dd
      aa=dd;bb=2*dd;cc=dd**2
   end subroutine sub

Results:

    SUB SEES=   1.19209290E-07
      1.19209290E-07   2.38418579E-07   1.42108547E-14  -99999.0000
    SUB SEES=   100.000000
      100.000000       200.000000       10000.0000       100.000000
    SUB SEES=   9.53674316E-07
      9.53674316E-07   1.90734863E-06   9.09494702E-13  -99999.0000

So using that as a comparison and looking at the limitations it has in the new syntax if the fourth parameter were metamorphic or if the routine was generic with different types allowed for D would all the expressions have to be of the same type or not?

Would everything always be evaluated or would it short-circuit? One of the issues with the above statement would be if D/2 instead of D were passed to SUB() and D was not present evaluating D/2 would be problematic, as MERGE is allowed to evaluate all the expressions.

That being said I would prefer something less C-like that short-ciruits with something closer to functional syntax like

call sub(a,b,c, if(present(d) <= d ;x<1.0 <=epsilon(x); spacing(x)) )

where the semi-colon is intentional and means to short-circuit and where it looks like a function called IF that has expressions of the form condition <= expression to use and the last field which has no condition would be the default and if there were no last expression it would be the same as not present. maybe a name of USE() instead of IF() would be good. The C syntax is one of my least-favorite syntaxs in any language.

so my preference would be 1) original 2) something other than either of the proposals

how about

call sub(a,b,c,IF(present(d) USE d; x<1 USE epsilon(x); spacing(x) ) )

in case the parameter does not have INTENT(IN) this could then be used like

myvar=IF(PRESENT(d) USE d; x<1 USE EPSILON(x); SPACING(x) )
call sub(a,b,c,myvar)
certik commented 3 years ago

I have implemented a prototype of this into LFortran: https://gitlab.com/lfortran/lfortran/-/merge_requests/645/

Example notebook: https://nbviewer.jupyter.org/urls/gitlab.com/lfortran/lfortran/-/raw/477cac615ca420377c7582f16294d05461bf7b8d/share/lfortran/nb/Conditional%20Expressions.ipynb

It doesn't implement the whole proposal, but it implements enough to be able to play with it.

certik commented 3 years ago

Two more syntax forms have been proposed.

Mixed form:

 IF ( logical-expression-1 ) ? expression-1
[ : ( logical-expression-k ) ? expression-k ] ...
: ( expression-n ) ENDIF

examples:

if (i>0 .and. i<=size(a)) ? a(i) : (present(val)) ? val : 0.0 endif
CALL SUB(A, B, C, IF (PRESENT(D) ? D : (X < 1) ? EPSILON(X) : SPACING(X) END IF)

Arrow form:

( predicate -> consequent
[ : predicate -> consequent ] ...
: alternative )

Examples:

( abs(residual)<=tol -> 'ok' : 'did not converge' )
( i>0 .and. i<=size(a) -> a(i) : present(val) -> val : 0.0 )
CALL SUB(A, B, C, ( PRESENT(D) -> D : (X < 1) -> EPSILON(X) : SPACING(X) ))
certik commented 3 years ago

Another suggestion is to amend the "arrow" form to simply use ? as in C (motivation: if the syntax is so close to C and C++, we should make is exactly the same, to prevent confusion):

C syntax:

Examples:

( abs(residual)<=tol ? 'ok' : 'did not converge' )
( i>0 .and. i<=size(a) ? a(i) : present(val) ? val : 0.0 )
CALL SUB(A, B, C, ( PRESENT(D) ? D : (X < 1) ? EPSILON(X) : SPACING(X) ))
certik commented 3 years ago

The committee discussed yesterday why merge will not work, so I want to summarize it here:

sgeard commented 3 years ago

If the sub-clauses in an if were allowed to return a value - a bit like a lambda function in C++ you could have

call sub(a, b, c, if (present(d)) then d; elseif (x > 0.1); x; else; endif)

Perhaps clearer with different brackets:

call sub(a, b, c, if {present(d)} then d; elseif {x > 0.1}; x; else; endif)

Readability seems to be a problem with all the options so the second is probably preferable.

14NGiestas commented 3 years ago

Did anyone considered the python's ternary syntax?

call sub(a, b, c, d if (present(d)) else (epsilon(x) if (x > 0.1) else spacing(x)))

dropping the parenthesis as they aren't needed in this case:

call sub(a, b, c, d if present(d) else epsilon(x) if x > 0.1 else spacing(x))
conditional-expr: consequent IF predicate ELSE (alternative | conditional-expr)
everythingfunctional commented 3 years ago

I pretty sure it has at least been mentioned, but I believe there was some resistance by compiler writers to syntax that requires look-ahead (or back-tracking) parsers. It's also not particularly "Fortranic".

14NGiestas commented 3 years ago

I pretty sure it has at least been mentioned, but I believe there was some resistance by compiler writers to syntax that requires look-ahead (or back-tracking) parsers. It's also not particularly "Fortranic".

It was mentioned here by @dev-zero, indeed, but I don't think it got the deserved attention. It's way more "Fortranic" than '?' operator (from C family) IMHO, without the boilerplate of the original proposal:

I can't say anything about the writers preferences, but I must say in a recent experience I've failed to write this kind of nested pattern in order to parse a simple key-value file xD.

<conditional-expr> ::= <then-expr> IF <scalar-logical-expr>  ELSE (<else-expr> / <conditional-expr>)
klausler commented 3 years ago

Is XIFLELSE(1.) a "conditional expression" involving variables X and L or is it a reference to an external function?

14NGiestas commented 3 years ago

Is XIFLELSE(1.) a "conditional expression" involving variables X and L or is it a reference to an external function?

So you are saying that the fortran tokenizer get rid of all whitespace... hmpf that would be a problem indeed :/ X IF L ELSE (1.) turns into XIFLELSE(1.) which is ambiguous (I think I figured out where I've failed in my VDF now xD)

everythingfunctional commented 3 years ago

Is XIFLELSE(1.) a "conditional expression" involving variables X and L or is it a reference to an external function?

In my opinion, with fixed-form source having been made obsolescent, at some point we should be allowed to stop worrying about how new features can expressed without significant whitespace. And so, XIFLELSE(1.) is a function call, and X IF L ELSE (1.) is a conditional expression. Unless there's some other reason I'm missing.

certik commented 3 years ago

It was discussed at the October 2020 Fortran call that another good use case for this feature would be for array initializers. In Python you can do (I corrected the syntax, thanks to @ivan-pi's comment below):

[x+1 if x >= 45 else x+5 for x in range(1, N+1)]

So in Fortran you could do:

[ (if (x >= 45) then x+1 else x+5 endif, x = 1, N) ]

Since conditional expressions are just expressions, this should just work.

ivan-pi commented 3 years ago

In Python you can do:

[x+1 for x in range(1,N+1) if x >= 45 else x+5]

This triggers a SyntaxError in Python. The correct way would be:

[x + 1 if x > 45 else x+5 for x in range(1,N+1)]

So the syntax is

expression_if_true if condition else expression_if_false

Edit: following Python if expression syntax, the example from above would be:

CALL SUB(A, B, C, D IF PRESENT(D) ELSE (EPSILON(X) IF X < 1 ELSE SPACING(X)))

It seems fairly nice, you just need to remember the condition is in the middle.

Edit2: Oops, I just noticed the posts from @14NGiestas above.

certik commented 3 years ago

New syntax paper being proposed at the June 2021 J3 Committee meeting:

klausler commented 3 years ago

New syntax paper being proposed at the June 2021 J3 Committee meeting:

What a mess! Tell me again how this is supposed to be better than a MERGE() intrinsic with stronger non-evaluation guarantees, please.

certik commented 3 years ago

@klausler thanks for the feedback. I personally think we should not do this feature at all (i.e. NO on the 21-157 proposal), as the two syntaxes (keyword and ?) seem worse than not doing this (based on the feedback both online above, as well as in private that I got). We can pursue the merge() idea and see if we can come up with a proposal.

klausler commented 3 years ago

@klausler thanks for the feedback. I personally think we should not do this feature at all (i.e. NO on the 21-157 proposal), as the two syntaxes (keyword and ?) seem worse than not doing this (based on the feedback both online above, as well as in private that I got). We can pursue the merge() idea and see if we can come up with a proposal.

I'm sure we've talked about this. In short, strengthen MERGE() so that when its mask (3rd) argument is scalar, and the TSOURCE&FSOURCE arguments have the same type and rank, it guarantees that exactly one of its TSOURCE/FSOURCE arguments are evaluated, and returns the appropriate value. Or define a new intrinsic function with these guarantees.

sblionel commented 3 years ago

The idea of modifying MERGE was discussed several meetings ago (I can't find which one). I liked the idea, but there were complaints that it would slow down MERGE for everyone and it failed.