j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee
178 stars 15 forks source link

Augmented assignment (+=) #113

Open Leonard-Reuter opened 4 years ago

Leonard-Reuter commented 4 years ago

allow writing

a += b

instead of

a = a + b

and

a *= b

instead of

a = a * b

I think this feature is quite important, since the lack of it misleads programmers towards the use of (too) short variable names: I would write like to write this:

electrons_number += charge

But maybe prefer this

en = en + charge

over this abomination of a line =)

electrons_number = electrons_number + charge

This should not end in a discussion whether to use long variable names or not, let's just agree, that they are better sometimes.

This is a second use case

This is a third (and fourth) use case

certik commented 4 years ago

This is a very common request, so we should have an issue for it and document why such ideas were not implemented yet.

The last time I have seen this discussed is with this proposal: https://j3-fortran.org/doc/year/19/19-111r1.txt. It actually has reason why it was rejected.

(The += and *= would also imply to have /=, but that is already taken for .ne.. The 19-111r1 paper thus introduces a new operator that could do the same.)

Leonard-Reuter commented 4 years ago

This is a very common request, so we should have an issue for it and document why such ideas were not implemented yet.

The last time I have seen this discussed is with this proposal: https://j3-fortran.org/doc/year/19/19-111r1.txt. It actually has reason why it was rejected.

(The += and *= would also imply to have /=, but that is already taken for .ne.. The 19-111r1 paper thus introduces a new operator that could do the same.)

I think, just doing the two 'positive' operators would already help a lot. '-=' and '/=' are much less needed and can be circumvented like this: The hypothetical code:

a = 1.0_real64
do i=1, 10
    a -= vector(i)
end do

Can be written as:

a = 1.0_real64
a *= -1
do i=1, 10
    a += vector(i)
end do
a *= -1

And analogous for division (maybe in line with the loop statements, to make the purpose clearer?):

a = 1.0_real64
a = 1/a; do i=1, 10
    a *= vector(i)
end do; a = 1/a

While this seems messy, it prevents the 'space ship operator' from being valid:

a -=- 1  ! this gives the same as 'a += 1'

The main point however should be, that '-=' and '/=' are much less needed. I would really advise against another operator for '/='. That would be quite counter-intuitive.

PS: since the referred paper was in parts rejected due to a missing use case, I will add one to the original post =)

certik commented 4 years ago

Good point: the proposal would be to only add += and *=. I would say the += is probably the most common in practice. I grepped one of my codes for i = i + 1 and I use just this particular string 7 times. I use j = j + 1 8 times, k = k + 1 3 times, etc., and that's just a small subset of all usages for +=. So having += would simplify dozens of places in my own code for sure.

klausler commented 4 years ago

The important aspect of BCPL's assignment operator(s) is that the left-hand side of the assignment is evaluated only once; e.g., if Fortran were to acquire such a feature, in A(F()) += X, there would only be one call made to F.

Leonard-Reuter commented 4 years ago

The important aspect of BCPL's assignment operator(s) is that the left-hand side of the assignment is evaluated only once; e.g., if Fortran were to acquire such a feature, in A(F()) += X, there would only be one call made to F.

I referenced your point in the original post.

klausler commented 4 years ago

Additionally, this feature should specifically state whether the augmented assignment is atomic or reducing or disallowed in some circumstances (shared variable in DO CONCURRENT, &/or coindexed references).

Leonard-Reuter commented 4 years ago

I'd suggest also adding **= to have the 'positive' augmented assignment operators complete.

jacobwilliams commented 4 years ago

For once I agree with the committee for not adding something. 😃 I think these variants are best left to C-based languages. I don't think they are very "Fortranic". Aesthetically I've always thought they were weird, and I don't think we really gain much by having them (except new confusion about /=).

FortranFan commented 4 years ago

@Libavius wrote:

.. The hypothetical code:

a = 1.0_real64
do i=1, 10
    a -= vector(i)
end do

Can be written as:

a = 1.0_real64
a *= -1
do i=1, 10
    a += vector(i)
end do
a *= -1

The above "hypothetical code" can be written using current Fortran standard as:

a = 1.0_real64 - sum(vector)
certik commented 4 years ago

In fact you can write it as

a = 1 - sum(vector)

But I think @Libavius's point was to show how it can be used. A typical loop where I use i = i + 1 is:

i = 1
j = 1
do j = 1, size(nl)
    n = nl(j)
    zeta = zl(j)
    l = ll(j)
    do m = -l, l
        nlist(i) = n
        zetalist(i) = zeta
        llist(i) = l
        mlist(i) = m
        i = i + 1
    end do
end do

Where it is kept as a running index.

I am leaning towards @jacobwilliams's comment above (https://github.com/j3-fortran/fortran_proposals/issues/113#issuecomment-566687141) that i = i + 1 looks great and even though i += 1 is shorter, it is a new syntax, and so it might not be worth it in this case.

Leonard-Reuter commented 4 years ago

@FortranFan I know, thanks anyway for the hint. @certik that was my point indeed. I also agree, that i = i + 1 looks good, however this is not always the case.

A more elaborate use case:

type Molecule_t
    type(Atom_t), allocatable :: atoms(:)
contains
    procedure, pass :: Get_electrons_number => Molecule_t_get_electrons_number
end type Molecule_t

contains

function Molecule_t_get_electrons_number(this) result(electrons_number)
    class(Molecule_t), intent(in) :: this
    integer :: electrons_number

    integer :: i

    electrons_number = 0

    if (ALLOCATED(this%atoms) then
        do i=1, SIZE(this%atoms)
            electrons_number = electrons_number + this%atoms(i)%get_electrons_number()
        end do
    end if
end subroutine Molecule_t_get_electrons_number

The loop would–in my opinion–be easier readable like this:

do i=1, SIZE(this%atoms)
    electrons_number += this%atoms(i)%get_electrons_number()
end do

@FortranFan I could write this with the SUM intrinsic (but this would be slower, since an array would need to be allocated):

electrons_number = SUM([( this%atoms(i)%get_electrons_number(), i=1, SIZE(this%atoms) )])

EDIT: (or easier, but still with allocation, if Get_electrons_number is elemental

electrons_number = SUM(this%atoms%get_electrons_number() )

This is in addition to the possible reduction of function calls that @klausler mentioned.

@jacobwilliams jup, the confusion with /= ist quite painful and probably the strongest counter-argument...

certik commented 4 years ago

@Libavius for your latest use case, I would recommend to change get_electrons_number() to not be a function, but just a member variable. In fact I would recommend to change Molecule_t to:

type Molecule_t
    real(dp), allocatable :: electrons_number(:)
    ...
contains
    procedure, pass :: Get_electrons_number => Molecule_t_get_electrons_number
end type Molecule_t

And then change the loop to just:

electrons_number = sum(this%electrons_number)
Leonard-Reuter commented 4 years ago

@certik But then, if I change my atoms array I always have to be careful and update electrons_number as well.

The idea in this case would be, that a molecule has atoms but an atom can also exist without a molecule. To avoid code duplication I don't want the molecule to have a member 'electrons_number' or implement atom-related functions again but rather iterate over its atoms.

certik commented 4 years ago

The way I do this in my electronic structure codes is that I do not have an Atom type at all. So if your molecule has only one atom, then the length of all the arrays will be just 1. There is no code duplication and in fact things run much faster.

cmacmackin commented 4 years ago

@septcolor I've also suggested something like these be included in Fortran (#30 ). I feel they are useful because they allow you to expose derived type components, safe in the knowledge that if you need to refactor the data storage layout you can always add a getter in future. They are thus extermely useful for encapsulation, which @certik's approach breaks.

certik commented 4 years ago

@cmacmackin indeed, the approach I suggested is flat and exposed. It's not encapsulated. It has the huge advantage that it works well with current Fortran, and typically optimizes really well and runs fast, and it's actually typically simpler in terms of lines of code (as demonstrated above). The disadvantage is that if you change the underlying representation, you need to rework your whole code. So with my approach, one has to design things well, but one can use the current Fortran language. For the encapsulated approach, Fortran does not have the best tools currently.

aradi commented 4 years ago

As for the syntax, what about Espen Myklebusts proposal in #80, using the @ operator representing the LHS of an assignment? So @septcolor s example could be writen as

my_data(idat) % its_sub_obj(iobj) % another_sub_obj(isubobj) = @ + 1

It would not cause any conflicts with current operators, so it could be also used for division (I would find it really strange, if *= were introduced but in place division won't work...). And it had the benefit, that the @ operator (representing the LHS) could appear even more than once on the RHS, as in

array(i,j)%comp1(k,l)%value1 = abs(@ - sqrt(@))

(example by Espen Myklebust).

certik commented 4 years ago

@aradi is your idea different to the proposal I linked at https://github.com/j3-fortran/fortran_proposals/issues/113#issuecomment-566223095, which was rejected?

aradi commented 4 years ago

@certik Yes, Espen Myklebusts proposal has in my opinion a considerably lower complexity, as the one you have linked in. The @ operator would always represent the entire LHS, while in your proposal it can also represent a part of it (as in a ( s @ (3*i+1) ) = b(s)). It does not involve naming, so no conflicts with eventual implicit statements must be feared and there is no need to introduce a new (statement level) scope.

certik commented 4 years ago

@aradi thanks for the clarification. (For the record the 19-111r1 proposal is not my proposal, I just linked to it.)

Leonard-Reuter commented 4 years ago

As for the syntax, what about Espen Myklebusts proposal in #80, using the @ operator representing the LHS of an assignment? So @septcolor s example could be writen as

my_data(idat) % its_sub_obj(iobj) % another_sub_obj(isubobj) = @ + 1

It would not cause any conflicts with current operators, so it could be also used for division (I would find it really strange, if *= were introduced but in place division won't work...). And it had the benefit, that the @ operator (representing the LHS) could appear even more than once on the RHS, as in

array(i,j)%comp1(k,l)%value1 = abs(@ - sqrt(@))

(example by Espen Myklebust).

I think this'd be superb elegant =)

EDIT: After some time, I'm not so sure anymore. It is also a bit of syntactic overload...

klausler commented 4 years ago

I like the idea of using a symbol in the right-hand side to denote the current value of the left-hand side, but @ is accepted as a legal character in identifiers as an extension in many compilers (as is $ as well). # would conflict with preprocessing directives if it were used and it appeared at the beginning of a continuation line. '*' would be very Fortran-like but would lead to ambiguity (X=Y*****Z), as would /. So that leaves us % and ?, I think.

everythingfunctional commented 4 years ago

I think the reasons outlined why augmented assignment probably won't work out are true. But I would be in favor of the "pronoun" proposal that @certik mentioned. There are a variety of languages starting to support this kind of feature (most commonly in languages that have a pattern matching feature). One might think of this as basically syntactic sugar for an associate block with only a single line in it.

For example:

v @ x(3*i+1) = v * 42 + b(i)

would be directly equivalent to:

associate(v => x(3*i+1))
    v = v * 42 + b(i)
end associate

So this could basically be treated as syntactic sugar, not really a new functionality.

klausler commented 4 years ago

I think the reasons outlined why augmented assignment probably won't work out are true. But I would be in favor of the "pronoun" proposal that @certik mentioned. There are a variety of languages starting to support this kind of feature (most commonly in languages that have a pattern matching feature). One might think of this as basically syntactic sugar for an associate block with only a single line in it.

ASSOCIATE() could be extended slightly to accommodate a single statement as well as a block.

opeil commented 4 years ago

Perhaps, it is too late to discuss this but somehow I have not seen anyone mentioning another important aspect of the compound/augmented assignment: It allows to perform more efficient in-place operations, such as big_array(:, :, :) += big_array2(:, :, :), without the risk of creating temporaries. This makes it more than just a syntactic sugar.

Another similar example is that if we have a type like this

type :: BigObject
   real(dp) :: values(10000, 10000)
end type

type(BigObject) :: a, b

and there is a frequent need to do operations like this -- a % values = a % values + b % values, then a naive overloading of operator (+) would result in an extremely inefficient code. It would be great if one has a possibility to overload something like (+=) [with obvious call semantics] to implement heavy in-place operations.

certik commented 3 years ago

There is a renewed discussion at the J3 mailinglist about this feature.

I created a prototype implementation in a compiler: https://gitlab.com/lfortran/lfortran/-/merge_requests/1286

klausler commented 3 years ago

ASSOCIATE can't be used to form a mutable association with every variable. Variables with vector-valued subscripts, and coindexed references, become immutable copies of expressions if they appear as "right-hand" sides of selectors in ASSOCIATE & al. If you want to use ASSOCIATE to get a mutable abbreviation of the variable of an assignment-stmt, it won't always work.

I prefer the idea of using a symbol in the expr of assignment-stmt to denote (a copy of) the variable, e.g. "var = @ + expr". It would work with any operation (including those spelled like .AND.) and it allows repetition, use of the functions (e.g., var=transpose(@)), and the syntax doesn't look like it implies atomicity. (If @ can't be used, there's other symbols that would work syntactically.)

mjklemm commented 3 years ago

Ada extends the notion of '@' so that you can do things like this:

c(i,j) = cmplx(real(@) + 1.0d0, aimag(@) * 3.0d0, real64)

The '@' can stand-in for the entity on LHS as many times as needed and as a sub-expression on the RHS.

So, this generalizes what you could achieve with operator assignments like +=.

mjklemm commented 3 years ago

PS: It's required that the expression that is bound via '@' is evaluated only once, so that it's clear how many times functions and the likes are evaluated then they occur multiple times as part of a '@'.

certik commented 3 years ago

Thanks @klausler and @mjklemm for the comments!

everythingfunctional commented 3 years ago

I do quite like the idea of a symbol for a shorthand for the lhs. I feel almost certain that using @ for that would conflict somehow with its use as the rank-agnostic subscript feature, but I can't quite put my finger on exactly how. Hopefully we can come up with some other agreeable symbol. $ perhaps? I don't believe it's been used other than in format specifications, but I could be mistaken.

klausler commented 3 years ago

Some further thought about the use of a special symbol (say .. here) to represent the entire previous value of the left-hand side of an assignment-stmt is leading me to think that it would be a worthwhile feature to consider.

certik commented 3 years ago

Another discussion of this proposal: https://fortran-lang.discourse.group/t/updating-assignment-operators/2129

opeil commented 3 years ago

First of all, I think that the idea (originally by Espen Myklebusts, I suppose) with a LHS designator instead of dozens new operators is great and offers a syntactically simple and flexible way of enhancing assignments. I could only suggest to use an existing Fortran association syntax to avoid introducing additional symbols. This way, the combined assignment-operation would look like:

(lhs => container % array(:, :, i)) = lhs + fun(lhs)
(c => complex_array(i, j)) = cmplx(real(c) + 1.0_real64, aimag(c) * 3, kind=real64)

etc. This is essentially a more "fortranic" development of the idea by @everythingfunctional .

Such a construction can be considered as a one-line ASSOCIATE block, but, as @klausler has remarked, the restrictions inherent to the ASSOCIATE statement can be considerably relaxed in the new construction. Typical restrictions of the assignment statement should rather be applied.

The advantage of such a syntax is that it does not introduce any new lexer elements and does not seem to create a clash with existing constructions.

aradi commented 3 years ago

To me, the precise goal of the augmented assignment, as discussed here, is still unclear. Should it be simple syntactic sugar to abbreviate code (which would be for sure useful), or should it represent new operators, which one can also extend for user defined types (useful when designing efficient OOP-based abstract calculus concepts)?

certik commented 3 years ago

To me just += and *= would be the most useful. The extra operators seem quite complicated to see what is going on.

opeil commented 3 years ago

A couple of points (from someone interested in having either += style operators or any kind of shorthand notation for assignments of the form lhs = lhs .op. foo):

  1. All constructs considered here can be considered as syntactic sugar.
  2. That said, they also change the semantics of the operation, sending a clear message to the compiler that the operation should be done in-place if possible. The usual semantics of Fortran implies that the RHS is first evaluated and then assigned, which by default requires a temporary storage.
  3. Inasmuch as I myself like to use augmented operators +=, *=, etc. in other languages, there is an obvious issue with /= in Fortran that does not seem to be resolvable in an elegant way (introducing another symbol for division or skipping it entirely cannot be considered as elegant). This leads to a well-motivated opposition from the Committee to such operators, qualifying them as "non-fortranic" (Steve Lionel expressed this explicitly on several occasions). Furthermore, introducing new operators is not encouraged because technically Fortran already allows a user to define infinitely many custom operators by means of the dot notation.
  4. If one at all wants to have a more concise notation and more computationally efficient assignment semantics in the standard, one would need to come up with a solution that is aligned with the Fortran style (i.e., sufficiently "fortranic") and that does not introduce new entities, if possible.
Leonard-Reuter commented 3 years ago

@opeil @certik In order to resolve the issue with \= how about adding a dot in between the operator and "=": This would give: +.=, -.=, *.=, /.=, **.= Furthermore it can be applied to any custom operator .op. as .op.= (dropping one dot) I don't know, if this is 'fortranic' enough, but it would resolve the issue with /= and the dot would be borrowing the operator dot notation. On the other hand, it would still be very much intelligible for any reader.

klausler commented 3 years ago
  1. All constructs considered here can be considered as syntactic sugar.

That is not the case with C's augmented assignment operators, not is it true with the proposals above.

2.. That said, they also change the semantics of the operation, sending a clear message to the compiler that the operation should be done in-place if possible. The usual semantics of Fortran implies that the RHS is first evaluated and then assigned, which by default requires a temporary storage.

This is also not the case. Instances where an augmented assignment statement might require use of a temporary would also be in need of a temporary if written without the augmentation.

  1. Inasmuch as I myself like to use augmented operators +=, *=, etc. in other languages, there is an obvious issue with /= in Fortran that does not seem to be resolvable in an elegant way (introducing another symbol for division or skipping it entirely cannot be considered as elegant). This leads to a well-motivated opposition from the Committee to such operators, qualifying them as "non-fortranic" (Steve Lionel expressed this explicitly on several occasions). Furthermore, introducing new operators is not encouraged because technically Fortran already allows a user to define infinitely many custom operators by means of the dot notation.

Fortran allows new user-defined operators but not new assignment statement symbols.

The most general proposal for augmented assignments -- namely, the introduction of a symbol to represent the previous value of the left-hand side of the assignment -- avoids the problems with /=.

aradi commented 3 years ago

The most general proposal for augmented assignments -- namely, the introduction of a symbol to represent the previous value of the left-hand side of the assignment -- avoids the problems with /=.

I like a lot the idea of having a placeholder for the lhs, as this would abbreviate long expressions in many cases. But then, it would be still syntactic sugar only, as programmers won't be able to extend it for their own types in order to provide efficient in-place implementations for them. (As one would be able to do by overriding the += operator in C++ or the __iadd__ method in Python...)

klausler commented 3 years ago

The most general proposal for augmented assignments -- namely, the introduction of a symbol to represent the previous value of the left-hand side of the assignment -- avoids the problems with /=.

I like a lot the idea of having a placeholder for the lhs, as this would abbreviate long expressions in many cases. But then, it would be still syntactic sugar only, as programmers won't be able to extend it for their own types in order to provide efficient in-place implementations for them. (As one would be able to do by overriding the += operator in C++ or the __iadd__ method in Python...)

Neither an augmented variant of assignment (+=) nor a symbol that refers to the previous value of the left-hand side of an assignment are just "syntactic sugar". They must have semantic guarantees that the expressions on the left-hand side of the assignment are not evaluated more than once, and this property is not one that can be preserved with simple rewriting transformations. For example, rewriting A(IFUNC()) += 1 to A(IFUNC()) = A(IFUNC()) + 1 would probably be a bad idea.

One of the reasons the alterative approach of using a symbol that designates the previous value of the left-hand side is more natural than augmenting variants of assignment(s) is that you can use your user-defined OPERATOR(+) in place, as it were, without having to also define an ASSIGNMENT(+=), if such a thing were possible. This might not be what you'd want if your += had semantics that were not the same as your + in the context of assignment (e.g., += is somehow atomic), but for many use cases the "left-hand side symbol" approach will save boilerplate.

Leonard-Reuter commented 3 years ago

Maybe there should be an independent issue proposing the LHS symbol? While its usage has some overlaps with augmented assignment, it is really a different thing as it extends way beyond what augmented assignment can do, but is less intuitive to use and read (especially for people who switch from other languages).

aradi commented 3 years ago

@klausler OK, I see your point. But, then, similar to what have been proposed above, an extension of the current associate construct may already be enough to enable it? Actually, I think, by introducing the statement form of associate one could already have it (or do I miss something obvious?)

associate(lhs => A(IFUNC())) lhs = lhs + 1

associate(c => complex_array(i, j)) &
    & cmplx(real(c) + 1.0_real64, aimag(c) * 3, kind=real64)

It won't be as compact as other forms proposed, but it would be for sure a very natural extension of the current language.

Actually, augmented assignment operators (+=, *=, etc.) only make sense to me, if

If any of the two is not given, I'd rather go with the LHS-association. That seems to fit more to the current language and would allow for a much more flexible usage.

klausler commented 3 years ago

ASSOCIATE doesn't create a modifiable association for designators containing vector-valued subscripts or coindexes today. If your left-hand side "associate" syntax doesn't support those, it's too limited, and if it does, it shouldn't use the same keyword. We also don't need an name for the left-hand side; there's can't be another assignment statement in its scope, so a single left-hand side symbol suffices.

kbauer commented 6 months ago

In the renewed discussion mentioned in https://github.com/j3-fortran/fortran_proposals/issues/113#issuecomment-905758722, there appears the idea that

A /= B

might actually work, since assignment isn't allowed in expressions, so there should be little risk of mixing up

IF( A /= B ) THEN ! or
L = A /= B

with

A /= B

as assignment for both compilers and human code readers. While not ideal, I'd take such a solution gladly over lines such as

forceFromExternalSource = scaleFactor * forceFromExternalSource