j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

Getter and setter procedures for derived types #30

Open cmacmackin opened 4 years ago

cmacmackin commented 4 years ago

Some languages such as Python and C# allow the definition of special getter and setter methods. Although these are methods, syntactically they allow the client code to behave as though they represent a public component of the class. This has the following advantages:

The latter, in particular, is important. It means that a type component can be left public (or protected, if #16 is accepted) without fear of having to break the API when refactoring. This would save the developer from having to manually implement getter and setter routines and reduce the amount of boiler-plate code.

I propose the following syntax:

type example_type
  ! ...
contains
  get :: foo => get_foo
  set :: foo => set_foo
end type

The getters and setters could have permissions just like a type-bound procedure, but given the use-case presumably we'd just want them to always be public. The interface for the getters and setters would be

pure function get_interface(this) result(component)
  class(...), intent(in) :: this
  type(...) [, (pointer | allocatable | dimension | ...)*] :: component
end function

elemental subroutine set_interface(this, value)
  class(...), intent(inout) :: this
  type(...), intent(in) [, (pointer | allocatable | dimension | ...)*] :: value
end subroutine

Client code could then call the getter and setter as follows:

type(example_type) :: bar
integer :: val
val = bar%foo ! invokes getter
bar%foo = val * 2 ! invokes setter
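
For comparison, here is a minimal sketch of what the same access pattern looks like in current Fortran, where the accessors have to be written and called explicitly. The names `example_type`, `get_foo`, and `set_foo` follow the proposal; the module name `example_mod` and the assumption that `foo` is an integer are illustrative only.

```fortran
module example_mod
  implicit none
  private
  public :: example_type

  type :: example_type
    private
    integer :: foo = 0            ! hidden component (integer is an assumption)
  contains
    procedure :: get_foo
    procedure :: set_foo
  end type

contains

  ! Getter: matches the shape of get_interface in the proposal
  pure function get_foo(this) result(component)
    class(example_type), intent(in) :: this
    integer :: component
    component = this%foo
  end function

  ! Setter: matches the shape of set_interface in the proposal
  elemental subroutine set_foo(this, value)
    class(example_type), intent(inout) :: this
    integer, intent(in) :: value
    this%foo = value
  end subroutine

end module

program accessor_demo
  use example_mod, only: example_type
  implicit none
  type(example_type) :: bar
  integer :: val

  call bar%set_foo(21)
  val = bar%get_foo()             ! the proposal would write: val = bar%foo
  call bar%set_foo(val * 2)       ! the proposal would write: bar%foo = val * 2

  if (bar%get_foo() /= 42) error stop "unexpected value of foo"
  print *, bar%get_foo()
end program
```

Under the proposed syntax, only the two commented lines in the client code would change; the accessor bodies would stay as they are.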

certik commented 4 years ago

I think you listed the arguments for this feature. One argument against it is that adding any such new feature makes the language more complicated and more opaque. For example, it is no longer clear what exactly bar%foo = val * 2 does: it would now be a method invocation instead of a simple (member) variable assignment.

As such, the Pros should outweigh the Cons. I personally am not very convinced (yet) in this particular case.

One thing that I would like us to formulate is a set of general rules for what features should not go into Fortran. For example, we probably do not want to make Fortran another C++ by simply adding every feature that is in C++ (or Python). One huge advantage of Fortran is that it is (still) a very simple language. What you see is (typically) exactly what happens. We would lose some of that with syntax like bar%foo = val * 2, and every such feature makes the language just a little bit harder to learn. I guess it depends what direction we want Fortran to go. Having some kind of general guidelines that we all agree on would help, I think.

Ultimately that's why there is a committee, so it does not really matter what I think personally, but I am just posting here to discuss the possible negatives of such a proposal.

tclune commented 4 years ago

I think a compelling case can be made for adding setters to the language, e.g., in the context of improved support for containers. (Getters don't really need any new functionality, IMO.) Yes, workarounds exist but at the cost of extra copies or requiring the use of pointers, and always with some loss of clarity.

I suspect the biggest obstacle would actually be gaining consensus on the syntax. Array-like syntax would be the most natural, but obviously conflicts with, er, array syntax. And square brackets have already been coopted for coarrays. So I'm thinking something like curly braces or "@":

my_obj{idx} = expr
my_obj@(idx) = expr

Note that something like the latter syntax is likely going to emerge in F202X for a different purpose: rank-agnostic array references.
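
The pointer workaround mentioned above can be sketched in current Fortran roughly as follows. This is only an illustration: the container `int_vector` and its bindings `init`, `at`, and `get` are hypothetical names, not part of any library or of the proposal.

```fortran
module int_vector_mod
  implicit none
  private
  public :: int_vector

  type :: int_vector
    private
    integer, allocatable :: items(:)
  contains
    procedure :: init
    procedure :: at               ! hands out a pointer to one element
    procedure :: get              ! plain read access
  end type

contains

  subroutine init(this, n)
    class(int_vector), intent(out) :: this
    integer, intent(in) :: n
    allocate(this%items(n), source=0)
  end subroutine

  ! The dummy needs the target attribute so that a pointer to one of
  ! its elements can be returned to the caller.
  function at(this, idx) result(ptr)
    class(int_vector), intent(inout), target :: this
    integer, intent(in) :: idx
    integer, pointer :: ptr
    ptr => this%items(idx)
  end function

  pure function get(this, idx) result(val)
    class(int_vector), intent(in) :: this
    integer, intent(in) :: idx
    integer :: val
    val = this%items(idx)
  end function

end module

program pointer_setter_demo
  use int_vector_mod, only: int_vector
  implicit none
  type(int_vector), target :: v   ! the caller must also declare a target
  integer, pointer :: p

  call v%init(3)
  p => v%at(2)                    ! two statements where my_obj{idx} = expr
  p = 42                          ! would need only one

  if (v%get(2) /= 42) error stop "element was not updated"
  print *, v%get(2)
end program
```

The loss of clarity is visible in the client code: the container must be a target, a pointer has to be declared, and the assignment is split in two.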

cmacmackin commented 4 years ago

I disagree that this syntax makes Fortran more complicated or opaque to any significant degree; it does this no more so than does defined-assignment.

I also think the advantages go beyond what I described in my opening post. In order to ensure future changes won't break the API, currently all derived type components should be declared private and then accessors added for those meant to be publicly available. For large derived types this is extremely tedious, especially given the verbosity of Fortran. Because intrinsic getters and setters make a function call and direct access to the variable syntactically indistinguishable, that means the API can be preserved even if a previously public component is made private or outright removed from the derived type.
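
A minimal sketch of the API break described here, in current Fortran. The types `circle_v1` and `circle_v2` are hypothetical and stand for two releases of the same library type: in the first, `area` is a stored public component; in the second, it is computed by a type-bound function.

```fortran
module circle_mod
  implicit none
  real, parameter :: pi = 3.14159265

  ! First release: area is a stored, public component
  type :: circle_v1
    real :: radius = 0.0
    real :: area   = 0.0
  end type

  ! Later release: area is computed on demand
  type :: circle_v2
    real :: radius = 0.0
  contains
    procedure :: area => circle_area
  end type

contains

  pure function circle_area(this) result(a)
    class(circle_v2), intent(in) :: this
    real :: a
    a = pi * this%radius**2
  end function

end module

program api_break_demo
  use circle_mod
  implicit none
  type(circle_v1) :: old_c
  type(circle_v2) :: new_c

  old_c%radius = 2.0
  old_c%area   = pi * old_c%radius**2
  new_c%radius = 2.0

  print *, old_c%area             ! component reference
  print *, new_c%area()           ! same quantity, but clients must add ()

  if (abs(old_c%area - new_c%area()) > 1.0e-5) error stop "areas differ"
end program
```

Every client that wrote `c%area` has to be edited to `c%area()` after the refactoring. With a getter as proposed, the first spelling would keep working.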

This allows for considerable time savings when writing code and a reduction in the amount of boilerplate which is needed. In addition to making the programmer's life easier, this also makes them more productive and therefore improves Fortran's viability as a language. The problem which Fortran currently runs into is that it takes significantly longer to write the same program in it than in Python (or other, higher-level languages). While it will certainly run faster, in most use-cases a programmer's time costs more than computational run-time. As such, we increasingly see people moving towards wrapping bits of legacy Fortran code with f2py while writing the rest of their software in Python. If we wish to preserve the viability of Fortran as a language we must take steps to improve productivity in it.

The fact that future-proofing the API requires the programmer to write trivial getters/setters even when they are not strictly necessary also impacts performance. There will be the overhead of an extra function call, especially when the type-bound procedure is not declared non_overridable and so requires consulting a vtable. Functions which return polymorphic variables or (if I am not mistaken) a dynamic array do not tend to get optimised into subroutines by the compiler. This means that two assignments will tend to occur (first to the return variable in the function, then to whatever the function result is assigned to in the calling procedure) and possibly two memory allocations as well.

septcolor commented 4 years ago

In current Fortran, I find it very tedious to define accessors (setters/getters). This may be because of the verbosity of TBP syntax, which even seems to discourage coding for encapsulation. This is in contrast to other languages, which often provide a more convenient syntax for the same purpose.

Personally, I use a combined approach of getters/setters + raw access to components (+ status flags), and the main purpose is to let each object manage its own status and decide what to do in the background. I am not an enthusiastic "fan" of OO, but I still find this kind of self-management very useful (and it has indeed helped my coding a lot).

So, although it is not clear what the best improved syntax would be (if any), I guess it would be nice if some new syntax/facility made such coding easier (and hopefully less verbose).

vansnyder commented 4 years ago

"Array-like syntax would be the most natural, but obviously conflicts with, er, array syntax."

Tom:

I proposed array/function syntax for structure components, getters, and setters at the X3J3 meeting in Albuquerque in 1986, when % was still controversial. John Reid, Rex Paige, and Brian Smith understood it. Others said "Fortran programmers want to see what their program is doing." Of course, you DO NOT want to see HOW your program is doing things. Parnas had explained that 16 years earlier.

There is no syntax conflict. Processors can already work out the difference between function and array references. Setters would not be a new problem. Indeed, an assignment to an array element is an invocation of a setter that the processor knows how to write and inline.
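
The point that processors already tell array and function references apart can be illustrated in current Fortran. In the sketch below, the hypothetical types `stored_table` and `computed_table` expose the name `at` as an array component and as a type-bound function respectively, and the client-side read syntax is the same for both.

```fortran
module lookup_mod
  implicit none

  ! Values are stored in an array component named "at"
  type :: stored_table
    integer :: at(3) = [10, 20, 30]
  end type

  ! Values are computed by a type-bound function also named "at"
  type :: computed_table
  contains
    procedure :: at => computed_at
  end type

contains

  pure function computed_at(this, i) result(v)
    class(computed_table), intent(in) :: this
    integer, intent(in) :: i
    integer :: v
    v = 10 * i
  end function

end module

program reference_demo
  use lookup_mod
  implicit none
  type(stored_table)   :: s
  type(computed_table) :: c

  ! Identical syntax: the compiler resolves one as an array element
  ! reference and the other as a function call.
  print *, s%at(2), c%at(2)

  if (s%at(2) /= c%at(2)) error stop "values differ"
end program
```

Only the assignment side differs today: `s%at(2) = 5` is valid, while the function form has no counterpart, and that missing counterpart is what a setter would provide.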

ivan-pi commented 3 years ago

If the protected attribute is allowed for members of derived types, as already under discussion in #16 and #156, the "getter" problem would be solved for many use cases.

I can still imagine instances where I would like to have something closer to the idea of Python's @property decorator. An example of what I have in mind is something along the lines of


type :: matrix
  real, allocatable :: A(:,:)
  integer, property :: rows = size(matrix%A,1)  ! a read-only property
  integer, property :: cols = size(matrix%A,2)   ! a read-only property
contains
  procedure :: get_nrows  ! I would like to avoid having to write this function
end type

type(matrix) :: mat 

mat%A = reshape([1,2,3,4],[2,2])

print *, mat%rows, mat%cols    ! prints:       2       2

I am guessing this would conflict with the way some intrinsic functions work. The gfortran documentation says that size() will only return a meaningful result for an allocated array. On the other hand, ifort appears to return 0 as the size of unallocated objects. In any case, the idea is that the actual call to size() would be made when the member mat%rows is referenced somewhere. It would probably make sense to limit the allowable property functions to pure intrinsic functions.
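
For reference, a sketch of how the same read-only information is exposed in current Fortran, using type-bound functions instead of the hypothetical property attribute. The module name `matrix_mod` and the choice to return 0 for an unallocated array are assumptions made for this illustration, not behaviour guaranteed by size().

```fortran
module matrix_mod
  implicit none
  private
  public :: matrix

  type :: matrix
    real, allocatable :: A(:,:)
  contains
    procedure :: rows
    procedure :: cols
  end type

contains

  ! Guard with allocated() so the result does not depend on how a
  ! given compiler treats size() of an unallocated array.
  pure function rows(this) result(n)
    class(matrix), intent(in) :: this
    integer :: n
    n = 0
    if (allocated(this%A)) n = size(this%A, 1)
  end function

  pure function cols(this) result(n)
    class(matrix), intent(in) :: this
    integer :: n
    n = 0
    if (allocated(this%A)) n = size(this%A, 2)
  end function

end module

program matrix_demo
  use matrix_mod, only: matrix
  implicit none
  type(matrix) :: mat

  if (mat%rows() /= 0) error stop "expected 0 rows before allocation"

  mat%A = reshape([1.0, 2.0, 3.0, 4.0], [2, 2])

  if (mat%rows() /= 2 .or. mat%cols() /= 2) error stop "unexpected shape"
  print *, mat%rows(), mat%cols()
end program
```

The cost relative to the property idea is the pair of hand-written functions and the trailing parentheses at every call site.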

acikek commented 3 years ago

@ivan-pi to clarify, that property attribute would mean that the assigned value is an expression evaluated on each reference, yes? I agree that it should only allow pure intrinsic functions, but even within those constraints it wouldn't allow for properties that need multiple statements.

This is dissimilar to Python's @property decorator. It seems that what you want is more akin to JavaScript's arrow syntax for defining functions (conveniently using the same => operator), which is a whole different ordeal. For getters and setters, personally, pointing to a contained function or subroutine as in the original proposal makes more sense.