j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee
178 stars 15 forks source link

Scoped enums #262

Open ZedThree opened 2 years ago

ZedThree commented 2 years ago

This paper recommends that the new typed enumerators have values scoped to the type, and not to the enclosing scope.

The authors strongly believe that the current proposal for typed enumerators, as described in J3/21-120 and subsequent revisions, be amended so that enumerator names are class two names. That is, enumerator names must be local to each enumeration type and not in the surrounding scope. Some suggestions for syntax are made.

certik commented 2 years ago

@ZedThree thanks a lot for writing this up! I will read it carefully soon and provide feedback.

Let's iterate on this. Everybody, go ahead and provide feedback as well please.

klausler commented 2 years ago

I completely agree that the current 202X typed enumerators are ridiculous and should be deleted or replaced. Thank you for writing this up. There is zero chance that J3 will fix this, but maybe we implementers can agree on a better common solution.

Three suggestions:

(1) The syntax "T%E" is ambiguous at parsing time with a component reference. Consider another symbol, please. "T:E" would work. So would "T_E" if you want really easy implementation.

(2) The class-1 scoping of the current 202X misfeature could be retained with a keyword, like "ENUM, OPEN" or something.

(3) We have to also have optional value initializers for interoperability.

If an unambiguous syntax is used for literal constant enumerators, all of this would be trivial to implement in f18. A syntax that's ambiguous with component references could be made to work -- misparses one way or the other would have to be detected and repaired once the symbol table is in hand, as I have to do for statement function definitions vs array element assignments, function calls vs array references, array slices vs substrings, &c. &c. -- but would be a harder sell to me if anybody wanted a demonstrable prototype in short order.

ZedThree commented 2 years ago

@klausler Thanks for your comments, it's good to hear from an implementer!

(1) The appeal of % is that (to me at least) it feels quite intuitive what it means, but if it's impediment to implementation, that's certainly a strike against it. : feels quite overloaded already, maybe ::? Or even the ugly %%? Early revisions suggested enum_name(value_name) as a way to resolve conflicts. Could that be made to work?

Do you think if this had a straw poll for syntax, it would help?

(2) Indeed, or perhaps import which is already used for similar purposes in interfaces. I'm not proposing that feature here though, as I thought this might stand a better chance by keeping the scope (ha) limited.

(3) I'm not sure what you mean -- could you elaborate?

certik commented 2 years ago

(1) The syntax of accessing the enumerators T%E should be designed to be consistent with the potential namespaces for modules proposal (#1), which have a similar issue: some_module%some_function(). As well as if Fortran was to get namespaces (#87), they should use a consistent syntax. As an example, C++ uses . for "derived types", but :: for namespaces and enumerations (in C++ it seems the enumerators are both global as well as accessible locally using :: for enum, but only locally for enum class). So if we go with T:E, then also some_module:some_function() should work. What about T::E, some_module::some_function()?

plevold commented 2 years ago

I think T::E makes the most sense both for namespaces and for enums (and nopass procedures for that matter). This is common in multiple similar programming languages. Insisting on that Fortran syntax should be "unique" only makes it confusing for newcomers. As a comparison, Rust is considered to be "the C++ replacement" and has adopted much of its syntax like :: for namspaces and <> for generics.

@ZedThree It might be worth pointing out in the paper that with the current Fortran standard it's trivial to implement a strongly typed and properly namespaced enum. I've often done it in my code like the example below. It should be easy to reduce the amount of boilerplate code with e.g. fypp or maybe even compiler preprocessors as well. This begs the question if an arguably worse implementation is even needed in the standard.

module animal_mod
    implicit none

    private
    public animal_t, animals

    type :: animal_t
        private
        integer :: id
    contains
        generic :: operator(==) => equals
        generic :: operator(/=) => not_equals

        procedure, private :: equals, not_equals
    end type

    type :: animals_t
        type(animal_t) :: dog = animal_t(1)
        type(animal_t) :: cat = animal_t(2)
        type(animal_t) :: bird = animal_t(3)
        type(animal_t) :: fox = animal_t(4)
    end type

    type(animals_t), parameter :: animals = animals_t()

contains

    logical elemental function equals(a,b)
        class(animal_t), intent(in) :: a
        type(animal_t), intent(in) :: b
        equals = a%id == b%id
    end function equals

    logical elemental function not_equals(a,b)
        class(animal_t), intent(in) :: a
        type(animal_t), intent(in) :: b
        not_equals = .not. a == b
    end function not_equals

end module

program  main
    use animal_mod, only: animal_t, animals
    implicit none

    write(*,*) what_says(animals%dog)
    write(*,*) what_says(animals%cat)
    write(*,*) what_says(animals%bird)
    write(*,*) what_says(animals%fox)

contains

    pure function what_says(animal) result(sound)
        type(animal_t), intent(in) :: animal
        character(len=:), allocatable :: sound

        if (animal == animals%dog) then
            sound = 'woof'
        else if (animal == animals%cat) then
            sound = 'meow'
        else if (animal == animals%bird) then
            sound = 'tweet'
        else
            sound = '???'
        end if
    end function

end program

Program output:

 woof
 meow
 tweet
 ???
FortranFan commented 2 years ago

it's trivial to implement a strongly typed and properly namespaced enum. I've often done it in my code like the example below. It should be easy to reduce the amount of boilerplate code with e.g. fypp or maybe even compiler preprocessors as well. This begs the question if an arguably worse implementation is even needed in the standard.

An important need with proper enums is use with SELECT CASE constructs that no "roll your own" types permit.

FortranFan commented 2 years ago

Let's iterate on this. Everybody, go ahead and provide feedback as well please.

Honestly, @certik , at this stage the "ball" is in WG5's side of the "court". The current design was passed by J3 with no objections by any of the voting members. You can now be completely certain J3's position will be "There will be no further action on these suggestions at this time."

The only viable option is to submit a request with WG5 to withdraw the enumeration type feature from Fortran 202X. A good design can then be considered as a fresh slate toward Fortran 202Y.

FortranFan commented 2 years ago

This paper recommends that the new typed enumerators have values scoped to the type, and not to the enclosing scope. ..

@ZedThree , there are few use cases which are supported by the 'enumeration type` as currently included in draft Fortran 202X.

In my opinion, to pick one missing use case that is of interest to the authors and proposing something to get that addressed is just not right. I can't give a big enough down vote on this, it is thumbing its nose at all the other use cases.

certik commented 2 years ago

@FortranFan, can you formulate what you propose should be done? Do you like the enumerations as proposed in 2X more than this proposal?

FortranFan commented 2 years ago

@FortranFan, can you formulate what you propose should be done?

Submit a request with WG5 to withdraw the enumeration type feature from Fortran 202X.

klausler commented 2 years ago

Is there any essential difference between this specific proposal -- putting enumerators into a scope and requiring some kind of syntax that names the scope and the enumerator wherever an enumerator appears -- and using distinct prefixes on the enumerators? It seems to me that we're just saving some typing here.

certik commented 2 years ago

If you use distinct/unique prefixes (and are willing to accept this downside) then I think there is no difference.

ZedThree commented 2 years ago

@FortranFan :

@ZedThree , there are few use cases which are supported by the 'enumeration type` as currently included in draft Fortran 202X.

In my opinion, to pick one missing use case that is of interest to the authors and proposing something to get that addressed is just not right. I can't give a big enough down vote on this, it is thumbing its nose at all the other use cases.

I disagree, the existing proposal supports most use cases even if it is missing several features that would support more use cases, the main two being the ability to specify the underlying type, and values for individual enumerators. Both of these could be added in F202Y without breaking code that uses the current spec, and could be vendor extensions until then. I certainly disagree that this is "thumbing its nose at other use cases" -- this proposal is about enhancing all use cases, and not about any particular use case.

The existing spec certainly has its flaws and the final form seems to have ignored a couple of decades of language research and innovation, but I don't think it is so flawed that it should be completely thrown out.

@klausler :

Is there any essential difference between this specific proposal -- putting enumerators into a scope and requiring some kind of syntax that names the scope and the enumerator wherever an enumerator appears -- and using distinct prefixes on the enumerators? It seems to me that we're just saving some typing here.

It's not just about saving typing, although that is part of it, it's about building in best coding practice. Rather than having to teach new developers "you should always add a unique prefix to enum names, because sometimes it's important, and if you don't you might have to rename each individual enumerator when you use them", we can just teach them how to use them and that's enough.

More concretely, here's an example:

! Existing spec using renaming:
use colour_mod, only: colour_t, colour_red => red, colour_green => green, colour_blue => blue
use alert_mod, only: alert_t, alert_red => red, alert_amber => amber, alert_green => green
! or with prefixes:
use colour_mod, only: colour_t, colour_red, colour_green, colour_blue
use alert_mod, only: alert_t, alert_red, alert_amber, alert_green

! This proposal:
use colour_mod, only: colour_t
use alert_mod, only: alert_t
plevold commented 2 years ago

An important need with proper enums is use with SELECT CASE constructs that no "roll your own" types permit.

@FortranFan that's true, but with the proposed select case statement my understanding is that you will only save a few keystrokes. Sure, the code

if (animal == dog) then
    sound = 'woof'
else if (animal == cat) then
    sound = 'meow'
else if (animal == bird) then
    sound = 'tweet'
else
    sound = '???'
end if

could be rewritten to

select case(animal)
case (dog)
    sound = 'woof'
case (cat)
    sound = 'meow'
case (bird)
    sound = 'tweet'
case default
    sound = '???'
end select

That should give you 114 non-whitespace characters instead of 123 and maybe somewhat better readability. The behavior is exactly the same though.

On the other hand, if the Fortran compiler would require that the select case statement was exhaustive, either by including all enum variants or by having a case default statement then there would be an actual advantage of using select case over if-s.

Is there any essential difference between this specific proposal -- putting enumerators into a scope and requiring some kind of syntax that names the scope and the enumerator wherever an enumerator appears -- and using distinct prefixes on the enumerators? It seems to me that we're just saving some typing here.

@klausler in addition to the example by @ZedThree another advantage of using scope instead of a prefix is that (a future) Fortran standard could allow the scope to be omitted in case (...)-statements since the enum type is already known from the select case (var) line. For example,

select case(animal)
case (animal_t::dog)
    sound = 'woof'
case (animal_t::cat)
    sound = 'meow'
case (animal_t::bird)
    sound = 'tweet'
case default
    sound = '???'
end select

could be simplified to

select case(animal)
case (dog)
    sound = 'woof'
case (cat)
    sound = 'meow'
case (bird)
    sound = 'tweet'
case default
    sound = '???'
end select

still without the risk of naming collisions. This is for example allowed in Java switch-statements.

klausler commented 2 years ago

If you're just saving typing, it's not worth the effort and drama. Add int values and I/O and you'd have something.

SELECT CASE is O(1) or O(log2(N)). An IF cascade is O(N). More important, a SELECT CASE with no default can be checked at compilation time for missing values.

plevold commented 2 years ago

@klausler good point about the performance differences. So with the current proposal one would have to choose between performance (with enumeration type) or maintainability (with a custom implementation). In my experience, maintainability trumps performance in large software projects unless the penalty is absolutely prohibiting.

More important, a SELECT CASE with no default can be checked at compilation time for missing values.

Absolutely agree. My understanding is that with the current 202X proposal compilers are not required to give an error on missing values, but maybe I've misread something?

ZedThree commented 2 years ago

As far as I can see it, these are the missing features that one would want from an enum in a modern language:

Of these, I think the first two cannot be implemented in a future standard without breaking backwards-compatibility, while the other three can. Are there others? I'm ignoring proper sum/algebraic types here, because I think they would only be usable with proper generic typing support.

Several people that have knowledge of how the committee works have said this proposal doesn't stand any chance of getting accepted. Is "kill/delay typed enumerations" more likely to get accepted instead?

21-110 is the paper that stripped down the much nicer 19-216. Does anybody have any insight into what the committee thought was too complicated about 19-216?


* Weirdly, the current spec has some degree of reflection with next()/previous() and enum_type(1)/huge(enum_type), not seen in many other languages

klausler commented 2 years ago

Of these, I think the first two cannot be implemented in a future standard without breaking backwards-compatibility, while the other three can. Are there others? I'm ignoring proper sum/algebraic types here, because I think they would only be usable with proper generic typing support.

Proper sum types as in ML/O'Caml/Haskell would want their constructors (including nullary ones) to be available in the declaration scope anyway.

Several people that have knowledge of how the committee works have said this proposal doesn't stand any chance of getting accepted. Is "kill/delay typed enumerations" more likely to get accepted instead?

No. There's huge overlap between WG5 and J3. J3 asking WG5 for permission to defer the feature involves the same group of people taking off their J3 hats, donning their WG5 hats, and saying "no" to themselves.

ZedThree commented 2 years ago

No. There's huge overlap between WG5 and J3. J3 asking WG5 for permission to defer the feature involves the same group of people taking off their J3 hats, donning their WG5 hats, and saying "no" to themselves

I'm sorry, I am entirely ignorant of the structure here. There's no chance of changing J3's mind on this then either?

certik commented 2 years ago

@ZedThree there is always a chance. That's why I asked you to write this up. Even if they say no, we can require the committee to provide a response why the paper was not accepted. Furthermore, it provides guidance to compiler developers to implement as extensions.

There is committee politics and how it operates, but I urge all participants here to stay away from that, as it is not productive (or move it to a separate "meta" issue). In this issue, let's focus on having free discussion about what makes sense to do for enumerations.

Peter K. has good points about it being "not worth doing", as it is "just syntax". But, isn't most of Fortran "just nice syntax'? It absolutely is, and that is why I like it. And I think having modern enumerations in the language should be the goal, and we should not be discouraged. Peter H.'s arguments thus resonate with me. Finally, I think @FortranFan's main argument is that this proposal doesn't go far enough, and thus is not worth it, but I think it's a step in the right direction and most importantly: it allows to add the other things in backwards compatible manner. If this proposal is not accepted, and the current F2023 enumerations go in, then I think we can't change the scope without breaking backwards compatibility.

klausler commented 2 years ago

No. There's huge overlap between WG5 and J3. J3 asking WG5 for permission to defer the feature involves the same group of people taking off their J3 hats, donning their WG5 hats, and saying "no" to themselves

I'm sorry, I am entirely ignorant of the structure here. There's no chance of changing J3's mind on this then either?

WG5 is the ISO (international) standard committee for Fortran. J3 is the old name for the American standard committee for Fortran. WG5 comes up with lists of work items for each new standard, and asks J3 to define them; J3 writes draft international standards, gives them to WG5, and they get approved as ISO standards. J3 also takes care of corrections and interpretation requests.

One problem with this arrangement is that it has no feedback loop that includes implementation experience or even broad review before WG5 hands ideas off to J3, and in the 3+ decades since the structure was established, there's a sad history of not-completely-thought-through changes being set permanently into the language. That's why we have PDTs still not being implemented widely enough to be portable after 20 years; a DO CONCURRENT construct that is serial; and soon, rank-agnostic array indexing. And we don't have simple things like pointers to immutable data. It is a mess.

plevold commented 2 years ago

I'm ignoring proper sum/algebraic types here, because I think they would only be usable with proper generic typing support.

I think proper sum types has a lot use cases without proper generic types (and even more with it). How JSON values are parsed in Rust is a good practical example: When reading data at a given path, e.g. root["foo"]["bar"] you get back a Value enum:

pub enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

While the data of the Array and Object variants use generics this enum doesn't. If Fortran enumration types had support for different types it would be possible to implement this kind of interface in Fortran as well without any additional language features.

For reference, the alternative "object oriented" approach (one that is possible to implement in Fortran today) would be to use an abstract base class:

type, abstract :: json_value_t
end type

type, extends(json_value_t) :: json_bool_t
    logical :: value
end type

type, extends(json_value_t) :: json_number_t
    real(dp) :: value
end type

And so on. This approach is (of course) very verbose and does not provide much type safety: While you could select type on a class(json_value_t) variable there's no way of knowing that you've covered all possible variations since inheritance is not an finite set . Even a type outside this hypothetical JSON parsing library could extend json_value_t.