j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

Simple Functions in Constant Expressions #253

Open everythingfunctional opened 2 years ago

everythingfunctional commented 2 years ago

The 202X standard will include a new attribute for procedures, simple. A simple procedure depends only on its arguments and references only other simple procedures. This means that, in theory, if a simple function's arguments are themselves constant expressions, it could be evaluated before program startup. I contend that this would be quite valuable for defining named constants of derived types whose components are private. For example, given a string type defined like this:

module string_m
  implicit none
  private
  public :: string_t
  type :: string_t
    private
    character(len=:), allocatable :: str_
  end type

  interface string_t
    module procedure constructor
  end interface
contains
  simple function constructor(str) result(string)
    character(len=*), intent(in) :: str
    type(string_t) :: string

    string%str_ = str
  end function
end module

It should be possible to define something like the following:

type(string_t), parameter :: lines(*) =  &
    [ string_t("Hello, World!") &
    , string_t("It's a great day!") &
    , string_t("Hooray Fortran!") &
    ]

FortranFan commented 2 years ago

@everythingfunctional ,

SIMPLE procedures, as developed in Fortran 202X, are not limited by constant expression considerations.

Please see #214, which proposes CONSTEXPR procedures instead for user-defined functions in constant expressions. There is a reason for this: Fortran standard semantics pose difficult complications when it comes to allowing executable statements and constructs in constant expressions.

Consider the string%str_ = str statement in your example code. I will spare the commentary, but you know there is a lot "underneath" this statement with allocation-upon-assignment, which many an implementation handles only at run time. For these, what you propose is effectively a no-no.

I have surmised that improved compile-time computing in Fortran is only amenable to a rather limited set of instructions. The ability to define CONSTEXPR procedures, which essentially contain only constant-expression statements, would enable code reuse with all the attendant benefits, and would be a good goal for enhanced compile-time computing in Fortran 202Y.

Hence it will be helpful if you and others who see value in this proposal would also give a thumbs up for #214.

Thank you,

everythingfunctional commented 2 years ago

@FortranFan , you honed right in on the complications and saw through my attempt not to specifically mention them. However, your assumption that constant expressions must be known at compile time is not quite true. In fact, the standard says nothing about compile time. It would be perfectly reasonable for constant expressions to be computed immediately prior to program execution (i.e. at program startup). In fact there are languages which do this. I believe initializing a constant value of a type with pointer members in C++ (i.e. its components must be allocated on the heap), or even just a user defined constructor, is done exactly this way.

Part of my reason for asking for this feature is specifically that I want to be able to define constants of types with allocatable components, let alone have those types keep their components private. I'll admit there might be some complications involved with this that I'm not quite seeing, but as far as being able to implement such a feature, I think it's absolutely feasible within the current standard.

FortranFan commented 2 years ago

@everythingfunctional wrote Mar. 16, 2022 1:15 PM EDT:

your assumption that constant expressions must be known at compile time is not quite true. In fact, the standard says nothing about compile time. It would be perfectly reasonable for constant expressions to be computed immediately prior to program execution (i.e. at program startup). In fact there are languages which do this.

Note: [image]

I really think an attempt to get the standard to allow SIMPLE functions in constant expressions will not gain traction with the committee.

klausler commented 2 years ago

Poor persevering programmers preparing processors for practitioners can, perhaps, put some run-time "constant" expressions into play (initializer expressions), but perusing your published PDF produces a plethora of prohibited possibilities. A plurality of points where "scalar-int-constant-expr" appears must positively produce proper constants, per se, post parsing. Ponder real(kind=simplefunc()) :: x patiently.

everythingfunctional commented 2 years ago

real(kind=simplefunc()) :: x

I see. This actually does pose a potential problem if the expression cannot be fully evaluated at compile time. But I still contend that CONSTEXPR would restrict one to what is effectively already doable via named constants (i.e. the parameter attribute).

Perhaps I'm trying to apply this idea too broadly by utilizing an already existing aspect of the language. Perhaps what I'm really after is a new designation, something like "initialization expression". These could include the use of simple functions, and many places that require constant expressions now, could be relaxed to allow for initialization expressions. The kind value for an intrinsic type would then be a place that still requires a constant expression.

certik commented 2 years ago

real(kind=simplefunc()) :: x

In LFortran we do two passes over the AST (Abstract Syntax Tree, the result of parsing). In the first pass we figure out the types of all functions and variables (symbols), and we evaluate expressions like kind = simplefunc(). Currently such expressions can only use constants, arithmetic operations and intrinsic function calls, so they can be evaluated straightforwardly in the compiler itself, without executing any Fortran code (for example, if you call sin(x), we call our own implementation in C++). In the second pass we finish compiling the bodies of all functions and resolve all variables to (now existing) symbols in the symbol table.

This approach would obviously break if simplefunc() is a user defined function, as it would require compiling this function first and executing it while still figuring out the types of symbols.

However, I think that can be done. One can allow executing any function at compile time: as long as the function(s) can actually be compiled (all the types are known), we can execute them while still compiling other code. Perhaps we can restrict it to only allow calling user defined functions from other modules, so we can still use the above two pass approach for a given module, but if another module is already compiled, we can execute functions from it. For example the Jai language allows executing any code (including the whole program!) at compile time.

There is also a security and a performance implication: right now no user code is being run at compile time, only code that is already implemented in the compiler, so as long as there are no bugs in the compiler, it is currently safe to compile the whole code. Once we allow executing any random code at compile time, all kinds of things can happen:

gklimowicz commented 2 years ago

I may be an idiot, but how does using SIMPLE functions in initialization expressions work in a world of separate compilation? Only the interface may be known at compile time, not the implementation.

everythingfunctional commented 2 years ago

how does using SIMPLE functions in initialization expressions work in a world of separate compilation?

The expression isn't computed at compile time, it's computed at run time, prior to beginning execution of the main program.

everythingfunctional commented 2 years ago

In fact, it's a bit odd to me that compilers felt it acceptable that

real, parameter :: y = sin(1.0)
print *, y

and

real :: x, y

x = 1.0
y = sin(x)
print *, y

might produce different outputs, since one math library would be used at compile time, but a different one used at run time.

everythingfunctional commented 2 years ago

I did think of a case that a proposal will need to prevent: a named constant defined in terms of a simple function that references it, i.e.

module foo
  implicit none

  integer, parameter :: UH_OH = bar()
contains
  simple function bar()
    integer :: bar

    bar = UH_OH
  end function
end module

gklimowicz commented 2 years ago

Yeah, that was the sort of thing I was thinking.

certik commented 2 years ago

might produce different outputs, since one math library would be used at compile time, but a different one used at run time.

With optimizations enabled, as a user I would expect both to produce a compile time value, in this case an identical value computed using the compile time (slow but more accurate) math library.

everythingfunctional commented 2 years ago

As a user, I'd expect the following program to always output "T", no matter what. But what are the actual chances of that? You can easily link to a math library that is different from the one used by the compiler. The choice of optimizations and link time options should not affect whether this program outputs "T" or "F". x = sin(1.0) should always be computed at program startup time, not compile time. If that's true, why can't user defined simple functions also be used there?

program huh
  implicit none
  real, parameter :: x = sin(1.0)
  real :: y

  y = 1.0
  print *, x == sin(y)
end program

certik commented 2 years ago

Regarding your last example, one approach could be that the compiler only uses the "more accurate" math library when optimizations are enabled, thus this should still return T in most simple cases. In the standard "Debug" mode, the compiler can call the exact same math library for both compile time and runtime, thus also returning T.

On a broader issue, I never compare floating point numbers directly like this, but always with abs(x - sin(y)) < eps, in which case this will return T no matter what.
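
A minimal sketch of the tolerance-based comparison described above, assuming default real. The thread does not say what eps should be, so the named constant tol (a few units of machine epsilon) is a hypothetical choice for illustration only.

```fortran
program tolerant_compare
  implicit none
  ! Constant expression: a compiler may fold this with its own math library.
  real, parameter :: x = sin(1.0)
  ! Hypothetical tolerance, not taken from the discussion.
  real, parameter :: tol = 4.0 * epsilon(1.0)
  real :: y

  y = 1.0
  ! Compare within a tolerance instead of testing exact equality.
  print *, abs(x - sin(y)) < tol
end program
```

Whether the two sin evaluations actually differ depends on the compiler, the optimization level, and the math library linked in.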

everythingfunctional commented 2 years ago

the compiler can call the exact same math library for both compile time and runtime

The compiler doesn't know the runtime math library. I.e.

$ gfortran -c huh.f90 -o huh.o
$ ld huh.o -lmkl -lgfortran -o huh # I don't know exactly what all the necessary libraries are, but you get the idea

klausler commented 2 years ago

As a user, I'd expect the following program to always output "T", no matter what.

That expectation turns out to be impossible in the face of separate compilation and link steps.

everythingfunctional commented 2 years ago

That expectation turns out to be impossible in the face of separate compilation and link steps.

I think that's an artifact of current implementations, not a required implication of the standard. Nothing says that sin(1.0) must be evaluated at compile time, even in a constant expression. That calculation could be deferred to program startup and still be in conformance with the standard.

certik commented 2 years ago

As @klausler said, this is hard to do if you want separate compilation and link steps, and as you (@everythingfunctional) showed, the runtime library can be chosen later. The only way out is to defer the evaluation of sin to runtime, but as @klausler's example above (real(kind=simplefunc()) :: x) shows, sometimes you need to evaluate it at compile time and can't defer it. I do think that in most practical cases, such as your floating point example, things can be done at runtime, just not in all cases.

klausler commented 2 years ago

real(kind=merge(kind(0.d0), kind(0.), sin(1.0) < cos(1.0))) :: x

everythingfunctional commented 2 years ago

real(kind=merge(kind(0.d0), kind(0.), sin(1.0) < cos(1.0))) :: x

That one is at least definitely standards conforming. How about this one though?

real(kind=real_kinds(int(sin(1.5707963)))) :: x

Whether or not that is standards conforming depends on the quality of the implementation of the sin function used at compile time. IMHO the allowance of implementation dependent functions in constant expressions was a mistake. To allow that mistake to get in the way of adding a feature to the language that implies deferring calculations of some constants to program startup time is a further mistake in my mind.
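
To see which subscript the expression above yields with a given compiler, a small sketch like the following could be used. It assumes real_kinds is the named constant from the intrinsic module iso_fortran_env; the names kind_index and idx are illustrative only.

```fortran
program kind_index
  use, intrinsic :: iso_fortran_env, only: real_kinds
  implicit none
  ! Evaluated as a constant expression, so the result depends on the
  ! sin implementation the compiler uses for constant folding.
  integer, parameter :: idx = int(sin(1.5707963))

  print *, "index:", idx
  ! A value of 0 would be an out-of-bounds subscript for real_kinds.
  print *, "valid subscript of real_kinds:", idx >= 1 .and. idx <= size(real_kinds)
end program
```

If the folded sin comes out even slightly below 1.0, int truncates it to 0 and the kind selector in the declaration above would index outside real_kinds.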

klausler commented 2 years ago

selected_real_kind is implementation-dependent, but I never see it used outside a constant expression that's necessary for typing at compilation time.
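
For reference, a minimal sketch of the usual pattern, where selected_real_kind appears only in a constant expression that fixes a kind. The requested precision and range (15, 307) and the name dp are illustrative assumptions, and the sketch presumes the processor provides such a kind.

```fortran
program kinds
  implicit none
  ! The kind value must be known while the declaration below is processed.
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(kind=dp) :: x

  x = 1.0_dp
  print *, kind(x) == dp, precision(x) >= 15
end program
```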

It is a strength of Fortran that nearly every intrinsic function, including most functions in the IEEE intrinsic modules, is available for use in constant expressions. It takes a huge amount of effort to implement them fully -- see f18's implementation for an example that's nearly complete -- but it gives Fortran one of its few advantages over C++.

everythingfunctional commented 2 years ago

selected_real_kind is implementation-dependent, but I never see it used outside a constant expression that's necessary for typing at compilation time.

It seems I keep making broad generalizations where I should be more reserved and nuanced. I do appreciate all the insights this discussion has elicited.

It seems defining what I'm after will require more effort than I initially anticipated, but I'm still convinced much of it should be technically feasible and desirable. Being able to write what I have in my initial example is still something worth working towards I think.

certik commented 2 years ago

@everythingfunctional if you are ok with manually inlining your functions, it looks like you can already do quite a bit at compile time: https://fortran-lang.discourse.group/t/computing-at-compile-time/3044

everythingfunctional commented 2 years ago

That is a cool demonstration, but it still lacks two features I'd like.

  1. to be able to reuse somebody's existing library without having to manually inline all their calculations
  2. be able to define constants of derived types with private and/or allocatable components

klausler commented 2 years ago

  1. to be able to reuse somebody's existing library without having to manually inline all their calculations

Allowing statement functions in constant expressions and modules would cover most of the use cases for that need, and would be way easier to implement than a constexpr function. Yes, I know about statement functions being obsolescent, but they've been in the language since literally day 1 (longer than functions and subroutines) and aren't going anywhere.

  2. be able to define constants of derived types with private and/or allocatable components

What's wrong with using a function from the type's definition module to construct and return these, other than being unable to reference that function from an initialization expression?
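
A sketch of that run-time approach, reusing the string_m module from the top of the thread. It is written with pure rather than simple on the assumption that the compiler at hand may not yet accept the simple prefix. The accessor to_chars and the program name runtime_init are hypothetical additions so the example can print something.

```fortran
module string_m
  implicit none
  private
  public :: string_t, to_chars

  type :: string_t
    private
    character(len=:), allocatable :: str_
  end type

  interface string_t
    module procedure constructor
  end interface
contains
  pure function constructor(str) result(string)
    character(len=*), intent(in) :: str
    type(string_t) :: string

    string%str_ = str   ! allocation on assignment happens at run time
  end function

  ! Hypothetical accessor, added only for this illustration.
  pure function to_chars(string) result(chars)
    type(string_t), intent(in) :: string
    character(len=:), allocatable :: chars

    chars = string%str_
  end function
end module

program runtime_init
  use string_m, only: string_t, to_chars
  implicit none
  type(string_t), allocatable :: lines(:)   ! a variable, not a named constant
  integer :: i

  ! Built by executable statements; this cannot be a parameter today.
  lines = [ string_t("Hello, World!"), string_t("It's a great day!") ]
  do i = 1, size(lines)
    print "(a)", to_chars(lines(i))
  end do
end program
```

The values are constructed correctly, but lines has to be a variable, so nothing stops later code from modifying it.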

everythingfunctional commented 2 years ago

other than being unable to reference that function from an initialization expression

that's exactly what I'm after

Allowing statement functions in constant expressions and modules

That's an interesting idea, but are statement functions pure or simple? If you could designate them simple, I think that could be workable.

klausler commented 2 years ago

Statement functions can be considered pure if they reference only pure functions. They're just wrappers around expressions, and contain no statements per se. They can similarly also be considered simple if they reference no variables other than their arguments. Either way, it's a trivial derived attribute.
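
For readers who have not met them, here is a minimal sketch of a statement function as the language allows it today, in the specification part of a main program. The names area, r, and x are illustrative only, and compilers may warn that the feature is obsolescent.

```fortran
program stmt_fn
  implicit none
  real :: area, r, x
  ! Statement function: a one-line wrapper around an expression.
  ! It references nothing but its dummy argument r.
  area(r) = 3.14159265 * r * r

  x = 2.0
  print *, area(x)
end program
```

Using such a function in a constant expression or in a module, as suggested above, is not permitted by the current standard.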

klausler commented 2 years ago

other than being unable to reference that function from an initialization expression

that's exactly what I'm after

Initializers of variables, default initializers of components, or both? They're not exactly the same problem, and may have distinct solution options.

everythingfunctional commented 2 years ago

Both. But I think allowing it in default initialization of components requires at least thinking about the context of initialization of variables. For example

type :: foo
  integer :: bar = baz()
end type
type(foo), parameter :: buzz = foo()

Ordinarily the intrinsic structure constructor has optional arguments for components with default initializers, and they can be used to initialize a named constant of that type. So by allowing simple functions to be used in default initialization of components, you're kind of forced to allow them for named constants by proxy. That or carve out a weird exception for types whose default initializers aren't constant expressions.
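
A minimal sketch of what already works today when the default initializer is an ordinary constant expression. The value 42 and the program name default_init are placeholders for illustration.

```fortran
program default_init
  implicit none
  type :: foo
    integer :: bar = 42   ! default initializer: a constant expression
  end type
  ! The component may be omitted because it has a default initializer.
  type(foo), parameter :: buzz = foo()

  print *, buzz%bar
end program
```

This should print 42. Replacing the literal with a call to a user-defined function is exactly the step the current rules forbid.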

I think a spec like: An initialization expression may contain a constant expression, or simple functions with actual arguments that are themselves initialization expressions. An initialization expression can be used to define the value of a named constant, the initial value of a saved variable or the default value of a component of a derived type. A named constant whose value is not a constant expression may not be used in a constant expression.

That last statement is the weird bit with strange complications/implications, but gets around the problems in real(kind=simplefunc()) :: x because the kind parameter still requires a constant expression.

There are probably still some problematic aspects I haven't considered somewhere in there, but it seems like the right direction.

everythingfunctional commented 2 years ago

Just wanted to share experiences from the C++ world involving this kind of stuff. https://youtu.be/OcyAmlTZfgg

gklimowicz commented 1 year ago

I just wanted to add a note about implementation considerations for constexpr-like functions in Fortran.

I did a bit of research with some C++ aficionados about the implementation of constexpr. It sounds like all the implementations evaluate the function calls at compile time using an interpreter that can emulate almost all of the C++ language. There are a few remaining restrictions (related to calling non-constexpr functions and variables, allocating memory that is not freed in the same call, throwing exceptions and such).

I believe for Fortran modules, it requires placing an interpretable version of the function in the .mod module file for later use and call by code compiled elsewhere. (I don't think we can call the compiled function, as we probably don't know where it is.)

I believe implementors will say the investment is quite high, and the return on that investment is not well-motivated yet.

certik commented 1 year ago

I think it can be done in the compiler. The question is more whether users want it, and what the guidelines are for features that we should not put into Fortran (every new feature has a "cost", etc.).