
ASR idea: move dimension expressions from Array to Variable #4244

Open certik opened 3 months ago

certik commented 3 months ago

Here is the current Array ttype definition:

ttype
    ...
    | Array(ttype type, dimension* dims, array_physical_type physical_type)

dimension = (expr? start, expr? length)

And a variable of type Array is represented by a Variable symbol node:

symbol
    ...
    | Variable(symbol_table parent_symtab, identifier name, identifier* dependencies, intent intent, expr? symbolic_value, expr? value, storage_type storage, ttype type, symbol? type_declaration, abi abi, access access, presence presence, bool value_attr)

The dimension expressions can depend on other function parameters (which are represented by FunctionParam), on intrinsic functions, and also on user functions. The user functions must be pure, but they can read global module variables (so they are side-effect-free, but not deterministic), and possibly even variables from the parent function scope. So they are tied to a scope.

If we move the expressions for dimension into Variable, then we can get to them via ExternalSymbol, but the type itself would only contain the number of dimensions, and the type of dimensions is already determined by physical_type.

This design might simplify handling of array types. Regarding type checking, it seems we can't really check the expressions at compile time anyway, except in special cases. So if we remove the expressions from the type, then the type can be fully checked (number of dimensions, physical type, element type), and then the rest can in general only be checked at runtime, but we can insert various optional compile time checks, that would have to go into the symbol table.
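To make the proposal concrete, here is a minimal Python sketch of the split described above. It is not LFortran's actual ASR code: the class names `ArrayType`, `Variable`, `Dimension` and the helper `same_type` are hypothetical, expressions are stored as plain strings, and the physical type value is only a placeholder. It shows that once the dimension expressions live on the symbol, two variables with different extents share one fully comparable type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model of the proposed design; none of these classes are
# LFortran's real ASR classes.


@dataclass(frozen=True)
class Dimension:
    start: Optional[str]   # expression text such as "1"; None if absent
    length: Optional[str]  # expression text such as "10+n-1"; None if absent


@dataclass(frozen=True)
class ArrayType:
    element_type: str   # e.g. "Integer(4)"
    n_dims: int         # rank only, no dimension expressions
    physical_type: str  # placeholder string standing in for array_physical_type


@dataclass(frozen=True)
class Variable:
    name: str
    type: ArrayType
    dims: Tuple[Dimension, ...]  # expressions live on the symbol, not the type


def same_type(a: Variable, b: Variable) -> bool:
    """Compile-time check: element type, rank, and physical type only."""
    return a.type == b.type


t = ArrayType("Integer(4)", 2, "DescriptorArray")
x = Variable("x", t, (Dimension("1", "10"), Dimension("1", "10+n-1")))
y = Variable("y", t, (Dimension("1", "1"), Dimension("1", "n")))
z = Variable("z", ArrayType("Integer(4)", 1, "DescriptorArray"),
             (Dimension("1", "n"),))
```

In this sketch `same_type(x, y)` holds even though the dimension expressions differ, while `same_type(x, z)` fails on rank alone.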

Since this might be quite a big change, we can probably do it after beta. Until then we can think about it, if it is a good idea.

Also: currently a function can declare an input parameter as integer, intent(in) :: A(10,10+n-1), but the caller can pass it integer :: A(1,n). The current Array ttype does not agree with the expressions. So we don't change those anyway. So conceptually it makes sense to remove the expressions from the ttype and move them into the Variable symbol. The bounds checking, or in this case the lengths of the passed array argument, must agree at runtime, but that is fundamentally a runtime check, while ttypes should be possible to check fully at compile time. This is another argument for removing dimension expressions from the ttype.
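One plausible form of that runtime check, sketched in Python. This assumes the rule being enforced is that an explicit-shape dummy argument may not have more elements than the actual argument it is associated with; the function name `sequence_association_ok` is hypothetical, and the check a real Debug build performs may differ.

```python
from math import prod
from typing import Sequence


def sequence_association_ok(dummy_shape: Sequence[int],
                            actual_shape: Sequence[int]) -> bool:
    """Assumed rule: the dummy may not need more elements than the actual has."""
    return prod(dummy_shape) <= prod(actual_shape)


def dummy_shape(n: int) -> tuple:
    # Shape of the dummy declared as A(10, 10+n-1)
    return (10, 10 + n - 1)


def actual_shape(n: int) -> tuple:
    # Shape of the actual argument declared as A(1, n)
    return (1, n)
```

Both shapes depend on `n`, so the comparison can only be made once `n` has a value, which is why it sits naturally outside the type.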

rebcabin commented 3 months ago

Do you already have full run-time bounds checking enabled in Debug mode? If so, then I would be in favor of a move that simplifies the code. Looks like the simplification you propose is to move the locus of dynamic array-size specifications from the ttype for arrays (in the dimension* slot) to the Variable position. It seems to me more natural in the Variable position because it would permit a Variable with statically known dimensions and a Variable with only dynamically known dimensions to be of the same ttype. That might broaden or ease ASR's applicability to more dynamic situations that arise in, say, C++ or Julia.

certik commented 2 months ago

There is one counter argument: for functions like my_matmul(m, k, n, A, B, C) where A(m,k), etc., we want the compiler to be (eventually) checking the consistency of calls like call my_matmul(16, 32, 64, X, Y, Z) with the dimensions of X, Y, Z, if available at compile time. It's not possible in general, so it is fundamentally a runtime feature. At runtime in Debug mode we need to check it anyway (we currently don't have it implemented yet).

I think if a feature/check is fundamentally runtime, then it does not belong in the type, which is fundamentally compile time.

We can still be doing various compile time checking even if it is not part of the type, just via the symbol table. It's not closing the door to it, but it greatly simplifies the type system and related code (no need for FunctionParam, etc.).
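The "check at compile time when possible, otherwise defer to runtime" idea for the my_matmul call can be sketched in Python as follows. This is only an illustration: the function `check_call`, its three result strings, and the shapes assumed for B(k,n) and C(m,n) are not from LFortran (the thread only spells out A(m,k)). Unknown values are modelled as `None`.

```python
from typing import Dict, Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]  # None marks an extent unknown at compile time


def check_call(declared: Dict[str, Sequence[str]],
               scalar_args: Dict[str, Optional[int]],
               actual_shapes: Dict[str, Shape]) -> str:
    """Return "ok", "error", or "runtime" for one call site."""
    deferred = False
    for name, dim_names in declared.items():
        actual = actual_shapes[name]
        if len(actual) != len(dim_names):
            return "error"  # rank is part of the type, always checkable
        for dim_name, extent in zip(dim_names, actual):
            expected = scalar_args.get(dim_name)
            if expected is None or extent is None:
                deferred = True  # not known statically, leave to Debug mode
            elif expected != extent:
                return "error"   # provable mismatch at compile time
    return "runtime" if deferred else "ok"


# Assumed declaration: my_matmul(m, k, n, A, B, C) with A(m,k), B(k,n), C(m,n)
MY_MATMUL = {"A": ("m", "k"), "B": ("k", "n"), "C": ("m", "n")}
ARGS = {"m": 16, "k": 32, "n": 64}  # call my_matmul(16, 32, 64, X, Y, Z)
```

With `X(16,32)`, `Y(32,64)`, `Z(16,64)` all known, the call checks out statically; a single unknown extent turns the result into a deferred runtime check rather than an error.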

rebcabin commented 2 months ago

I don't buy the premise that types are fundamentally a compile-time notion. An interpreter for a strongly, statically typed language like Java or C# should, could, and would do all the type-checking that a compiler would normally do!


certik commented 2 months ago

After further discussion with @rebcabin, it seems the current design of having the expression part of the type might be optimal: if possible to check at compile time, then it should be done; otherwise the check will be at runtime in Debug mode.