lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.5k stars 164 forks source link

How to handle automatic reallocation of the LHS #191

Open certik opened 2 years ago

certik commented 2 years ago

To be determined: how to handle automatic reallocation of the LHS, for things like:

real(dp), allocatable :: x(:)
x = [1, 2, 3]
x = [1, 2, 3, 5, 6]

And the same for strings:

character(:), allocatable :: s
s = "abc"
s = "abcdef"

The two options:

  1. The ASR will contain implicit nodes for checking if s is allocated and deallocate it, such as:
    real(dp), allocatable :: x(:)
    if (allocated(x)) then deallocate(x)
    allocate(x(3))
    x = [1, 2, 3]
    if (allocated(x)) then deallocate(x)
    allocate(x(5))
    x = [1, 2, 3, 5, 6]

    We already have ImplicitDeallocate which I think does exactly if (allocated(x)) then deallocate(x). And then we can add optimizations in ASR->ASR that would turn this into:

    real(dp), allocatable :: x(:)
    allocate(x(3))
    x = [1, 2, 3]
    deallocate(x)
    allocate(x(5))
    x = [1, 2, 3, 5, 6]

    And ultimately we need optimizations to turn this into:

    real(dp) :: x1(3), x2(5)
    x1 = [1, 2, 3]
    x2 = [1, 2, 3, 5, 6]

    Depending on how the arrays are used. These optimizations we have to have anyway, to simplify such generic code into a faster code.

The other optimization that we need is that sometimes 3 and 5 is not known at compile time, but the sizes are still known as some kind of an expression, in the simplest case as variables n and m. Then this:

integer :: m, n
real(dp), allocatable :: x(:)
call assign_to_mn(m, n) ! reads from an input file
allocate(x(m))
x = 1
deallocate(x)
allocate(x(n))
x = 2

should be turned into:

function f(m, n)
integer, intent(in) :: m, n
real(dp) :: x1(m), x2(n)
x1 = 1
x2 = 2
end function

integer :: m, n
call assign_to_mn(m, n) ! reads from an input file
call f(m, n)
  1. Handle this by the backends. That means we have to implement this logic into every backend. That seems suboptimal, the approach we chose is to encode everything into ASR explicitly, so that the backend doesn't have to handle any kind of such logic.

  2. Write an ASR to ASR pass that tranforms the initial code into the code in 1. So far we chose to do this directly in AST to ASR, so that ASR is always well formed and explicit.

So it seems 1. is the way to go.

It seems this is analogous to the mem2reg pass in LLVM, where the frontend is encouraged to use alloca for everything and then the LLVM optimizers promote it to registers if they can. It seems it is possible to implement such optimizers in a robust way. The above optimizers turn "allocatable" into just regular "non-allocatable" arrays. And the AST to ASR just always turns x = y assignment where x is allocatable into the ImplicitDeallocate; Allocate; Assignment nodes.

So ASR Assignment only does a copy (to either allocatable or non-allocatable arrays), and it assumes that the size of the LHS is the same as RHS. The frontend checks this either at compile time or runtime (in Debug/Check mode). Assignment does not resize.

certik commented 2 years ago

CC @Thirumalai-Shaktivel, @Smit-create, @namannimmo10, @czgdp1807, @haozeke

Let me know what you think.

HaoZeke commented 2 years ago

The only issue I have with this is that we might not be generalizing sufficiently enough for backend optimizations to take place. Converting allocatable into pseudo-SSA based explicit instructions at the ASR level could possibly preclude backend optimizers expecting the allocatable form.

Off the top of my head I can't think of any examples where this would happen, but we should try in general to not have to re-implement backend optimizations on the ASR.

From the perspective of keeping a very general ASR after the AST->ASR transform, 3+2 looks better, since it would allow for the ASR to ASR pass to be disabled per backend. AFAIK mem2reg is IR->IR as well.

certik commented 2 years ago

Yes, we should not convert all allocatables into static arrays. Only as optional optimization passes. The question is, whether in AST to ASR, we should convert an AST assignment like x = [1, 2, 3] into the following ASR code:

if (allocated(x)) then
    if (size(x) != 3) then
        deallocate(x)
        allocate(x(3))
    end if
else
    allocate(x(3))
end if
x(1:3) = [1, 2, 3] # just copy, assume the same size and allocated

or

x = [1, 2, 3] # reallocate and copy, do not assume the same size and allocated

And the first feels much better, as it makes ASR always nice and explicit regarding the assignment.

certik commented 2 years ago

In Python, we can have:

y = [1, 2, 3]
x = y # here CPython increases the reference count of the `[1, 2, 3]` list, and both `y` and `x` point to the same memory with refcount equal to 2
certik commented 2 years ago

In Python:

s = "abc" + s
certik commented 2 years ago

So the problem is what to do if the right hand side contains an expression that calls functions (that can have side effects, perhaps writing to a file, or modifying global state), such as:

x = f()

If the function f() has no side effects, then it would be equivalent to:

integer :: n
n = len(f())
if (allocated(x)) then
    if (size(x) != n) then
        deallocate(x)
        allocate(x(n))
    end if
else
    allocate(x(n))
end if
x(1:n) = f() # just copy, assume the same size and allocated

But even without side effects, this would call the function f twice, so if it is a long running function, this would be inneficient. And tough to optimize by optimizers too.

And finally, if f has side effects, then the above is incorrect.

As such, it seems in the general case this cannot even be represented in ASR in any other way than in extending the Assignment node by a new argument bool realloc_lhs, if false then just copy, if true, then reallocate the lhs according to the logic above, once you know the size of the (runtime) array on the rhs.

certik commented 2 years ago

So:

x = f() # Assignment(realloc_lhs=True)
x(1:n) = f() # Assignment(realloc_lhs=False)
HaoZeke commented 2 years ago

This sounds good (flagging the nodes with realloc_lhs)