Open certik opened 2 years ago
CC @Thirumalai-Shaktivel, @Smit-create, @namannimmo10, @czgdp1807, @haozeke
Let me know what you think.
The only issue I have with this is that we might not be generalizing sufficiently enough for backend optimizations to take place. Converting allocatable into pseudo-SSA based explicit instructions at the ASR level could possibly preclude backend optimizers expecting the allocatable form.
Off the top of my head I can't think of any examples where this would happen, but we should try in general to not have to re-implement backend optimizations on the ASR.
From the perspective of keeping a very general ASR after the AST->ASR transform, 3+2 looks better, since it would allow for the ASR to ASR pass to be disabled per backend. AFAIK mem2reg
is IR->IR as well.
Yes, we should not convert all allocatables into static arrays. Only as optional optimization passes. The question is, whether in AST to ASR, we should convert an AST assignment like x = [1, 2, 3]
into the following ASR code:
if (allocated(x)) then
if (size(x) != 3) then
deallocate(x)
allocate(x(3))
end if
else
allocate(x(3))
end if
x(1:3) = [1, 2, 3] # just copy, assume the same size and allocated
or
x = [1, 2, 3] # reallocate and copy, do not assume the same size and allocated
And the first feels much better, as it makes ASR always nice and explicit regarding the assignment.
In Python, we can have:
y = [1, 2, 3]
x = y # here CPython increases the reference count of the `[1, 2, 3]` list, and both `y` and `x` point to the same memory with refcount equal to 2
In Python:
s = "abc" + s
So the problem is what to do if the right hand side contains an expression that calls functions (that can have side effects, perhaps writing to a file, or modifying global state), such as:
x = f()
If the function f()
has no side effects, then it would be equivalent to:
integer :: n
n = len(f())
if (allocated(x)) then
if (size(x) != n) then
deallocate(x)
allocate(x(n))
end if
else
allocate(x(n))
end if
x(1:n) = f() # just copy, assume the same size and allocated
But even without side effects, this would call the function f
twice, so if it is a long running function, this would be inneficient. And tough to optimize by optimizers too.
And finally, if f
has side effects, then the above is incorrect.
As such, it seems in the general case this cannot even be represented in ASR in any other way than in extending the Assignment
node by a new argument bool realloc_lhs
, if false
then just copy, if true
, then reallocate the lhs according to the logic above, once you know the size of the (runtime) array on the rhs.
So:
x = f() # Assignment(realloc_lhs=True)
x(1:n) = f() # Assignment(realloc_lhs=False)
This sounds good (flagging the nodes with realloc_lhs
)
To be determined: how to handle automatic reallocation of the LHS, for things like:
And the same for strings:
The two options:
s
is allocated and deallocate it, such as:We already have
ImplicitDeallocate
which I think does exactlyif (allocated(x)) then deallocate(x)
. And then we can add optimizations in ASR->ASR that would turn this into:And ultimately we need optimizations to turn this into:
Depending on how the arrays are used. These optimizations we have to have anyway, to simplify such generic code into a faster code.
The other optimization that we need is that sometimes 3 and 5 is not known at compile time, but the sizes are still known as some kind of an expression, in the simplest case as variables
n
andm
. Then this:should be turned into:
Handle this by the backends. That means we have to implement this logic into every backend. That seems suboptimal, the approach we chose is to encode everything into ASR explicitly, so that the backend doesn't have to handle any kind of such logic.
Write an ASR to ASR pass that tranforms the initial code into the code in 1. So far we chose to do this directly in AST to ASR, so that ASR is always well formed and explicit.
So it seems 1. is the way to go.
It seems this is analogous to the
mem2reg
pass in LLVM, where the frontend is encouraged to usealloca
for everything and then the LLVM optimizers promote it to registers if they can. It seems it is possible to implement such optimizers in a robust way. The above optimizers turn "allocatable" into just regular "non-allocatable" arrays. And the AST to ASR just always turnsx = y
assignment wherex
is allocatable into theImplicitDeallocate; Allocate; Assignment
nodes.So ASR
Assignment
only does a copy (to either allocatable or non-allocatable arrays), and it assumes that the size of the LHS is the same as RHS. The frontend checks this either at compile time or runtime (in Debug/Check mode).Assignment
does not resize.