stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
103 stars 25 forks source link

Add pointer/target/pointer_assignment support #2577

Open sergisiso opened 2 months ago

sergisiso commented 2 months ago

Due to popular demand, I think we should reconsider adding support for pointers. I have usually been reticent because they add many complications to properly validating transformations and do correct dependency analysis and I was hoping that the CodeBlocks and UnsupportedTypes provided some barrier/protection for these cases. But in practice:

The assignment is nothing special, it could be an attribute to Assignment: Assignment.create(lhs=expr, rhs=expr, is_pointer=True)

This will remove the CodeBlocks and defer this specific validity checks to analysing the symbols which we need anyway (fewer special cases).

The problem with pointer and target attributes is that is possible to alias memory. This is also true for UnsupportedType and UnresolvedType (and some interfaces?) - we need to assume the worst case.

real, dimension(:), target :: a
real, dimension(:), pointer :: b => a

But it would be cumbersome to add all these type checks all over the place. So, we also need a new property in Symbol (or Type?) that does all this checking:

def is_potentially_aliased():
    return symbol.is_pointer or symbol.is_target or symbol.is_unsupported or symbol.is_unresolved

and I think it will also make it less likely to forget because the name is more relevant to the problem.

Then we need to modify the DependencyTools, reference_accesses, forward_reference with this information (which now we get wrong it?).

In:

do i = 1, 10
    a(i) = b(i-1)
end do

the is_loop_independent should return false (also if a and be are simply imported symbols)

In:

a = 3
c = b

the stmt[0].lhs.next_access / reference_access should return stmt[1].rhs

This is what we need when deciding if we need a data_movement directive, a halo_exchange, a omp variable not being private, ... (if there use cases for just caring about the symbol, and not dependencies, we could add an attribute or similar alternative properties but I am usure there are)

This will impact lfric, where all fields are pointers (but we still not use the general psyir transformations and when we do it, we can specialise the Proxys and other symbols to def is_potentially_aliased(): return False if that is guaranteed by LFRic).

It will also impact NEMO, where all arrays are declared in an external module. This can be fixed by adding the -I argument and calling resolve_imports (fortunately all fields come from a small set of modules) and has been proved to work for the array_range, but some work until we recover the full coverage is expected.

We could divide this in three tasks: 1) Pointer_assingments: should be the easiest, but we need to make sure that things that were protected by CodeBlocks are now also protected by its use of an UnsupportedType.

2) pointer, target and use is_potentially_aliased. As a first step we could replace the current use of UnsupportedType with the new attribute. So it has the same behaviour (including the currently unsafe transformations)

3) Improve the multiple dependency methods for the general case (+perf mitigations for the apps: resolve external imports, domain symbol specialisations, and transformation with options={force:True})

This is quite a bit of work, what do @arporter, @hiker and @AidanChalk think?

sergisiso commented 2 months ago

I should also mention that this is not a priority. I would do 1) because it is useful when generating LFRic with all initialisations being proper PSyIR (half are pointer assignments) and Grenoble has also mentioned it. But 2 and 3 will be dependent on the most pressing issues found in the multiple NG-ARCH codes. This issue is to have a discussion/plan and a place to track the progress.

LonelyCat124 commented 2 months ago

I think 1 is reasonable to try to get in when we can, though we need to probably check Fortran POINTER and OpenMP clause interactions (though as you say, maybe we already break rules here in the worst case for unsupported types). I think I would probably ask to disable them inside task directives to avoid any headaches there as well.

Do we need to pass is_pointer to the assignment as an argument? Surely the assignment can work it out itself - accessing Assignment.lhs.symbol.is_pointer and not isinstance(Assignment.lhs, ArrayReference) should solve the issue most of the time? I don't know if you can have pointers to pointers in Fortran but that seems like a whole separate mess.

sergisiso commented 2 months ago

Do we need to pass is_pointer to the assignment as an argument? Surely the assignment can work it out itself

The problems is that lhs (and rhs) can be CodeBlocks or symbols to UnresolvedType themself

LonelyCat124 commented 2 months ago

Ah yeah I hadn't considered that.

arporter commented 2 months ago

I should also mention that this is not a priority. I would do 1) because it is useful when generating LFRic with all initialisations being proper PSyIR (half are pointer assignments) and Grenoble has also mentioned it. But 2 and 3 will be dependent on the most pressing issues found in the multiple NG-ARCH codes. This issue is to have a discussion/plan and a place to track the progress.

I fully agree with this. We're going to have to be careful with prioritising developments in order to deliver NG-Arch.

JulienRemy commented 1 week ago

Hi @sergisiso! As we use quite a lot of them, I took the liberty to start adding support for pointer and target in DataSymbol (it felt a lot easier and cleaner to me than in datatypes), StructureType.ComponentType and both the Fortran frontend and backend, see branch 2577_pointer_target_attrs if you get back to this at some point. Tests have not been written yet and existing ones not edited (but it's only 16 fails) as I have not yet implemented is_potentially_aliased.