modern-fortran / neural-fortran

A parallel framework for deep learning
MIT License
403 stars 83 forks source link

CAF preprocessor macro #50

Closed rouson closed 2 years ago

rouson commented 2 years ago

@milancurcic What's the purpose of the following code mod_mnist.f90?

#ifdef CAF
      call co_sum(db(n) % array)
#endif

I'm unclear on whether it's designed to support serial builds (in which case it seems unnecessary) or designed to support compilers that don't support co_sum, in which case I'm curious about which compilers and the motivation for supporting them. Intel, Cray, NAG, and GNU all support co_sum.

Based on the principle of least surprise, I recommend making standard-conforming behavior the default behavior and using a macro to turn off standard behavior rather than requiring a macro to turn on standard behavior.

I also recommend wrapping the entire subroutine in the #ifdef block and similarly wrapping any references to it. Otherwise, someone reading call db_co_sum(...) elsewhere is going to be surprised when they realize they have to pass a special flag to get the procedure to do what its name implies. This is one of the subtleties that I was mentioning regarding the conceptual benefit of separating procedure definitions from the corresponding interface bodies. Doing so discourages hiding gotchas in the procedure definition.

milancurcic commented 2 years ago

I think you mean modlayer.f90. You're correct, it's meant to support serial builds. I assume that recent GFortran can compile this now, but earlier versions could not, so removing the calls to `co*` in absence of OpenCoarrays was necessary.

db_co_sum is never called in serial runs. The preprocessor macro hack is there only to make the compiler happy. It does not affect the logic of the program.

I just tried with GFortran 9.4.0 (default on Ubuntu 20.04 LTS) and it's able to compile collective subroutine calls with -fcoarray=single, so I'm OK with removing this macro, it doesn't seem to serve a purpose anymore.

rouson commented 2 years ago

@milancurcic I just noticed the co_broadcast calls. I'll need to do some testing. Because all components of the involved derived types are allocatable, I'm pretty sure your code will work. If you ever need to call co_broadcast with an actual argument that contains a mix of allocatable and non-allocatable components, then gfortran 11 or 12 will be required because of a bug that an Archaeologic contractor just fixed two months ago.