modern-fortran / neural-fortran

A parallel framework for deep learning
MIT License
398 stars 82 forks source link

Refactor: move procedure definitions submodules #51

Closed rouson closed 2 years ago

rouson commented 2 years ago

This PR incorporates the suggestions from @milancurcic in PR #47 and replaces that PR, which is now closed.

This PR separates procedure interface bodies (in modules) from procedure definitions (in submodules) for all modules except mod_kinds.f90, which contains no procedures.

Benefits

  1. Reduces the information a user encounters when attempting to use a module.
  2. Clarifies what parts of the code comprise the API versus of the implementation:
    • A procedure's local variables no longer appear in the module.
    • A non-public module entity (e.g., pi) no longer appears in the module.
    • Entities used in a procedure definition but not in the corresponding interface body are use-associated in the relevant submodule but not the parent module.

TODO prior to merging this PR

milancurcic commented 2 years ago

Thanks a lot, this is great. I have a few questions:

  1. This builds and runs correctly with GFortran-10.3.0. However, with GFortran-9.4.0, I get:

    FPM_FC=gfortran fpm test --flag "-cpp -O3 -ffast-math -fcoarray=single"
    ...
    + gfortran -c ./src/mod_parallel_submodule.f90  -cpp -O3 -ffast-math -fcoarray=single -J build/gfortran_C8F06E9782581751 -Ibuild/gfortran_C8F06E9782581751 -o build/gfortran_C8F06E9782581751/neural-fortran/src_mod_parallel_submodule.f90.o
    ./src/mod_parallel_submodule.f90:10:31:
    
    10 |     integer(ik) :: tile_indices(2)
      |                               1
    Error: Duplicate DIMENSION attribute specified at (1)

    It's unclear to me why the compiler throws an error here. It doesn't seem to be about duplicate interfaces in the module and submodule, because a number of other duplicate interfaces compile fine (e.g. activation, random, etc.).

  2. With ifort-2021.5.0:

FPM_FC=ifort fpm test --flag "-cpp -O3"
...
 + ifort -c ././src/mod_layer_submodule.f90  -cpp -O3 -module build/ifort_97A691C73DDCB532 -Ibuild/ifort_97A691C73DDCB532 -o build/ifort_97A691C73DDCB532/neural-fortran/src_mod_layer_submodule.f90.o
././src/mod_layer_submodule.f90(9): error #6404: This name does not have a type, and must have an explicit type.   [LAYER]
  type(layer_type) module function constructor(this_size, next_size) result(layer)
----------------------------------------------------------------------------^
././src/mod_layer_submodule.f90(19): error #6404: This name does not have a type, and must have an explicit type.   [A]
  pure type(array1d) module function array1d_constructor(length) result(a)
------------------------------------------------------------------------^
././src/mod_layer_submodule.f90(25): error #6404: This name does not have a type, and must have an explicit type.   [A]
  pure type(array2d) module function array2d_constructor(dims) result(a)
----------------------------------------------------------------------^
compilation aborted for ././src/mod_layer_submodule.f90 (code 1)

Again, this code looks fine and I don't understand why the compiler is not happy with it. Declaring the result type on a separate line doesn't seem to make a difference. Do you have any clues?

While being able to build this with earlier GFortran (item 1) is not essential, building with ifort is (item 2).

milancurcic commented 2 years ago

If I move the type constructor (layer and network) functions from submodules back to modules, ifort and ifx can compile and run.

@rouson are you open to this one exception?

rouson commented 2 years ago

@rouson are you open to this one exception?

Yes but I have some other ideas that I'd like to investigate this morning. There's a chance that some of the code in the PR is not standard-conforming. I'll provide more detail shortly and will push any required fixes.

rouson commented 2 years ago

@milancurcic I have a workaround for the Intel compiler issue. I'll push it soon. Fixing the issue was a little cumbersome because of the need to change things in both the interface body and the procedure definition to keep the redundant information consistent, which is why I prefer to not repeat the information. When I need to see both the interface information and the procedure definition, I open side-by-side screens.

I think I know what caused the problem and I hope to find the time to submit a bug report to Intel. Neural-fortran is the first code that I've seen where the result is given an explicit name other than the procedure and the result type is declared in the function prefix. (More commonly, when I see a result(a) statement, there's a separate statement declaring the result type.) I believe the original code was standard-conforming, but I suspect that the subset of programs that use that style and have module in the prefix is sufficiently small that no one else has reported this issue to Intel and they must not have similar code in their test suite.

milancurcic commented 2 years ago

I think I know what caused the problem and I hope to find the time to submit a bug report to Intel. Neural-fortran is the first code that I've seen where the result is given an explicit name other than the procedure and the result type is declared in the function prefix. (More commonly, when I see a result(a) statement, there's a separate statement declaring the result type.) I believe the original code was standard-conforming, but I suspect that the subset of programs that use that style and have module in the prefix is sufficiently small that no one else has reported this issue to Intel and they must not have similar code in their test suite.

That's fascinating, great detective work. I often use result(a) and type definition on the same line with no issues, but indeed, I've never used submodules before and your explanation is reasonable.

rouson commented 2 years ago

@milancurcic this commit is ready to merge modulo the TODO that I believe you added in the initial comment. I'll leave those tasks to you if that's ok.

milancurcic commented 2 years ago

Great, thank you. Yes, I'll tackle the TODOs, they're chores.

milancurcic commented 2 years ago

This builds and runs correctly with GFortran-10.3.0. However, with GFortran-9.4.0, I get:


FPM_FC=gfortran fpm test --flag "-cpp -O3 -ffast-math -fcoarray=single"
...
 + gfortran -c ./src/mod_parallel_submodule.f90  -cpp -O3 -ffast-math -fcoarray=single -J build/gfortran_C8F06E9782581751 -> Ibuild/gfortran_C8F06E9782581751 -o build/gfortran_C8F06E9782581751/neural-fortran/src_mod_parallel_submodule.f90.o
./src/mod_parallel_submodule.f90:10:31:

   10 |     integer(ik) :: tile_indices(2)
      |                               1
Error: Duplicate DIMENSION attribute specified at (1)

@rouson While working on #58 I found that if the function result is declared with an alternative name (using e.g. result(res)), then this error goes away. Thanks to this workaround #58 uses submodules everywhere (well, everywhere except one frivolous case), and GFortran 9.4.0 builds it without complaining.

rouson commented 2 years ago

@milancurcic that's great news!