modern-fortran / neural-fortran

A parallel framework for deep learning

Refactor for convnets #58

Closed milancurcic closed 2 years ago

milancurcic commented 2 years ago

The original neural-fortran code was limited in application because the network type was hardcoded for dense (fully-connected) layers. This PR introduces a large refactor of the library to allow extending it to other network architectures (convolutional for imagery and model data, recurrent for time series, etc.).
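To make the extensibility concrete, here is a minimal sketch of what a layer-constructor style of building a network looks like. The module name nf and the constructor names (network, input, dense) are assumptions for illustration, not necessarily the exact interface introduced in this PR:

    ! Sketch only: module and constructor names are assumed for illustration.
    program construct_network
      use nf, only: network, input, dense
      implicit none
      type(network) :: net

      ! The network is assembled from a list of layer constructors, so a new
      ! layer type (convolutional, recurrent, ...) can be added by providing
      ! a new constructor and layer implementation, without changing the
      ! network type itself.
      net = network([input(784), dense(30), dense(10)])
    end program construct_network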

Key changes:

What's not there anymore:

A nice side effect of this refactor is that the MNIST training example is about 135% (2.35 times) faster than the original code, likely because this time around I was careful to minimize copies and re-allocations. This result is with ifort-2021.3 using -Ofast on an Intel E5-1650.
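To illustrate the kind of change this refers to (a sketch, not the actual diff), assigning into preallocated storage avoids the reallocate-and-copy that assignment to a bare allocatable left-hand side may trigger inside a hot loop:

    ! Illustrative sketch, not code from the PR.
    program reuse_storage
      implicit none
      real, allocatable :: a(:), grad(:)
      integer :: i

      allocate(a(1000), grad(1000))
      a = 1.0

      do i = 1, 10000
        ! With a bare allocatable left-hand side (grad = 2.0 * a), the
        ! compiler must handle possible reallocation on every assignment.
        ! Assigning to an array section reuses the existing allocation.
        grad(:) = 2.0 * a(:)
      end do

      print *, sum(grad)
    end program reuse_storage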

Known issues:

TODO before merging:

CC @katherbreen

milancurcic commented 2 years ago

Known issues:

With higher optimization levels on GFortran (anything above -O0), the network does not converge as expected, and this is true for all three included examples. For example, the MNIST example reaches accuracy in the high 80% range in one epoch and then slowly degrades in subsequent epochs. The same behavior occurs with GFortran 9.4.0 and 10.3.0. The issue goes away with -O0, and doesn't appear at any optimization level with ifort. I hope to diagnose and resolve this before the merge.

Adding -fno-frontend-optimize allows GFortran to generate code that runs correctly (the examples converge) at any optimization level, including -Ofast. So -ffrontend-optimize, which is implied by any optimization level above -O0, seems to cause the issue. I don't know exactly why yet. From the GFortran manual:

       -ffrontend-optimize
           This option performs front-end optimization, based on manipulating parts of the Fortran parse tree.
           Enabled by default by any -O option except -O0 and -Og. Optimizations enabled by this option include:

           * inlining calls to "MATMUL",
           * elimination of identical function calls within expressions,
           * removing unnecessary calls to "TRIM" in comparisons and assignments,
           * replacing TRIM(a) with "a(1:LEN_TRIM(a))", and
           * short-circuiting of logical operators (".AND." and ".OR.").

           It can be deselected by specifying -fno-frontend-optimize.

Of these, inlining calls to "MATMUL" and elimination of identical function calls within expressions seem like the most likely candidates for the cause of the issue. I don't know whether this list of optimizations is complete or only a subset.
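As a contrived illustration (not neural-fortran code) of why eliminating "identical" calls is only safe for functions without side effects: if the two calls below were folded into one, the result would change, since each call advances the random-number generator state.

    ! Contrived sketch: two textually identical calls to a non-pure function
    ! are not interchangeable, so folding them into one call changes the result.
    program identical_calls
      implicit none
      real :: s
      s = noise() + noise()   ! if folded, this would become 2*noise()
      print *, s
    contains
      function noise() result(r)
        real :: r
        call random_number(r)   ! side effect: advances the RNG state
      end function noise
    end program identical_calls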