Closed. milancurcic closed this pull request 2 years ago.
Known issues:
With higher optimization levels on GFortran (anything above `-O0`), the network does not converge as expected, and this is true for all 3 included examples. For example, the MNIST example reaches the high 80s in accuracy in one epoch and then slowly drops in subsequent epochs. The same behavior occurs with GFortran 9.4.0 and 10.3.0. The issue goes away with `-O0` and doesn't appear at any optimization level with ifort. I hope to diagnose and resolve this before the merge.
Adding `-fno-frontend-optimize` allows GFortran to generate code that runs correctly (the examples converge) at any optimization level, including `-Ofast`. So `-ffrontend-optimize`, implied by any optimization level above `-O0`, seems to cause the issue. I don't know exactly why yet. From the GFortran manual:
> `-ffrontend-optimize`
> This option performs front-end optimization, based on manipulating parts of the Fortran parse tree. Enabled by default by any `-O` option except `-O0` and `-Og`. Optimizations enabled by this option include:
>
> * inlining calls to "MATMUL",
> * elimination of identical function calls within expressions,
> * removing unnecessary calls to "TRIM" in comparisons and assignments,
> * replacing TRIM(a) with "a(1:LEN_TRIM(a))" and
> * short-circuiting of logical operators (".AND." and ".OR.").
>
> It can be deselected by specifying -fno-frontend-optimize.
Of these, inlining calls to `MATMUL` and elimination of identical function calls within expressions seem like candidates for the cause of the issue. I don't know whether this list of optimizations is complete or only a subset.
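To make the two suspects concrete, here is a small, self-contained sketch (not code from this PR) of the expression shapes that `-ffrontend-optimize` rewrites: with frontend optimization enabled, GFortran may inline the `MATMUL` calls and evaluate the repeated `matmul(w, x)` only once.

```fortran
! Illustrative sketch of expressions targeted by -ffrontend-optimize.
! Compile with and without -fno-frontend-optimize to compare; the
! numerical result should be the same either way in this simple case.
program frontend_opt_demo
  implicit none
  real :: w(3,3), x(3), y(3), z(3)

  call random_number(w)
  call random_number(x)

  ! Two identical MATMUL calls in one expression: frontend optimization
  ! may inline them and/or evaluate matmul(w, x) a single time.
  y = matmul(w, x) + matmul(w, x)

  ! Reference value computed without the repeated call.
  z = 2.0 * matmul(w, x)

  print *, 'max difference:', maxval(abs(y - z))
end program frontend_opt_demo
```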
The original neural-fortran code was limited in application because the network type was hardcoded for dense (fully-connected) layers. This PR introduces a large refactor of the library to allow extending it to other network architectures (convolutional for imagery and model data, recurrent for time series, etc.).
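For orientation, here is a minimal sketch of the kind of user-facing code this layer-based design enables. It is illustrative only: the `input` and `dense` constructors follow the layer types mentioned in this PR, but the top-level module name and the training-loop methods (`forward`, `backward`, `update`, `output`) are assumptions for the example, not a confirmed snapshot of the PR's API.

```fortran
program mlp_sketch
  ! Hypothetical usage sketch; module and procedure names are assumptions.
  use nf, only: network, input, dense
  implicit none
  type(network) :: net
  real :: x(3), y(2)
  integer :: n

  ! A network is built from an array of layer constructors.
  net = network([input(3), dense(5), dense(2)])

  call random_number(x)
  y = [0., 1.]

  ! One plain gradient-descent step per iteration (method names assumed).
  do n = 1, 100
    call net % forward(x)
    call net % backward(y)
    call net % update(0.1)
  end do

  print *, net % output(x)
end program mlp_sketch
```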
Key changes:

* An extensible layer abstraction with concrete layer constructors (currently `dense` and `input`).
* Module names begin with `nf_` instead of `mod_`, to minimize the chance of name clashes with other libraries that may enter the same namespace in a user application.

What's not there anymore:

* Support for `real64` or `real128` precision. Rationale: not too useful to begin with, and can be easily added if anybody asks for it.
* `save` and `load` methods to save and load pre-trained networks. Rationale: we'll be adding support for HDF5 I/O soon, and I assume most people who used `save` and `load` did it via FKB rather than the upstream neural-fortran.

A nice side effect of this refactor is that the MNIST training example is about 135% (2.35 times) faster than the original code. This is likely because this time around I was careful about minimizing copies and re-allocations. This result is with ifort-2021.3 using `-Ofast` on an Intel E5-1650.
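The point about copies and re-allocations is easy to illustrate with a generic Fortran pattern (not code from this PR): a function that returns a freshly allocated array forces an allocation on every call, while a subroutine that writes into a caller-provided, preallocated buffer does not.

```fortran
! Generic illustration of avoiding per-call allocations (not from this PR).
module copy_demo
  implicit none
contains

  ! Allocates and returns a new result array on every call.
  function matvec_alloc(a, x) result(y)
    real, intent(in) :: a(:,:), x(:)
    real, allocatable :: y(:)
    y = matmul(a, x)
  end function matvec_alloc

  ! Writes into a caller-provided, preallocated buffer instead.
  subroutine matvec_inplace(a, x, y)
    real, intent(in) :: a(:,:), x(:)
    real, intent(out) :: y(:)
    y = matmul(a, x)
  end subroutine matvec_inplace

end module copy_demo
```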
Known issues:

With higher optimization levels on GFortran (anything above `-O0`), the network does not converge as expected, and this is true for all 3 included examples. For example, the MNIST example reaches the high 80s in accuracy in one epoch and then slowly drops in subsequent epochs. The same behavior occurs with GFortran 9.4.0 and 10.3.0. The issue goes away with `-O0` and doesn't appear at any optimization level with ifort. I hope to diagnose and resolve this before the merge.

TODO before merging:
CC @katherbreen