nedtaylor / athena

A Fortran-based feed-forward neural network library. Whilst this library currently has a focus on 3D convolutional neural networks (CNNs), it can handle most standard hidden layer forms of neural networks, with the plan to integrate more.
MIT License

[PROPOSAL] Residual block adding layer #47

Open nedtaylor opened 1 month ago

nedtaylor commented 1 month ago

Reasoning

Skip (residual) connections help to mitigate the vanishing gradient problem and are a fundamental part of many modern neural networks.
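As a generic illustration (library-agnostic Python pseudocode, not athena's API): a residual block computes `y = F(x) + x`, so the gradient reaches `x` both through `F` and directly through the identity term.

```python
# Generic sketch of a residual (skip) connection; names are illustrative.
import numpy as np

def residual_block(x, f):
    """y = f(x) + x: the identity path lets gradients bypass f."""
    return f(x) + x

# Even if f squashes its input heavily, the output still carries x
# directly, so dy/dx = f'(x) + 1 does not collapse towards zero.
y = residual_block(np.ones(4), lambda x: 0.01 * x)
```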

Prior Art

Keras Add layer API docs: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add

TensorFlow implementation of a residual block: https://github.com/christianversloot/machine-learning-articles/blob/main/how-to-build-a-resnet-from-scratch-with-tensorflow-2-and-keras.md
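For reference, a minimal Keras residual block in the style of the linked prior art; the layer sizes here are illustrative only and have no connection to athena:

```python
# Minimal Keras residual block using the Add layer (illustrative sizes).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])   # the skip/add layer proposed here
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(28, 28, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
```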

Additional information

No response

nedtaylor commented 1 month ago

This has been implemented as of df2e707452b832ba4dbd107e17d16f67a5b2ef70, and it reproduces the same results as before for the MNIST example. From my initial testing, the new implementation causes no slowdown on that example.

Notes:
1) This temporarily removes the ability to provide skip inputs (previously implemented as addit_input), but the framework is now in place to accept any number of input layers that can feed into the network at multiple points, so this should ultimately be a better implementation (a generic sketch of such an add layer is given below).
2) Convolutional layers still allow padding as before, but this NEEDS to be handled better (via the specific layer) rather than by modifying the prior inputs. Issue #6 has been open for some time with the plan to resolve this.
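For illustration only, a hypothetical Python sketch of an add layer that accepts an arbitrary number of input branches; the class and method names are invented for this sketch and do not reflect athena's Fortran internals:

```python
# Hypothetical add layer with any number of input branches (illustrative).
import numpy as np

class AddLayer:
    def forward(self, *inputs):
        # All branches must already share the same shape (see the padding
        # caveat in note 2); the layer simply sums them element-wise.
        self.n_inputs = len(inputs)
        return sum(inputs)

    def backward(self, grad_output):
        # The upstream gradient is passed unchanged to every branch,
        # which is what lets gradients skip past intermediate layers.
        return [grad_output] * self.n_inputs

layer = AddLayer()
out = layer.forward(np.ones((2, 2)), np.full((2, 2), 3.0))
grads = layer.backward(np.ones_like(out))
```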

nedtaylor commented 1 month ago

With the initial implementation showing good results, I am going to move this to an intended feature of version 2, as it works well alongside the changes to inputs and outputs in #19.