jaamestaay / M2R-Group-29

This repository complements the MATH50002 (M2R) research report and presentation, a compulsory second-year Mathematics module at Imperial College London. We introduced and explored neural networks and the techniques required to train them. We then discussed neural ODEs and the improvement in accuracy they offer, before extending to neural CDEs.

Theory: Neural Networks #4

Closed jaamestaay closed 5 months ago

Opsimathy commented 6 months ago

Over the weekend, I was able to study some of the basics of neural networks, mainly their underlying theory.

Basically, an (artificial) neural network (NN) is a model that functions in a similar way to our human brains. Neural networks are designed to recognize patterns and solve problems through a network of interconnected nodes, known as neurons. Mathematically, a neural network is represented as a function $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$, where $n$ is the dimension of the input vector $\mathbf{x}$ and $m$ is the dimension of the output vector $\mathbf{y}$.

A neural network typically consists of an input layer, multiple hidden layers, and an output layer. Each layer $l$ contains $N_l$ neurons and first forms the linear combination $\mathbf{z}^l = \mathbf{W}^l \mathbf{a}^{l-1} + \mathbf{b}^l$, where $\mathbf{z}^l$ is the vector of linear combinations (one entry per neuron) for layer $l$, $\mathbf{W}^l$ is the weight matrix connecting layer $l-1$ to layer $l$, $\mathbf{a}^{l-1}$ is the activation vector from the previous layer, and $\mathbf{b}^l$ is the bias vector for layer $l$.
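As a minimal sketch (not code from the report or labs; the sizes and random values are made up purely for illustration), this linear combination can be written directly with plain Julia arrays:

```julia
# One layer's linear combination zˡ = Wˡ aˡ⁻¹ + bˡ, with illustrative sizes.
N_prev, N_l = 3, 2        # neurons in layer l-1 and layer l (assumed sizes)
W = randn(N_l, N_prev)    # weight matrix Wˡ connecting layer l-1 to layer l
b = randn(N_l)            # bias vector bˡ
a_prev = randn(N_prev)    # activation vector aˡ⁻¹ from the previous layer

z = W * a_prev + b        # zˡ: one entry per neuron in layer l
```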

The activation vector is then computed by applying a nonlinear activation function $\sigma$ element-wise, i.e., $\mathbf{a}^l = \sigma(\mathbf{z}^l)$. Common activation functions include the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, the hyperbolic tangent function, and, as in the lab, the rectified linear unit (ReLU) function $\sigma(x) = \max(0, x)$.
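A rough sketch of these activation functions and of a full forward pass, again in plain Julia (the function names and the 3 → 4 → 4 → 1 architecture are assumptions for illustration, not the report's code):

```julia
sigmoid(x) = 1 / (1 + exp(-x))   # σ(x) = 1/(1 + e⁻ˣ)
relu(x) = max(zero(x), x)        # σ(x) = max(0, x)

# Hypothetical network with layer sizes 3 → 4 → 4 → 1, stored as (Wˡ, bˡ) pairs.
layers = [(randn(4, 3), randn(4)),
          (randn(4, 4), randn(4)),
          (randn(1, 4), randn(1))]

function forward(layers, x; σ = relu)
    a = x
    for (i, (W, b)) in enumerate(layers)
        z = W * a + b                          # linear combination zˡ
        a = i < length(layers) ? σ.(z) : z     # aˡ = σ(zˡ) element-wise; linear output layer
    end
    return a
end

ŷ = forward(layers, randn(3))
```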

The goal of training a neural network is to optimize the weight matrices $\mathbf{W}$ and bias vectors $\mathbf{b}$ to minimize a loss function $\mathcal{L}(\mathbf{y}, \hat{\mathbf{y}})$, where $\mathbf{y}$ is the true output and $\hat{\mathbf{y}}$ is the predicted output. Common loss functions include the mean squared error for regression tasks and the cross-entropy loss for classification tasks.
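As a sketch of these two losses (the function names and the small clipping constant ϵ are my own assumptions):

```julia
# Mean squared error for regression: L(y, ŷ) = (1/m) Σᵢ (yᵢ - ŷᵢ)²
mse(y, ŷ) = sum(abs2, y .- ŷ) / length(y)

# Cross-entropy for classification, with ŷ a vector of predicted probabilities
# and y a one-hot label; ϵ avoids log(0).
crossentropy(y, ŷ; ϵ = 1e-12) = -sum(y .* log.(ŷ .+ ϵ))

mse([1.0, 2.0], [0.9, 2.1])               # 0.01
crossentropy([0, 1, 0], [0.2, 0.7, 0.1])  # ≈ 0.357
```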

The optimization process is typically performed using gradient descent algorithms (which James will have already introduced).
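For example, a bare-bones gradient descent loop on a one-parameter toy loss (a sketch only: the gradient is worked out by hand rather than by the automatic differentiation used in the labs, and the step size η = 0.1 and step count are arbitrary choices for illustration):

```julia
function gradient_descent(∇L, w; η = 0.1, steps = 100)
    for _ in 1:steps
        w -= η * ∇L(w)      # step against the gradient
    end
    return w
end

L(w) = (w - 3)^2            # toy loss, minimised at w = 3
∇L(w) = 2 * (w - 3)         # its gradient, computed by hand

gradient_descent(∇L, 0.0)   # ≈ 3.0
```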

Next week, I am going to write some original code based on the labs provided, with some specific examples.

Opsimathy commented 6 months ago

Possible examples:

dlfivefifty commented 6 months ago

Sounds good, but:

> in a similar way to our human brains

I don't think this is true, or at least no one would claim there's a relationship between how NNs work and our brains.

Opsimathy commented 6 months ago

> Sounds good, but:
>
> > in a similar way to our human brains
>
> I don't think this is true, or at least no one would claim there's a relationship between how NNs work and our brains.

Oh, maybe I was kind of misled by the first paragraph of the Wikipedia page, which says "In machine learning, a neural network ... is a model inspired by the structure and function of biological neural networks in animal brains."

Also, I would like to ask if we can use code from the SciMLSANUM2024 labs (probably with some modification) provided we cite them properly.

dlfivefifty commented 6 months ago

I think there's a subtle difference between "similar to" and "inspired by".

> Also, I would like to ask if we can use code from the SciMLSANUM2024 labs (probably with some modification) provided we cite them properly.

Yes, that's fine. Note that the rules for using open-source code are specified by the license:

https://github.com/dlfivefifty/SciMLSANUM2024/blob/main/LICENSE

In this case I used the MIT license, the same license as most Julia code, which is pretty relaxed: as long as you give credit (i.e. via citation) it should be fine. If you find code that is GPL-licensed, you'll need to be a bit more careful about how you use it.