Convolutional Neural Networks for Visual Recognition
Stanford - Spring 2021-2024
Solutions for the CS231n course assignments offered by Stanford University (Spring 2021-2024). Inline questions are explained in detail, and the code is brief and commented (see the examples below). From what I investigated, these should be the shortest code solutions (excluding open-ended challenges). In assignment 2, DenseNet is used in the PyTorch notebook and ResNet in the TensorFlow notebook.
Check out the solutions for CS224n. They contain more comprehensive explanations than most other solutions out there.
It is advised to run the notebooks in Colab, but you can also run them locally. To do so, first set up your environment, either through conda or venv, preferably with a GPU-accelerated build of PyTorch installed in advance. Then follow these steps:
1. Install the required packages:
   ```bash
   pip install -r requirements.txt
   ```
2. In the `.ipynb` files, change the cell that downloads the datasets to:
   ```python
   %cd cs231n/datasets/
   !bash get_datasets.sh
   %cd ../../
   ```
3. In the `.ipynb` files, change the cell that compiles the Cython extension to:
   ```python
   %cd cs231n
   !python setup.py build_ext --inplace
   %cd ..
   ```
I've gathered the requirements for all 3 assignments into a single requirements.txt, so there is no need to also install the requirements specified under each assignment folder. If you plan to complete TensorFlow.ipynb, you additionally need to install TensorFlow.
Note: to use MPS acceleration on Apple M1, see the comment in #4.
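For reference, here is a minimal device-selection sketch (not part of the assignment code) that you could drop into the PyTorch notebooks, assuming a PyTorch build recent enough to include the MPS backend:

```python
import torch

# Prefer CUDA if available, fall back to Apple's MPS backend, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
```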
It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not, strictly speaking, differentiable.
To answer these questions, let's look at a single term of the **SVM** loss. In the `1D` case, we can define it as follows ($\hat{y}$ - score, $i$ - any class, $c$ - correct class, $\Delta$ - margin):
$$f(x)=\max(0, x),\ \text{ where } x=\hat{y}_i-\hat{y}_c+\Delta$$
Let's now see how our $\max$ function fits into the definition of the gradient. Below is the formula we use for computing the gradient numerically when, instead of taking the limit as $h$ approaches $0$, we choose some arbitrarily small $h$:
$$\frac{df(x)}{dx}=\lim_{h \to 0}\frac{\max(0,x+h)-\max(0,x)}{h}$$
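As a quick sanity check, here is a minimal sketch (plain Python, not from the assignment code) showing that away from the kink the finite-difference approximation agrees with the analytic derivative:

```python
def f(x):
    # 1D hinge term: max(0, x)
    return max(0.0, x)

x, h = 0.5, 1e-8                   # a point in the smooth (positive) region
numeric = (f(x + h) - f(x)) / h    # finite-difference approximation
analytic = 1.0 if x > 0 else 0.0   # exact derivative away from x = 0

print(numeric, analytic)           # both are (approximately) 1.0
```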
Now we can talk about the possible mismatches between numeric and analytic gradient computation:
1. **Cause of mismatch**
   * _Relative error_ - the discrepancy is caused by the arbitrary choice of a small value of $h$, which by definition should approach `0`. The _analytic_ computation produces an exact result (as precise as floating point allows) while the _numeric_ solution only approximates it.
   * _Kinks_ - $\max$ is not smooth: at the point where both of its arguments are equal, its gradient is undefined and only a subgradient exists. Such non-differentiable points, referred to as _kinks_, may cause the _numeric_ gradient to differ from the _analytic_ one due to (again) the arbitrary choice of $h$.
2. **Concerns**
   * When comparing the _analytic_ and the _numeric_ methods, _kinks_ are more dangerous than small inaccuracies in regions where the gradient is smooth. Small derivative inaccuracies still change the weights by approximately the right amount, but _kinks_ may cause unintentional updates, as seen in the example below. If those unintentional values have a noticeable effect on parameter updates, it is a reason for concern.
3. **`1D` example of numeric gradient fail**
   * Assume $x=-10^{-9}$. Then the _analytic_ computation of the derivative of $\max(0, x)$ yields `0`. However, if we choose $h=10^{-8}$, the _numeric_ computation yields `0.9` (see the sketch after this list).
4. **Relation between margin and mismatch**
   * Assuming all other parameters remain **unchanged**, increasing $\Delta$ will lower the frequency of _kinks_: a higher $\Delta$ pushes more of the $x$ values into the positive region, reducing the probability of landing on a kink. In practice, though, it would not have a big effect - if we increase the margin $\Delta$, the **SVM** will simply learn to increase the (negative) gap between $\hat y_i - \hat y_c$ and `0` (when $i\ne c$), which means that, after adding $\Delta$, there is the same chance for $x$ to end up on the edge.
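Below is a minimal sketch (not part of the assignment code) reproducing the `1D` failure case from point 3: at a point just below the kink, the one-sided numeric estimate disagrees with the analytic subgradient.

```python
def hinge(x):
    # 1D hinge term: max(0, x)
    return max(0.0, x)

def numeric_grad(f, x, h):
    # One-sided finite-difference approximation of df/dx
    return (f(x + h) - f(x)) / h

x = -1e-9   # just below the kink at x = 0
h = 1e-8    # step size used for the numeric gradient

analytic = 0.0                       # max(0, x) is flat for x < 0
numeric = numeric_grad(hinge, x, h)  # the step crosses the kink, so it "sees" the positive slope

print(analytic, numeric)             # 0.0 vs ~0.9 - the gradcheck mismatch described above
```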