ilmoi / MML-Book

Code / solutions for Mathematics for Machine Learning (MML Book)

Chapter 5, Ex. 7, part b #10

Closed AlbertoGuastalla closed 3 years ago

AlbertoGuastalla commented 3 years ago

The derivative df/dx is not an (E x D) matrix but a (D x 1)-dimensional vector. Since z = Ax + b, the product Ax is an (E x D)(D x 1) product -> (E x 1), and the result stays (E x 1) after adding the vector b and taking cos() of each component. Transposing gives a (1 x E)-dimensional vector, which is the gradient of f with respect to z. Next, we multiply it by the matrix A (the Jacobian dz/dx): (1 x E)(E x D) -> (1 x D), and take the transpose -> (D x 1). Now the dimension of the gradient df/dx matches the dimension of the input vector x (D x 1).
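
The shape bookkeeping above can be checked numerically with a minimal NumPy sketch. The dimensions `E = 3, D = 4` are arbitrary illustrative choices (not from the book), and df/dz is written as the (1 x E) row vector of componentwise derivatives -sin(z_i), following the convention in the argument above:

```python
import numpy as np

# Hypothetical dimensions chosen only for illustration.
E, D = 3, 4
rng = np.random.default_rng(0)

A = rng.standard_normal((E, D))   # A is (E x D)
b = rng.standard_normal((E, 1))   # b is (E x 1)
x = rng.standard_normal((D, 1))   # x is (D x 1)

z = A @ x + b                     # (E x D)(D x 1) + (E x 1) -> (E x 1)
f = np.cos(z)                     # cos() of each component, still (E x 1)

# Gradient of f with respect to z as a (1 x E) row vector.
df_dz = -np.sin(z).T              # (1 x E)

# Chain rule: (1 x E)(E x D) -> (1 x D), then transpose -> (D x 1).
df_dx = (df_dz @ A).T

print(df_dx.shape)                # (4, 1), i.e. (D x 1), matching x
```

The final shape `(D, 1)` agrees with the dimension of the input vector x, as claimed.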