ilmoi / MML-Book

Code / solutions for Mathematics for Machine Learning (MML Book)

Chapter 5, Ex. 7, part b #10

Closed AlbertoGuastalla closed 3 years ago

AlbertoGuastalla commented 3 years ago

The derivative df/dx is not an (E x D) matrix but a (D x 1)-dimensional vector. Since z = Ax + b, the product Ax is an (E x D)(D x 1) product -> (E x 1), and the result stays (E x 1) after adding the vector b and taking cos() of each component. Transposing gives a (1 x E)-dimensional vector, which is the gradient of f with respect to z. Next, we multiply it by the matrix A (the Jacobian dz/dx): (1 x E)(E x D) -> (1 x D), and take the transpose -> (D x 1). Now the dimension of the gradient df/dx matches the dimension of the input vector x (D x 1).
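
The shape bookkeeping above can be checked numerically with a minimal NumPy sketch. The dimensions `E = 3, D = 4` are arbitrary illustrative choices (not from the book), and df/dz is written as the (1 x E) row vector of componentwise derivatives -sin(z_i), following the convention in the argument above:

```python
import numpy as np

# Hypothetical dimensions chosen only for illustration.
E, D = 3, 4
rng = np.random.default_rng(0)

A = rng.standard_normal((E, D))   # A is (E x D)
b = rng.standard_normal((E, 1))   # b is (E x 1)
x = rng.standard_normal((D, 1))   # x is (D x 1)

z = A @ x + b                     # (E x D)(D x 1) + (E x 1) -> (E x 1)
f = np.cos(z)                     # cos() of each component, still (E x 1)

# Gradient of f with respect to z as a (1 x E) row vector.
df_dz = -np.sin(z).T              # (1 x E)

# Chain rule: (1 x E)(E x D) -> (1 x D), then transpose -> (D x 1).
df_dx = (df_dz @ A).T

print(df_dx.shape)                # (4, 1), i.e. (D x 1), matching x
```

The final shape `(D, 1)` agrees with the dimension of the input vector x, as claimed.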