mml-book / mml-book.github.io

Companion webpage to the book "Mathematics For Machine Learning"

Why is the gradient not defined as the transpose of the Jacobian matrix, as indicated in "Matrix Differential Calculus"? #753

Closed hopezh closed 1 year ago

hopezh commented 1 year ago

In the side note on p. 150, it is stated that "The gradient of a function f: Rn -> Rm is a matrix of size m x n", i.e. the Jacobian matrix.

However, "Matrix Differential Calculus" by Jan R. Magnus and Heinz Neudecker states that "the transpose of the m x n Jacobian Matrix, i.e. an n x m matrix, is called the gradient...".

So, which one is correct?

  1. Is the Jacobian matrix the gradient?
  2. or is the transpose of the Jacobian matrix the gradient?

P. 150 of Mathematics for Machine Learning: [screenshot]

P. 97 of Matrix Differential Calculus: [screenshot]

mpd37 commented 1 year ago

The Jacobian is the matrix formed by the partial derivatives of a vector-valued function. Each of its rows is the gradient of a scalar-valued function. We use "Jacobian" and "gradient" interchangeably for vector-valued functions, and with this convention the dimensions work out nicely with the chain rule.
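A small sketch of why the book's m x n (numerator-layout) convention is convenient: the chain rule for a composition becomes a plain matrix product in composition order. The functions f and g below are made up for illustration; the composed Jacobian is checked against central finite differences.

```python
import numpy as np

# f: R^3 -> R^2 and g: R^2 -> R (toy examples, not from the book).
# In the book's convention, the Jacobian of an R^n -> R^m function
# is m x n, so J_{g o f}(x) = J_g(f(x)) @ J_f(x) in that order.

def f(x):
    return np.array([x[0] * x[1], x[1] + x[2]])

def Jf(x):  # 2 x 3 Jacobian of f
    return np.array([[x[1], x[0], 0.0],
                     [0.0,  1.0, 1.0]])

def g(y):
    return np.array([y[0] * y[1]])

def Jg(y):  # 1 x 2 Jacobian of g
    return np.array([[y[1], y[0]]])

x = np.array([1.0, 2.0, 3.0])

# Chain rule in numerator layout: shapes (1x2) @ (2x3) -> (1x3).
J_chain = Jg(f(x)) @ Jf(x)

# Central-difference check of the composed Jacobian.
eps = 1e-6
numeric = np.array([[(g(f(x + eps * e)) - g(f(x - eps * e)))[0] / (2 * eps)
                     for e in np.eye(3)]])
assert np.allclose(J_chain, numeric, atol=1e-5)

# Magnus & Neudecker's "gradient" is the n x m transpose of this;
# with that convention the chain-rule factors multiply in reverse order.
grad_MN = J_chain.T  # 3 x 1
```

So the disagreement is purely one of layout convention: the same partial derivatives are arranged as an m x n matrix in the book and as its n x m transpose in Magnus & Neudecker.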