Just a minor comment explaining why grad = [2 * error * x, 2 * error] is used in the function linear_gradient ("Chapter 8. Gradient Descent", section "Using Gradient Descent to Fit Models"). See the note in the Python comments below. It took me a while to work out why the gradient was computed that way, so I hope someone finds this useful.
from typing import List
Vector = List[float]

def linear_gradient(x: float, y: float, theta: Vector) -> Vector:
    slope, intercept = theta
    predicted = slope * x + intercept  # The prediction of the model.
    error = predicted - y              # error is (predicted - actual).
    squared_error = error ** 2         # We'll minimize the squared error, which depends on
                                       # the current guesses for slope (m) and intercept (n):
    # e_sq(m, n) = error^2 = (y_predicted - y_actual)^2 = (m*x + n - y_actual)^2
    # Its partial derivatives follow from the chain rule, (f^2)' = 2*f*f':
    #   d(e_sq)/dm = 2*error*x  (since d(error)/dm = x)
    #   d(e_sq)/dn = 2*error    (since d(error)/dn = 1)
    grad = [2 * error * x, 2 * error]
    return grad
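To see why this gradient is the right one, here's a minimal batch gradient-descent loop that fits a line using linear_gradient. The synthetic data (y = 20*x + 5), learning rate, and epoch count are illustrative choices, not necessarily the book's exact listing:

```python
from typing import List

Vector = List[float]

def linear_gradient(x: float, y: float, theta: Vector) -> Vector:
    slope, intercept = theta
    predicted = slope * x + intercept
    error = predicted - y
    return [2 * error * x, 2 * error]  # [d(e_sq)/dm, d(e_sq)/dn]

# Synthetic data generated from the "true" line y = 20*x + 5.
inputs = [(x, 20 * x + 5) for x in range(-50, 50)]

theta = [1.0, 1.0]       # start from a deliberately wrong guess
learning_rate = 0.001

for epoch in range(5000):
    # Average the gradient over the whole dataset (batch gradient descent).
    grad = [0.0, 0.0]
    for x, y in inputs:
        g = linear_gradient(x, y, theta)
        grad[0] += g[0] / len(inputs)
        grad[1] += g[1] / len(inputs)
    # Step downhill: subtract the gradient scaled by the learning rate.
    theta = [theta[0] - learning_rate * grad[0],
             theta[1] - learning_rate * grad[1]]

slope, intercept = theta  # should be close to 20 and 5
```

Because each step moves theta against the averaged gradient of the squared error, slope and intercept converge toward the true values 20 and 5, which is exactly why the two components of grad must be the correct partial derivatives.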