
Optimizers

Implementations of various optimization algorithms, run on test functions for optimization, using Python and numerical libraries.

This repository serves as the source for visualizations and evaluations used in our thesis.

Weekly Plan

https://docs.google.com/spreadsheets/d/1nzCk1bDLOWbMFBg6z3pWSK3CfT5G9WQMWmNeuJiTfqU/edit?usp=sharing

Task list

Implementations

View the full source code for each algorithm in optimizers.py
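
The snippets below show only each optimizer's step(x, y) method. As a rough, self-contained sketch of how such a method can be driven, here is a toy example; the Sphere test function and the MomentumSketch class with its constructor arguments are illustrative stand-ins, not the actual classes in optimizers.py:

import numpy as np

class Sphere:
    # f(x, y) = x**2 + y**2, a simple convex test function (illustrative only).
    def f(self, x, y):
        return x**2 + y**2

    def df(self, x, y):
        # Gradient returned as a NumPy array, matching the df(x, y) calls in the snippets below.
        return np.array([2.0 * x, 2.0 * y])

class MomentumSketch:
    # Stand-in optimizer with the same step() shape as the classes below.
    def __init__(self, func, lr=0.1, momentum=0.9):
        self.func = func
        self.lr = lr
        self.momentum = momentum
        self.v = np.zeros(2)

    def step(self, x, y):
        g = self.func.df(x, y)
        self.v = self.momentum * self.v + self.lr * g
        return x - self.v[0], y - self.v[1]

opt = MomentumSketch(Sphere())
x, y = 4.0, -3.0
for _ in range(200):
    x, y = opt.step(x, y)
print(x, y)  # both coordinates approach the minimum at (0, 0)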

1. Stochastic Gradient Descent with Momentum

def step(self, x, y):
    g_t = self.func.df(x, y)
    self.v = self.momentum*self.v + self.lr*g_t
    return (x - self.v[0], y - self.v[1])

SGD with Momentum
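
Written out, this is the classical momentum update, with η = self.lr and γ = self.momentum:

$$v_t = \gamma v_{t-1} + \eta \nabla f(\theta_{t-1}), \qquad \theta_t = \theta_{t-1} - v_t$$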

2. AdaGrad

def step(self, x, y):
    g = self.func.df(x, y)
    self.sq_grad += g**2
    v = self.lr * g / (np.sqrt(self.sq_grad) + self.eps)
    return (x - v[0], y - v[1])

AdaGrad
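
Here self.sq_grad accumulates the element-wise sum of squared gradients, so the step corresponds to (all operations element-wise, η = self.lr):

$$G_t = \sum_{i=1}^{t} g_i^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t$$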

3. AdaDelta

def step(self, x, y):
    gt = self.F.df(x, y)
    # Note: self.lr is used as the decay rate rho here; AdaDelta has no explicit learning rate.
    self.E_gt = self.lr*self.E_gt + (1 - self.lr)*(gt**2)
    RMS_gt = np.sqrt(self.E_gt + self.eps)
    RMS_delta = np.sqrt(self.E_delta + self.eps)
    delta = -(RMS_delta / RMS_gt)*gt
    self.E_delta = self.lr*self.E_delta + (1 - self.lr)*(delta**2)
    return (x + delta[0], y + delta[1])

AdaDelta
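
With ρ playing the role of self.lr and $RMS[x]_t = \sqrt{E[x^2]_t + \epsilon}$, the step is:

$$E[g^2]_t = \rho E[g^2]_{t-1} + (1-\rho)\, g_t^2, \qquad \Delta\theta_t = -\frac{RMS[\Delta\theta]_{t-1}}{RMS[g]_t}\, g_t, \qquad \theta_t = \theta_{t-1} + \Delta\theta_t$$

followed by the running average of squared updates, $E[\Delta\theta^2]_t = \rho E[\Delta\theta^2]_{t-1} + (1-\rho)\, \Delta\theta_t^2$.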

4. RMSprop

def step(self, x, y):
    g = self.F.df(x, y)
    self.E_g2 = self.gamma*self.E_g2 + (1 - self.gamma)*(g**2)
    delta = self.lr * g / (np.sqrt(self.E_g2) + self.eps)
    return (x - delta[0], y - delta[1])

RMSprop
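
This is AdaGrad with the sum of squared gradients replaced by an exponential moving average, with γ = self.gamma and η = self.lr:

$$E[g^2]_t = \gamma E[g^2]_{t-1} + (1-\gamma)\, g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t$$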

5. Adam

def step(self, x, y):
    self.t += 1
    g_t = self.F.df(x, y)
    self.m = self.b1*self.m + (1-self.b1)*g_t
    self.v = self.b2*self.v + (1-self.b2)*g_t*g_t
    m_hat = self.m / (1 - self.b1**self.t)
    v_hat = self.v / (1 - self.b2**self.t)
    delta = self.a * m_hat / (np.sqrt(v_hat) + self.eps)
    return (x - delta[0], y - delta[1])

Adam
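
With β₁ = self.b1, β₂ = self.b2, and α = self.a, this is the standard Adam update with bias-corrected moment estimates:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$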

6. Nadam

def step(self, x, y):
    self.t += 1
    g_t = self.F.df(x, y)

    self.m = self.b1 * self.m + (1-self.b1) * g_t
    self.v = self.b2 * self.v + (1-self.b2) * g_t * g_t

    m_hat = self.m / (1 - self.b1 ** (self.t + 1))
    v_hat = self.b2 * self.v / (1 - self.b2 ** self.t)

    # Nesterov-corrected momentum: combine the bias-corrected first moment
    # with the bias-corrected current gradient.
    nes_m = self.b1 * m_hat + (1 - self.b1) * g_t / (1 - self.b1 ** self.t)
    step_size = self.lr * nes_m / (np.sqrt(v_hat) + self.eps)
    return (x - step_size[0], y - step_size[1])

Nadam
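
Following Dozat's Nadam, the bias-corrected momentum is combined with the bias-corrected current gradient to obtain a Nesterov-style look-ahead:

$$\bar{m}_t = \beta_1 \hat{m}_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{\beta_2\, v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\eta\, \bar{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t = m_t / (1 - \beta_1^{t+1})$.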

7. AMSGrad

def step(self, x, y):
    self.t += 1
    g_t = self.F.df(x, y)

    # self.b1 = self.b1 / self.t  # b1 decay [OPTIONAL]

    self.m = self.b1 * self.m + (1-self.b1)*g_t
    self.v = self.b2 * self.v + (1-self.b2)*g_t*g_t

    # Keep the element-wise maximum of the second-moment estimates seen so far.
    self.v_max = np.maximum(self.v, self.v_max)

    # alpha = self.lr / np.sqrt(self.t)  # learning rate decay [OPTIONAL]
    step_size = self.lr * self.m / (np.sqrt(self.v_max) + self.eps)

    return (x - step_size[0], y - step_size[1])

AMSGrad
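
AMSGrad replaces Adam's $\hat{v}_t$ with the element-wise maximum of all second-moment estimates seen so far; as in the original formulation, no bias correction is applied:

$$v_t^{\max} = \max\!\left(v_{t-1}^{\max},\, v_t\right), \qquad \theta_t = \theta_{t-1} - \frac{\eta\, m_t}{\sqrt{v_t^{\max}} + \epsilon}$$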

Limitations

All algorithms are currently implemented in ℝ3 space, mainly for visualization purposes. An ℝN implementation may be added in the future.

Authors

This repository is maintained and developed by Hoàng Minh Quân and Nguyễn Ngọc Lan Như, students at the University of Science, VNU-HCM.