kikyou123 opened this issue 8 years ago
I think β1,t ← 1 − (1 − β1)λ^(t−1) becomes close to 1 and makes the momentum degenerate (β1 = 0.1 here).
This particular ADAM code was based on version 2 of the paper, which had β1,t ← 1 − (1 − β1)λ^(t−1). I notice that the current version of the ADAM paper doesn't seem to use β1,t ← 1 − (1 − β1)λ^(t−1) but rather fixes β1,t = 0.9.
But I think λ should be 1 − 1e-8, not 1e-8; why should the momentum degenerate? Also, on the LSUN dataset I found that both losses become very small, so training fails, and I don't know why.
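To make the λ question concrete, here is a small standalone sketch (plain Python, not code from the repository) comparing the decayed momentum coefficient under λ = 1e-8 and λ = 1 − 1e-8, for both the paper-v2 schedule and the b1_t schedule used in the code:

```python
# Standalone illustration of the two coefficient schedules discussed above.
b1 = 0.9

for lam in (1e-8, 1 - 1e-8):
    for t in (1, 2, 10, 1000):
        paper_v2 = 1 - (1 - b1) * lam ** (t - 1)   # beta_{1,t} from ADAM paper v2
        code     = b1 * lam ** (t - 1)             # b1_t as used in the code
        print("lam=%g  t=%4d  paper_v2=%.6f  code=%.6f" % (lam, t, paper_v2, code))

# With lam = 1e-8, after the first step the code's b1_t collapses to ~0 and the
# paper-v2 beta_{1,t} saturates at ~1; with lam = 1 - 1e-8 both stay ~0.9, as intended.
```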
hi,
I have just updated the repository to merge everything into gran.py. That probably wasn't the problem, but feel free to pull the current one and give it a try.
Also, make sure the full path to the data is given correctly and ends with something like:
`dataset = '../../../preprocessed_100/'`
Also, could you let me know at what epoch it fails? (If it fails at epoch 0, it's likely a path problem, I think.) And did you print out the samples at every epoch to see whether they make sense?
Could you check if it works on CIFAR-10? It might be the preprocessing part that is causing the problem.
We also tried the LSUN "living room and kitchen" dataset and it works fine; we will upload the samples shortly. :)
Chris
It works fine on CIFAR-10; on LSUN it fails at epoch 0.
Hi :) Can you show me what you get?
It fails at epoch 0. I also found that on CIFAR-10 it works when I set b1=0, but when I use this update algorithm it fails:
```python
import theano
import theano.tensor as T

# Assumes the surrounding module also defines the `Update` base class,
# `floatX`, and the (commented-out) `clip_norms` helper, as in the repo.
class Adam(Update):

    def __init__(self, lr=0.001, b1=0.9, b2=0.999, e=1e-8, l=1-1e-8, *args, **kwargs):
        Update.__init__(self, *args, **kwargs)
        self.__dict__.update(locals())

    def __call__(self, params, cost):
        updates = []
        grads = T.grad(cost, params)
        # grads = clip_norms(grads, self.clipnorm)
        t = theano.shared(floatX(1.))
        # decayed momentum coefficient: b1_t = b1 * l**(t - 1)
        b1_t = self.b1 * self.l**(t - 1)

        for p, g in zip(params, grads):
            g = self.regularizer.gradient_regularize(p, g)
            m = theano.shared(p.get_value() * 0.)   # first moment (momentum)
            v = theano.shared(p.get_value() * 0.)   # second moment

            m_t = b1_t * m + (1 - b1_t) * g
            v_t = self.b2 * v + (1 - self.b2) * g**2
            m_c = m_t / (1 - self.b1**t)            # bias-corrected first moment
            v_c = v_t / (1 - self.b2**t)            # bias-corrected second moment
            p_t = p - (self.lr * m_c) / (T.sqrt(v_c) + self.e)
            p_t = self.regularizer.weight_regularize(p_t)
            updates.append((m, m_t))
            updates.append((v, v_t))
            updates.append((p, p_t))
        updates.append((t, t + 1.))
        return updates
```
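For reference, a minimal, hypothetical usage sketch of this class (the toy cost and variable names are placeholders, and it assumes the repo-side definitions the class relies on, i.e. `Update`, `floatX`, and a default regularizer, are in scope):

```python
import numpy as np
import theano
import theano.tensor as T

# Toy problem (placeholder, not GRAN): fit w so that w * x ~ 1 for x = 1.
w = theano.shared(np.asarray([5.0], dtype=theano.config.floatX), name='w')
x = T.scalar('x')
cost = T.sum(T.sqr(w * x - 1.0))

adam = Adam(lr=0.01, l=1 - 1e-8)   # assumes Update() supplies a default regularizer
train = theano.function([x], cost, updates=adam([w], cost),
                        allow_input_downcast=True)

for _ in range(200):
    train(1.0)
print(w.get_value())               # should have moved toward 1.0
```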
I don't think it is the optimizer's problem, because ours works fine. I suspect the reason might be hyper-parameter tuning. Our pre-processed version of LSUN churches and living room + kitchen works fine. As you said, GRAN on CIFAR-10 works well, so maybe it is not the optimization method. If you strongly believe the problem comes from the optimizer, you could also try different optimization methods.
In optimize_gan.py, the ADAM function has the parameter l=1e-8; I wonder if it is wrong, because b1_t will become close to 0.
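For reference, a tiny standalone loop (again plain Python, not repository code) showing what happens to the first-moment estimate once b1_t collapses to roughly 0:

```python
# With b1_t ~ 0, m_t = b1_t*m + (1 - b1_t)*g just tracks the raw gradient,
# so there is no running average (i.e., the momentum degenerates).
b1, lam = 0.9, 1e-8
m = 0.0
for t, g in enumerate([1.0, -1.0, 1.0, -1.0], start=1):   # oscillating toy gradients
    b1_t = b1 * lam ** (t - 1)
    m = b1_t * m + (1 - b1_t) * g
    print("t=%d  b1_t=%.2e  m=%+.4f" % (t, b1_t, m))
# m flips sign with every step; with lam = 1 - 1e-8 it would instead be a
# smoothed average of the recent gradients.
```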