jarrelscy / iResnet

Unofficial PyTorch implementation of i-ResNet (invertible residual networks).
MIT License

Does the CNN version train correctly? #2

Open shimazing opened 5 years ago

shimazing commented 5 years ago

Your code is really helpful for understanding how iResNet works; thanks for writing it. However, when I run the CNN-version Jupyter notebook, it gives the wrong result in the evaluation phase (when evaluation mode is activated with net.eval()): after a few iterations the model cannot even reconstruct the inputs, and the latent standard deviation of the test data diverges. (I am using DataParallel; do you think the problem comes from this?)

Did you get the right result??

Thanks in advance for your reply

jarrelscy commented 5 years ago

Hi

Thanks for the kind comments. The model I have trains correctly, but slowly. Did you alter any of the parameters, including the batch size?

Also, the quality of the generated images is poor without long training (200 epochs in the original paper), even though visual inspection of some of the dimensions seems correct.

Jarrel

shimazing commented 5 years ago

My conjecture is that the optimization step pushes the spectral norm above 1, and your code uses the sigma computed during the training phase to normalize the weight. This changes the weight in the test phase, which I don't think is correct.

And one more question: does this code really do an in-place update for u? What do you mean by "v" in the comment under def compute_weight(self, module, do_power_iteration, num_iter=0): in the SpectralNormGouk.py file?

Hajin Shim

jarrelscy commented 5 years ago

Actually, you are on the right track. I have just checked the code, and it uses an older, incorrect version of Gouk's spectral normalization. Specifically, it underestimates the largest singular value because I use too small an x_i.

https://arxiv.org/pdf/1804.04368.pdf for my future reference.

I actually made this change a while back but did not upload the corrected file, so my apologies.

P.S. You can ignore the comment under compute_weight; it was copied from an earlier implementation of Miyato's spectral norm, which uses u and v vectors.

As to whether a sigma calculated in the training phase is valid in the testing phase, that is a good point. In theory the weight shouldn't change during testing (since no weight update is performed), and sigma depends solely on the weight, so it shouldn't change either.

In practice the sigma is somewhat variable, as the power iteration method only gives a bounded estimate, so I'm unclear whether recalculating sigma during the testing phase would change the result.
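
For intuition, here is a minimal sketch of a power-iteration estimate of sigma, assuming a plain dense weight matrix (Gouk's method applies the same idea to the full convolution operator, which is where an undersized probe x_i leads to underestimation):

    import torch

    def estimate_sigma(weight, n_iter=5):
        # Power iteration on W^T W: the estimate approaches the largest
        # singular value of weight from below, so too few iterations (or a
        # poor starting probe) underestimate the true spectral norm.
        x = torch.randn(weight.shape[1])
        for _ in range(n_iter):
            x = weight.t() @ (weight @ x)
            x = x / x.norm()
        return (weight @ x).norm()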

Try the updated version and see if this works first.

shimazing commented 5 years ago

Thanks for updating!! :)

However, I still have a problem, and a question: what is the "weight_orig" parameter for? With an assertion check I've noticed that weight and weight_orig have different values, yet weight_orig is used to calculate sigma. When is weight_orig updated to reflect the current state? I wonder whether this is the right way.

Thanks again for your fast reply :)

jarrelscy commented 5 years ago

weight_orig is the original weight and the actual parameter that undergoes gradient descent. The spectral norm replaces the weight parameter with a plain torch tensor, which is recomputed every time gradient descent happens.

This is the same approach used in the PyTorch implementation of Miyato's spectral_norm (in fact it is shamelessly copied, comments included...):

https://pytorch.org/docs/stable/_modules/torch/nn/utils/spectral_norm.html

So when the Conv2d runs, it requests module.weight, which is the recomputed tensor.

When gradient descent runs and weight_orig is altered, weight is recomputed by finding the sigma of weight_orig and dividing weight_orig by that sigma if it is larger than 1.
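
To make that concrete, here is a minimal sketch of the mechanism, modelled on PyTorch's spectral_norm hook; the actual SpectralNormGouk.py differs in its details, and apply_norm_sketch and magnitude are illustrative names, not the file's API:

    import torch
    import torch.nn as nn

    def apply_norm_sketch(module, magnitude=1.0, n_iter=5):
        # Move the trainable parameter to weight_orig; module.weight becomes
        # a plain tensor recomputed by a pre-forward hook, as in Miyato's
        # spectral_norm.
        weight = module.weight
        del module._parameters['weight']
        module.register_parameter('weight_orig', nn.Parameter(weight.detach()))

        def hook(mod, inputs):
            w = mod.weight_orig
            with torch.no_grad():
                sigma = estimate_sigma(w, n_iter)  # power iteration, as sketched above
            # Only shrink: divide by sigma when it exceeds the target magnitude.
            mod.weight = w / max(1.0, float(sigma) / magnitude)

        module.register_forward_pre_hook(hook)
        return module

Because the hook runs on every forward, a layer built as apply_norm_sketch(nn.Linear(10, 10), magnitude=0.9) keeps its effective spectral norm at or below 0.9 regardless of what gradient descent does to weight_orig.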

lingzenan commented 5 years ago

I met a similar problem. I am writing classification code based on the SpectralNormGouk.py file. However, the test loss increased and the test accuracy dropped to around 10%, while the training loss and accuracy looked fine. Besides, when I checked the trained model (i.e., loaded the state dict), the loss differed a lot from the values printed during training.

shimazing commented 5 years ago

@lingzenan Do you run the code with DataParallel??

shimazing commented 5 years ago

@jarrelscy I still have a problem even with the updated version. Have you run the code with DataParallel?

lingzenan commented 5 years ago

@shimazing yes

jarrelscy commented 5 years ago

I did not use DataParallel. I'm not sure how the actnorm would behave with DataParallel; you may have to run a test batch before copying the model to the other GPUs, or the batch statistics may be wrong.
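
A sketch of that workaround; the Sequential here is a hypothetical stand-in for the notebook's network, and the point is only the ordering (one single-device forward pass to fix the data-dependent statistics, then replicate):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10))  # stand-in model
    if torch.cuda.is_available():
        net = net.cuda()
        with torch.no_grad():
            net(torch.ones(32, 10).cuda())  # one batch initializes actnorm-style statistics
        net = nn.DataParallel(net)  # replicate only after initialization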

lingzenan commented 5 years ago

@jarrelscy Problems still exist without DataParallel. Here is a toy example.

    import torch
    import torch.nn as nn
    from SpectralNormGouk1 import *
    from torch.optim import *

    class toy(nn.Module):
        def __init__(self):
            super(toy, self).__init__()
            self.f = spectral_norm(nn.Linear(10, 10, bias=False),
                                   magnitude=0.9, n_power_iterations=5)

        def forward(self, x):
            x = self.f(x)
            return x

    if __name__ == "__main__":
        net = toy()
        opt = Adam(net.parameters(), lr=0.01)
        criterion = nn.MSELoss()
        for i in range(1000):
            net.train()
            opt.zero_grad()
            inputs = torch.ones(32, 10)
            y = net(inputs)
            loss = criterion(y, inputs)
            loss.backward()
            opt.step()
            print(loss.item())
        torch.save(net.state_dict(), 'check.pkl')

        print("########eval###########")
        net.eval()
        with torch.no_grad():
            inputs = torch.ones(32, 10)
            y = net(inputs)
            loss = criterion(y, inputs)
            print(loss.item())

        print("########eval_check###########")
        net_ = toy()
        state = torch.load('check.pkl')
        net_.load_state_dict(state)
        net_.eval()
        with torch.no_grad():
            inputs = torch.ones(32, 10)
            y = net_(inputs)
            loss = criterion(y, inputs)
            print(loss.item())

"AttributeError: 'Linear' object has no attribute 'sigma'"

jarrelscy commented 5 years ago

Hi Zenan,

Saving and loading are not implemented yet.

Jarrel

lingzenan commented 5 years ago

@jarrelscy Thanks for your reply.

lingzenan commented 5 years ago

@jarrelscy The test loss and accuracy seem to be normal if I use net.train() together with torch.no_grad() during the test phase.
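
Concretely, the test phase of the toy example above would become something like this sketch, reusing net and criterion from that script (train mode keeps sigma recomputed from the current weights, while torch.no_grad() prevents any update):

    net.train()  # recompute sigma as in training
    with torch.no_grad():  # but update nothing
        inputs = torch.ones(32, 10)
        y = net(inputs)
        loss = criterion(y, inputs)
        print(loss.item())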

jarrelscy commented 5 years ago

Interesting. Maybe we should be recalculating sigma at test time, then.

shimazing commented 5 years ago

I also found that it runs correctly with a single GPU and net.train() under torch.no_grad().

lingzenan commented 5 years ago

@shimazing @jarrelscy Did you train the classification model? The authors released the code in the latest version of the paper, but the link is 404 now.

lingzenan commented 5 years ago

My classification net doesn't work on a single GPU; the loss explodes.

jarrelscy commented 5 years ago

I have yet to train the classification model.
