xu-ji / IIC

Invariant Information Clustering for Unsupervised Image Classification and Segmentation
MIT License

Stable loss for MNIST #14

Closed Bralio123 closed 5 years ago

Bralio123 commented 5 years ago

Hi and thanks for your work,

I was just trying to plug the IID loss function into the MNIST example, but the loss seems to stabilize at -0.55 after just 1 epoch. If I remove the NaN check, then instead of stabilizing, the network starts to output NaNs and fails to learn.

I did look at this in the latest version of your paper: "Mutual information (3) expands to I(z, z') = H(z) - H(z | z')." The largest value of H(z) is ln(C) and the minimum value of H(z|z') is 0. If the network has randomly assigned weights, it's fair to say that the predictions from the first mini-batches will assign roughly equal likelihood to each class, so H(z) should indeed be ln(C), which checks out. But the conditional cluster assignment entropy term stays around -6 throughout the entire training process, and maximizing IID doesn't trade off between the individual and conditional assignment entropies.
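For reference, here is roughly how I compute the IID objective from the paired softmax outputs z and z' (the function name and epsilon are just illustrative; the formula is the mutual information of the joint cluster-assignment distribution, equivalent to H(z) - H(z|z')):

```python
import torch

def iid_loss(z, z_prime, eps=1e-8):
    # z, z_prime: (batch, C) softmax outputs for an image and its transformed pair
    C = z.size(1)
    # joint distribution over cluster assignments, symmetrised and normalised
    P = (z.unsqueeze(2) * z_prime.unsqueeze(1)).sum(dim=0)
    P = ((P + P.t()) / 2) / P.sum()
    P = P.clamp(min=eps)
    # marginals, broadcast to C x C
    Pi = P.sum(dim=1).view(C, 1).expand(C, C)
    Pj = P.sum(dim=0).view(1, C).expand(C, C)
    # negative mutual information: -sum_ij P_ij * log(P_ij / (P_i * P_j))
    return -(P * (torch.log(P) - torch.log(Pi) - torch.log(Pj))).sum()
```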

Here is a link to the repo; it's just something simple: https://github.com/Bralio123/iic_simple

Just looking for a bit of insight into what I may be doing wrong. Perhaps I should add an additional clustering head, but I'm not sure whether this will help.

jizongFox commented 5 years ago

I played with your code. Apart from the NaN issue, the data augmentation you applied is not sufficient. In my experience, MNIST does not converge without strong transformations such as ColorJitter. I am also curious what the author @xu-ji would say.

Bralio123 commented 5 years ago

Hi,

Thanks for playing with my code!

I double-checked the entropy and conditional entropy values, but they didn't seem to move or trade off at all. I'll apply stronger affine transformations along with colour jitter to see if that helps; I also reckon that making my fully connected layer shallower might help, but we'll see how it goes.

Thanks again for this!

Cheers,

Bralio



xu-ji commented 5 years ago

@Bralio123

To begin with, perhaps use the transforms we used. If you look at the options given in the example command (link) and the transforms generator function (link), you can see the exact sequence of transforms used for MNIST (a rough torchvision sketch of these pipelines is given after the lists below):

For the original image:

  1. RandomChoice between: centre crop 20x20 or random crop 20x20.
  2. Resize to 24x24.

For the augmented image:

  1. RandomApply of RandomRotation(25), with p=0.5.
  2. RandomChoice between: random crop 16x16, random crop 20x20, random crop 24x24.
  3. Resize to 24x24.
  4. ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.125).

Test image:

  1. Centre crop 20x20.
  2. Resize to 24x24.
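
In torchvision terms, the pipelines above look roughly like this (parameter values are taken from the lists; interpolation settings and channel handling are left at their defaults and may differ from the repo's code):

```python
import torchvision.transforms as T

# Original image: crop then resize
tf_original = T.Compose([
    T.RandomChoice([T.CenterCrop(20), T.RandomCrop(20)]),
    T.Resize(24),
])

# Augmented image: optional rotation, random crop scale, resize, colour jitter
tf_augmented = T.Compose([
    T.RandomApply([T.RandomRotation(25)], p=0.5),
    T.RandomChoice([T.RandomCrop(16), T.RandomCrop(20), T.RandomCrop(24)]),
    T.Resize(24),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.125),
])

# Test image: deterministic centre crop then resize
tf_test = T.Compose([
    T.CenterCrop(20),
    T.Resize(24),
])
```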

If your architecture expects input that is not 24x24, replace all the "resize to 24x24" steps with the desired size. Our architecture was a simple VGG-style network with 4 conv layers and 3 max-pool layers, followed by a single fully connected linear layer. The single-head version (including kernel sizes, padding sizes, etc.) is defined by this class.
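
As a rough, illustrative sketch of that shape of network (channel counts, kernel sizes and padding below are placeholders, not the values from the class linked above):

```python
import torch.nn as nn

class SmallVGGHead(nn.Module):
    """Illustrative 4-conv / 3-maxpool VGG-style trunk with a single softmax head."""
    def __init__(self, num_classes=10, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(256, 512, 5, padding=2), nn.ReLU(inplace=True),
        )
        # 24x24 input -> 3x3 feature map after three 2x2 max pools
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 3 * 3, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        return self.head(self.features(x))
```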

Other things to try:

Bralio123 commented 5 years ago

@xu-ji

Thanks for the response and the suggestions.

What values of IID loss are to be expected for the MNIST example?

xu-ji commented 5 years ago

If you download the trained models, you can see the records and plots for all of them. For MNIST (model 685), on head B (the main head, with 10 output channels), the loss goes from -1.22 to -2.20.