kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

Performance is not good when using my dataset. #16

Open bemoregt opened 6 years ago

bemoregt commented 6 years ago

Hi, @kevinzakka

I trained on my own data in MNIST format (256x256 grayscale images, 5000 images per class), but performance is poor.

What am I doing wrong?


Epoch: 196/500 - LR: 0.000300 0.8s - loss: 1.055 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 267.85it/s] train loss: 0.834 - train acc: 62.212 - val loss: 1.192 - val acc: 54.167

Epoch: 197/500 - LR: 0.000300 0.8s - loss: -0.885 - acc: 100.000: 100%|████████| 217/217 [00:00<00:00, 273.63it/s] train loss: 0.568 - train acc: 60.369 - val loss: 0.844 - val acc: 54.167

Epoch: 198/500 - LR: 0.000300 0.8s - loss: 0.780 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 270.30it/s] train loss: 0.565 - train acc: 57.604 - val loss: 1.076 - val acc: 50.000

Epoch: 199/500 - LR: 0.000300 0.8s - loss: 3.553 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 271.82it/s] train loss: 0.678 - train acc: 58.525 - val loss: 0.533 - val acc: 58.333

Epoch: 200/500 - LR: 0.000300 0.8s - loss: 0.116 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 272.74it/s] train loss: 0.651 - train acc: 58.986 - val loss: 1.418 - val acc: 45.833

Epoch: 201/500 - LR: 0.000300 0.8s - loss: 5.108 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 275.17it/s] train loss: 0.779 - train acc: 63.594 - val loss: 0.921 - val acc: 62.500

Epoch: 202/500 - LR: 0.000300 0.8s - loss: 1.587 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 270.84it/s] train loss: 0.830 - train acc: 58.525 - val loss: 0.746 - val acc: 58.333 [!] No improvement in a while, stopping training.

Thanks.

from @bemoregt.

kevinzakka commented 6 years ago

@bemoregt your images are way bigger than MNIST's (256x256 vs. 28x28), so there are a few hyperparameters you'd have to tweak to improve performance. For example, you can try increasing the patch size, the number of patches per glimpse, and the number of glimpses taken per image. You could also try increasing the hidden size of the RNN, etc.
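To see why these knobs matter, here is a rough back-of-the-envelope sketch (not code from the repo; it just assumes the paper's retina, where each glimpse extracts `num_patches` square crops, each `glimpse_scale` times wider than the last):

```python
def glimpse_coverage(patch_size, glimpse_scale, num_patches, image_size):
    """Fraction of the image width covered by the widest crop in one glimpse."""
    widest = patch_size * glimpse_scale ** (num_patches - 1)
    return widest / image_size

# An 8px patch covers ~29% of a 28px MNIST digit...
print(glimpse_coverage(8, 2, 1, 28))    # ~0.29
# ...but only ~3% of a 256px image, so the policy is nearly blind.
print(glimpse_coverage(8, 2, 1, 256))   # ~0.03
# Bigger patches and more scales per glimpse restore coverage.
print(glimpse_coverage(32, 2, 3, 256))  # 0.5
```

The point: with MNIST-sized defaults, a single glimpse on a 256x256 image sees almost nothing, so the location policy gets very little signal to learn from.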

bemoregt commented 6 years ago

Hi, @kevinzakka

I'll try that. Are there any other tweaks I should consider for my 256x256 data?

My images are 256x256 grayscale images of thin-line defects in solar cells, augmented to 5000 images per class via retinex filtering, unsharp masking, horizontal/vertical flips, etc.

Thanks in advance.

from @bemoregt.

duygusar commented 6 years ago

@bemoregt Hi there, I was wondering if you have come across any tensor mismatch problems while training on your dataset? Could you please take a look at this issue: https://github.com/kevinzakka/recurrent-visual-attention/issues/19 I'd appreciate it; I cannot trace why it fails.

bemoregt commented 6 years ago

@kevinzakka @duygusar

No errors occur during training, but RAM's final accuracy is only 87.8% on my data (image classification).

By contrast, a supervised CNN reaches a final accuracy of 99.8% on the same dataset.

What am I doing wrong?


Thanks at any rate.

from @bemoregt.

bemoregt commented 6 years ago

@kevinzakka @duygusar

My Params here:

```python
# glimpse network params
glimpse_arg = add_argument_group('Glimpse Network Params')
glimpse_arg.add_argument('--patch_size', type=int, default=64,
                         help='size of extracted patch at highest res')
glimpse_arg.add_argument('--glimpse_scale', type=int, default=2,
                         help='scale of successive patches')
glimpse_arg.add_argument('--num_patches', type=int, default=1,
                         help='# of downscaled patches per glimpse')
glimpse_arg.add_argument('--loc_hidden', type=int, default=128,
                         help='hidden size of loc fc')
glimpse_arg.add_argument('--glimpse_hidden', type=int, default=128,
                         help='hidden size of glimpse fc')

# core network params
core_arg = add_argument_group('Core Network Params')
core_arg.add_argument('--num_glimpses', type=int, default=6,
                      help='# of glimpses, i.e. BPTT iterations')
core_arg.add_argument('--hidden_size', type=int, default=256,
                      help='hidden size of rnn')

# reinforce params
reinforce_arg = add_argument_group('Reinforce Params')
reinforce_arg.add_argument('--std', type=float, default=0.17,
                           help='gaussian policy standard deviation')
reinforce_arg.add_argument('--M', type=float, default=10,
                           help='Monte Carlo sampling for valid and test sets')

# data params
data_arg = add_argument_group('Data Params')
data_arg.add_argument('--valid_size', type=float, default=0.1,
                      help='Proportion of training set used for validation')
data_arg.add_argument('--batch_size', type=int, default=32,
                      help='# of images in each batch of data')
data_arg.add_argument('--num_workers', type=int, default=4,
                      help='# of subprocesses to use for data loading')
data_arg.add_argument('--shuffle', type=str2bool, default=True,
                      help='Whether to shuffle the train and valid indices')
data_arg.add_argument('--show_sample', type=str2bool, default=True,
                      help='Whether to visualize a sample grid of the data')

# training params
train_arg = add_argument_group('Training Params')
train_arg.add_argument('--is_train', type=str2bool, default=True,
                       help='Whether to train or test the model')
train_arg.add_argument('--momentum', type=float, default=0.5,
                       help='Nesterov momentum value')
train_arg.add_argument('--epochs', type=int, default=500,
                       help='# of epochs to train for')
train_arg.add_argument('--init_lr', type=float, default=3e-4,
                       help='Initial learning rate value')
train_arg.add_argument('--lr_patience', type=int, default=10,
                       help='Number of epochs to wait before reducing lr')
train_arg.add_argument('--train_patience', type=int, default=90,
                       help='Number of epochs to wait before stopping train')

# other params
misc_arg = add_argument_group('Misc.')
misc_arg.add_argument('--use_gpu', type=str2bool, default=False,
                      help='Whether to run on the GPU')
misc_arg.add_argument('--best', type=str2bool, default=True,
                      help='Load best model or most recent for testing')
misc_arg.add_argument('--random_seed', type=int, default=22,
                      help='Seed to ensure reproducibility')
misc_arg.add_argument('--data_dir', type=str, default='./data',
                      help='Directory in which data is stored')
misc_arg.add_argument('--ckpt_dir', type=str, default='./ckpt',
                      help='Directory in which to save model checkpoints')
misc_arg.add_argument('--logs_dir', type=str, default='./logs/',
                      help='Directory in which Tensorboard logs will be stored')
misc_arg.add_argument('--use_tensorboard', type=str2bool, default=False,
                      help='Whether to use tensorboard for visualization')
misc_arg.add_argument('--resume', type=str2bool, default=False,
                      help='Whether to resume training from checkpoint')
misc_arg.add_argument('--print_freq', type=int, default=10,
                      help='How frequently to print training details')
misc_arg.add_argument('--plot_freq', type=int, default=1,
                      help='How frequently to plot glimpses')
```
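For a 256x256 dataset, those defaults can be overridden on the command line rather than edited in the file. A hypothetical starting point, per the suggestions above (the exact values are guesses to tune, and this assumes the repo's `main.py` entry point):

```shell
python main.py \
  --patch_size 32 \
  --glimpse_scale 2 \
  --num_patches 3 \
  --num_glimpses 8 \
  --hidden_size 512
```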

duygusar commented 6 years ago

@bemoregt Thank you for sharing your parameters; when/if I can make it work, I will report on the performance. One thing I can think of is that the implementation has room for improvement, e.g., initializing glimpse locations randomly rather than at the center (if I am not wrong, this implementation initializes at the center point).

I think @kevinzakka might have a better answer, but the performance could be related to many things. Do you think your CNN classifier might be overfitting? The attention model might have an effect similar to augmenting your data. Or perhaps it is the nature of your data: attention models seem to work better only in particular cases (for example, RAM performs better on a noise-added MNIST set than on plain MNIST).

duygusar commented 6 years ago

On the other hand, I think my problem might be that my images are 427x240 RGB. I have come across some posts about PyTorch having problems with certain image sizes, where the advice is to use an adaptive average pooling layer rather than a plain average pooling layer, so I will try that. Or maybe it has to do with the center initialization or the padding of tensors. Still no idea. @kevinzakka
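For anyone curious why adaptive pooling sidesteps the size problem: it derives its windows from the requested output size instead of using a fixed kernel, so any input size yields the same output size. A framework-free sketch of the 1-D case (mirroring how `nn.AdaptiveAvgPool1d` chooses its windows; not code from this repo):

```python
def adaptive_avg_pool_1d(xs, out_size):
    """Average-pool a 1-D sequence down to exactly out_size values.

    Window i spans [floor(i*n/out_size), ceil((i+1)*n/out_size)), so
    every input length n produces an output of length out_size.
    """
    n = len(xs)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = -((-(i + 1) * n) // out_size)  # ceil((i+1)*n/out_size)
        window = xs[start:end]
        out.append(sum(window) / len(window))
    return out

# Inputs of any length map to the same output length:
print(adaptive_avg_pool_1d([1, 2, 3, 4], 2))         # [1.5, 3.5]
print(len(adaptive_avg_pool_1d(list(range(7)), 4)))  # 4
```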

duygusar commented 6 years ago

@bemoregt It seems this is not yet a full implementation of the paper; just FYI, as that might be the issue with performance. It never occurred to me to check the closed issues. I did go through the code, but not in great detail, since the readme gave the impression that it was a complete implementation of the paper (it was also featured on the PyTorch blog) and I am new to PyTorch.

https://github.com/kevinzakka/recurrent-visual-attention/issues/13 https://github.com/kevinzakka/recurrent-visual-attention/issues/1

linzhiqiu commented 6 years ago

I am also thinking of using this model on my own dataset, but from your discussion it looks like: 1/ this implementation is not complete yet, and 2/ RAM only works well on certain datasets, so its power is actually quite limited. Am I right?

duygusar commented 6 years ago

@linzhiqiu 1. My comment was not a caution against using this repository; I was just guessing why people have performance issues, and it seems I am not the only one. So yes, I think it would be better if everything were made clearer. The repo is a functional, minimal example of the model working on MNIST, minus some aspects (like random search); it is also written in a way that anticipates batching but does not yet support it everywhere (so I suppose that was left undone). So, IMHO, extending it to my dataset and getting a close-to-the-paper implementation is not as straightforward as the blog and the readme here made it seem. That said, this is a good opportunity for people to contribute to the project, but personally I am new to PyTorch.

2. I don't want to generalize, but I would say yes: it depends on the nature of your data. You would be better off using a simple CNN on plain MNIST, but on the cluttered, noise-added, translation-varied version of MNIST, the attention model performs better. It also depends on the problem: attention might be a better fit for image-to-text problems than for plain classification.

kevinzakka commented 6 years ago

@duygusar @linzhiqiu the way I wrote the repository, you just have to modify the data_loader.py file to make it work for your own needs. The implementation does support batching; in fact, the reported accuracies were achieved with a batch size of 32.
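For anyone else adapting the loader: there is no need to convert your data to the MNIST binary format, since the PyTorch Dataset protocol is just `__len__` plus `__getitem__`. A minimal, framework-free sketch of a folder-per-class dataset (the `GrayFolderDataset` name and directory layout are hypothetical, not from the repo; a real version would load each image and return a tensor):

```python
import os

class GrayFolderDataset:
    """Index class-named subfolders (one per label) as (path, label) pairs."""

    def __init__(self, root):
        self.samples = []
        # sorted() keeps the class -> label mapping deterministic
        for label, cls in enumerate(sorted(os.listdir(root))):
            cls_dir = os.path.join(root, cls)
            for fname in sorted(os.listdir(cls_dir)):
                self.samples.append((os.path.join(cls_dir, fname), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # a real implementation would load the image here, e.g.
        # PIL.Image.open(path).convert('L'), then transform to a tensor
        return path, label
```

An object like this can be handed straight to `torch.utils.data.DataLoader`, which takes care of shuffling and batching.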

The only current problem with this implementation is that the retina module is not efficient, so the CPU version currently runs faster than the GPU one. I had it in mind to write a custom kernel using the new C++ extension API but never found the time.

duygusar commented 6 years ago

@kevinzakka I did write my own data loader; I mentioned it here: https://github.com/kevinzakka/recurrent-visual-attention/issues/19 I was only referring to your comment about batches here: https://github.com/kevinzakka/recurrent-visual-attention/issues/1 Since the consensus way of handling different image sizes in PyTorch didn't solve my problem, and considering your comment, I assumed it could be the batching or the color channels.

chatgptcoderhere commented 3 years ago


@bemoregt What changes did you make to get this repo working with your custom data? I am having trouble converting my data to MNIST format. Is there an easier way to use it with my own data?