idiap / attention-sampling

This Python package enables the training and inference of deep learning models for very large data, such as megapixel images, using attention-sampling

It's not learning #5

Status: Closed (andersbhc-mmmi closed 4 years ago)

andersbhc-mmmi commented 4 years ago

Hi,

The model does not seem to learn at all. I am using all the default hyperparameters when running mnist.py:

abc@sdur-3:~/ats/scripts$ python3 ./mnist.py --epochs 10 ~/ats/datadir ~/ats/model_output

The loss doesn't really drop (or only very marginally) - does this look right to you?

I have already checked whether the data might have been loaded incorrectly, but it seems fine.

I have attached a snippet of the first 10 epochs, where you can see that it doesn't really learn.

[Attached image: Attention-Model-MNIST-test]

angeloskath commented 4 years ago

Hi,

As you can see on the examples page in the docs, it takes several hundred epochs for the loss to start falling, and then it falls rapidly.

This makes sense: the information about where to look is sparse in the large image, so the randomly initialized network needs many tries before it learns where to look. If you want to check whether your setup works, I would suggest trying the smaller noiseless example first, using the following commands:

$ mkdir /tmp/mnist-small
$ ./make_mnist.py --width 500 --height 500 --no_noise --scale 0.2 /tmp/mnist-small
Sparsifying dataset
Processing  5000 /   5000
Sparsifying dataset
Processing  1000 /   1000
$ mkdir /tmp/mnist-experiment
$ ./mnist.py /tmp/mnist-small /tmp/mnist-experiment

I get something like the following:

Using TensorFlow backend.
Loaded dataset with the following parameters
{
    "n_train": 5000,
    "n_test": 1000,
    "width": 500,
    "height": 500,
    "scale": 0.2,
    "noise": false,
    "seed": 0
}
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 100, 100, 1)  0                                            
__________________________________________________________________________________________________
sequential_1 (Sequential)       (None, 100, 100)     737         input_1[0][0]                    
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 500, 500, 1)  0                                            
__________________________________________________________________________________________________
activity_regularizer_1 (Activit (None, 100, 100)     0           sequential_1[1][0]               
__________________________________________________________________________________________________
sample_patches_1 (SamplePatches [(None, 10, 50, 50,  0           input_1[0][0]                    
                                                                 input_2[0][0]                    
                                                                 activity_regularizer_1[0][0]     
__________________________________________________________________________________________________
total_reshape_1 (TotalReshape)  (None, 50, 50, 1)    0           sample_patches_1[0][0]           
__________________________________________________________________________________________________
sequential_2 (Sequential)       (None, 32)           29344       total_reshape_1[0][0]            
__________________________________________________________________________________________________
total_reshape_2 (TotalReshape)  (None, 10, 32)       0           sequential_2[1][0]               
__________________________________________________________________________________________________
expectation_1 (Expectation)     (None, 32)           0           total_reshape_2[0][0]            
                                                                 sample_patches_1[0][1]           
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 10)           330         expectation_1[0][0]              
==================================================================================================
Total params: 30,411
Trainable params: 30,411
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/500
2019-08-27 14:45:14.356132: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-08-27 14:45:14.386989: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200050000 Hz
2019-08-27 14:45:14.388718: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x560a1c8837b0 executing computations on platform Host. Devices:
2019-08-27 14:45:14.388759: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-27 14:45:14.780816: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 14:45:14.786436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:00:06.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-08-27 14:45:14.787313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-08-27 14:45:14.795831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-27 14:45:14.796223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-08-27 14:45:14.796501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-08-27 14:45:14.797510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10470 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:00:06.0, compute capability: 6.1)
2019-08-27 14:45:14.804440: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x560a21822a60 executing computations on platform CUDA. Devices:
2019-08-27 14:45:14.804865: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-08-27 14:45:19.067063: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
40/40 [==============================] - 15s 385ms/step - loss: 2.3033 - acc: 0.1083 - categorical_crossentropy: 2.3042 - val_loss: 2.3037 - val_acc: 0.1000 - val_categorical_crossentropy: 2.3046
Epoch 2/500
40/40 [==============================] - 6s 159ms/step - loss: 2.2996 - acc: 0.1094 - categorical_crossentropy: 2.3006 - val_loss: 2.3026 - val_acc: 0.0990 - val_categorical_crossentropy: 2.3035
Epoch 3/500
40/40 [==============================] - 6s 159ms/step - loss: 2.2933 - acc: 0.1216 - categorical_crossentropy: 2.2943 - val_loss: 2.2623 - val_acc: 0.2010 - val_categorical_crossentropy: 2.2632
Epoch 4/500
40/40 [==============================] - 7s 163ms/step - loss: 2.1361 - acc: 0.3667 - categorical_crossentropy: 2.1366 - val_loss: 2.0545 - val_acc: 0.5280 - val_categorical_crossentropy: 2.0549
Epoch 5/500
40/40 [==============================] - 6s 160ms/step - loss: 2.0101 - acc: 0.5631 - categorical_crossentropy: 2.0106 - val_loss: 1.9686 - val_acc: 0.5810 - val_categorical_crossentropy: 1.9691
Epoch 6/500
40/40 [==============================] - 6s 161ms/step - loss: 1.9371 - acc: 0.6122 - categorical_crossentropy: 1.9376 - val_loss: 1.8969 - val_acc: 0.6470 - val_categorical_crossentropy: 1.8974
Epoch 7/500
40/40 [==============================] - 6s 160ms/step - loss: 1.8749 - acc: 0.6396 - categorical_crossentropy: 1.8754 - val_loss: 1.8420 - val_acc: 0.6540 - val_categorical_crossentropy: 1.8425
Epoch 8/500
40/40 [==============================] - 7s 164ms/step - loss: 1.8134 - acc: 0.6701 - categorical_crossentropy: 1.8138 - val_loss: 1.7879 - val_acc: 0.6790 - val_categorical_crossentropy: 1.7884
Epoch 9/500
40/40 [==============================] - 7s 163ms/step - loss: 1.7482 - acc: 0.7033 - categorical_crossentropy: 1.7486 - val_loss: 1.7264 - val_acc: 0.6940 - val_categorical_crossentropy: 1.7268
Epoch 10/500
40/40 [==============================] - 7s 163ms/step - loss: 1.6953 - acc: 0.7126 - categorical_crossentropy: 1.6957 - val_loss: 1.6768 - val_acc: 0.6780 - val_categorical_crossentropy: 1.6772
Epoch 11/500
40/40 [==============================] - 7s 174ms/step - loss: 1.6435 - acc: 0.7407 - categorical_crossentropy: 1.6439 - val_loss: 1.6347 - val_acc: 0.7080 - val_categorical_crossentropy: 1.6352
Epoch 12/500
40/40 [==============================] - 7s 167ms/step - loss: 1.5998 - acc: 0.7523 - categorical_crossentropy: 1.6003 - val_loss: 1.5706 - val_acc: 0.7400 - val_categorical_crossentropy: 1.5711
Epoch 13/500
40/40 [==============================] - 7s 163ms/step - loss: 1.5447 - acc: 0.7660 - categorical_crossentropy: 1.5452 - val_loss: 1.5233 - val_acc: 0.7620 - val_categorical_crossentropy: 1.5237
Epoch 14/500
40/40 [==============================] - 6s 161ms/step - loss: 1.5023 - acc: 0.7818 - categorical_crossentropy: 1.5028 - val_loss: 1.5185 - val_acc: 0.7230 - val_categorical_crossentropy: 1.5189
Epoch 15/500
40/40 [==============================] - 7s 166ms/step - loss: 1.4635 - acc: 0.7835 - categorical_crossentropy: 1.4639 - val_loss: 1.4640 - val_acc: 0.7530 - val_categorical_crossentropy: 1.4644
Epoch 16/500
40/40 [==============================] - 7s 165ms/step - loss: 1.4130 - acc: 0.8116 - categorical_crossentropy: 1.4135 - val_loss: 1.4291 - val_acc: 0.7840 - val_categorical_crossentropy: 1.4296
Epoch 17/500
40/40 [==============================] - 7s 164ms/step - loss: 1.3884 - acc: 0.7960 - categorical_crossentropy: 1.3889 - val_loss: 1.3845 - val_acc: 0.7740 - val_categorical_crossentropy: 1.3849
Epoch 18/500
40/40 [==============================] - 7s 169ms/step - loss: 1.3521 - acc: 0.8082 - categorical_crossentropy: 1.3526 - val_loss: 1.3458 - val_acc: 0.8090 - val_categorical_crossentropy: 1.3463
Epoch 19/500
40/40 [==============================] - 7s 170ms/step - loss: 1.3179 - acc: 0.8182 - categorical_crossentropy: 1.3184 - val_loss: 1.3153 - val_acc: 0.8010 - val_categorical_crossentropy: 1.3158
Epoch 20/500
40/40 [==============================] - 7s 165ms/step - loss: 1.2809 - acc: 0.8173 - categorical_crossentropy: 1.2813 - val_loss: 1.2936 - val_acc: 0.7880 - val_categorical_crossentropy: 1.2941
Epoch 21/500
40/40 [==============================] - 7s 166ms/step - loss: 1.2489 - acc: 0.8301 - categorical_crossentropy: 1.2494 - val_loss: 1.2543 - val_acc: 0.7930 - val_categorical_crossentropy: 1.2548
Epoch 22/500
40/40 [==============================] - 7s 166ms/step - loss: 1.2111 - acc: 0.8312 - categorical_crossentropy: 1.2116 - val_loss: 1.2424 - val_acc: 0.8090 - val_categorical_crossentropy: 1.2429
Epoch 23/500
40/40 [==============================] - 7s 170ms/step - loss: 1.1826 - acc: 0.8349 - categorical_crossentropy: 1.1831 - val_loss: 1.2010 - val_acc: 0.8220 - val_categorical_crossentropy: 1.2015
Epoch 24/500
40/40 [==============================] - 7s 167ms/step - loss: 1.1536 - acc: 0.8407 - categorical_crossentropy: 1.1541 - val_loss: 1.1762 - val_acc: 0.8070 - val_categorical_crossentropy: 1.1766
Epoch 25/500
40/40 [==============================] - 7s 165ms/step - loss: 1.1280 - acc: 0.8381 - categorical_crossentropy: 1.1285 - val_loss: 1.1718 - val_acc: 0.7960 - val_categorical_crossentropy: 1.1723
Epoch 26/500
40/40 [==============================] - 7s 168ms/step - loss: 1.1039 - acc: 0.8401 - categorical_crossentropy: 1.1044 - val_loss: 1.1427 - val_acc: 0.8060 - val_categorical_crossentropy: 1.1431
Epoch 27/500
40/40 [==============================] - 7s 173ms/step - loss: 1.0837 - acc: 0.8488 - categorical_crossentropy: 1.0842 - val_loss: 1.0893 - val_acc: 0.8230 - val_categorical_crossentropy: 1.0898
Epoch 28/500
40/40 [==============================] - 7s 173ms/step - loss: 1.0484 - acc: 0.8458 - categorical_crossentropy: 1.0489 - val_loss: 1.1000 - val_acc: 0.8110 - val_categorical_crossentropy: 1.1005
Epoch 29/500
40/40 [==============================] - 7s 175ms/step - loss: 1.0436 - acc: 0.8461 - categorical_crossentropy: 1.0441 - val_loss: 1.0536 - val_acc: 0.8420 - val_categorical_crossentropy: 1.0541
Epoch 30/500
40/40 [==============================] - 6s 161ms/step - loss: 1.0174 - acc: 0.8452 - categorical_crossentropy: 1.0179 - val_loss: 1.0349 - val_acc: 0.8320 - val_categorical_crossentropy: 1.0354
Epoch 31/500
40/40 [==============================] - 6s 162ms/step - loss: 0.9945 - acc: 0.8482 - categorical_crossentropy: 0.9950 - val_loss: 1.0240 - val_acc: 0.8150 - val_categorical_crossentropy: 1.0245
Epoch 32/500
40/40 [==============================] - 7s 163ms/step - loss: 0.9574 - acc: 0.8544 - categorical_crossentropy: 0.9578 - val_loss: 1.0168 - val_acc: 0.8220 - val_categorical_crossentropy: 1.0173
Epoch 33/500
40/40 [==============================] - 7s 171ms/step - loss: 0.9452 - acc: 0.8616 - categorical_crossentropy: 0.9457 - val_loss: 0.9744 - val_acc: 0.8420 - val_categorical_crossentropy: 0.9749
Epoch 34/500
40/40 [==============================] - 7s 166ms/step - loss: 0.9251 - acc: 0.8562 - categorical_crossentropy: 0.9256 - val_loss: 0.9690 - val_acc: 0.8250 - val_categorical_crossentropy: 0.9694
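
For anyone comparing their own logs against the run above, a useful reference point is the chance level: with 10 classes, a uniform guess gives a cross-entropy of ln(10), roughly 2.3026, and an accuracy of 0.10, which is about where the first two epochs sit. The snippet below is only a minimal sketch and is not part of the attention-sampling package. The helper names `parse_metrics` and `first_epoch_above_chance` and the 0.05 margin are assumptions made for illustration, and the regex assumes the Keras progress-line format shown in the log above.

```python
import math
import re

NUM_CLASSES = 10                      # MNIST digits
CHANCE_LOSS = math.log(NUM_CLASSES)   # cross-entropy of a uniform guess
CHANCE_ACC = 1.0 / NUM_CLASSES        # accuracy of a uniform guess

# Matches "name: value" pairs such as "val_acc: 0.1000" in a progress line.
METRIC_RE = re.compile(r"(\w+): (\d+\.\d+)")


def parse_metrics(line):
    """Return a dict mapping metric name to float for one progress line."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def first_epoch_above_chance(lines, margin=0.05):
    """Return the 1-based position of the first line whose val_acc exceeds
    chance accuracy by more than `margin` (hypothetical threshold), or None."""
    for epoch, line in enumerate(lines, start=1):
        metrics = parse_metrics(line)
        if metrics.get("val_acc", 0.0) > CHANCE_ACC + margin:
            return epoch
    return None


# Progress lines for epochs 1 to 4, abbreviated from the log above.
SAMPLE_LINES = [
    "40/40 - loss: 2.3033 - acc: 0.1083 - val_loss: 2.3037 - val_acc: 0.1000",
    "40/40 - loss: 2.2996 - acc: 0.1094 - val_loss: 2.3026 - val_acc: 0.0990",
    "40/40 - loss: 2.2933 - acc: 0.1216 - val_loss: 2.2623 - val_acc: 0.2010",
    "40/40 - loss: 2.1361 - acc: 0.3667 - val_loss: 2.0545 - val_acc: 0.5280",
]

print(round(CHANCE_LOSS, 4))                   # 2.3026
print(first_epoch_above_chance(SAMPLE_LINES))  # 3
```

If the loss stays near 2.30 and the accuracy near 0.10, the model is still at chance. On the small noiseless dataset that phase ends after a few epochs, so a setup that stays at chance there for much longer is probably worth a closer look.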

I will close the issue but feel free to reopen if needed.

Cheers, Angelos

andersbhc-mmmi commented 4 years ago

Hi,

Thank you for your swift response and the example! It makes sense. I will try the small, noiseless example.

Cheers, Anders