Closed statist-bhfz closed 1 year ago
I think that's because the validation set is not shuffled (as per the dataloader creation). When you look at the complete set, you should get all 200 labels:
> valid_dl <- dataloader(valid_ds, batch_size = 10000, shuffle = FALSE, drop_last = TRUE)
> labels <- dataloader_next(dataloader_make_iter(valid_dl))[[2]]
> unique(as.numeric(labels))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
[39] 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
[58] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
[77] 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
[96] 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
[115] 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
[134] 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
[153] 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
[172] 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190
[191] 191 192 193 194 195 196 197 198 199 200
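For completeness, here is a sketch of the same check done by iterating over all batches instead of requesting one oversized batch (a minimal illustration using `coro::loop()`, assuming `valid_ds` from the example; untested here):

```r
library(torch)

# Unshuffled validation dataloader with an ordinary batch size.
valid_dl <- dataloader(valid_ds, batch_size = 128, shuffle = FALSE)

all_labels <- c()
coro::loop(for (batch in valid_dl) {
  # batch[[2]] holds the labels for this batch
  all_labels <- c(all_labels, as.numeric(batch[[2]]))
})

# With shuffle = FALSE the classes arrive in order, so a small first
# batch only ever sees the first few classes; a pass over the whole
# dataloader should see all 200.
length(unique(all_labels))
```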
Does this work for you?
@skeydan yes, it works. My mistake was to create a new iterator on every sequential call of dataloader_next().
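The pitfall above can be sketched like this (a minimal illustration, assuming a dataloader `dl` as in the example):

```r
# Wrong: making a fresh iterator on every call restarts iteration,
# so every call returns the first batch (and its labels) again.
b1 <- dataloader_next(dataloader_make_iter(dl))
b2 <- dataloader_next(dataloader_make_iter(dl))  # first batch again

# Right: create the iterator once, then advance it.
it <- dataloader_make_iter(dl)
b1 <- dataloader_next(it)
b2 <- dataloader_next(it)  # the second batch
```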
But I still can't understand why the example https://torchvision.mlverse.org/articles/examples/tinyimagenet-alexnet.html doesn't work:
[epoch 1]: Loss = 5.299575, Acc= 0.025040
[epoch 2]: Loss = 5.299323, Acc= 0.025040
[epoch 3]: Loss = 5.299293, Acc= 0.025040
[epoch 4]: Loss = 5.299235, Acc= 0.025040
With model_resnet18() it looks somewhat better, but is still confusing:
[epoch 1]: Loss = 4.568783, Acc= 0.064203
[epoch 2]: Loss = 3.636775, Acc= 0.065905
[epoch 3]: Loss = 3.179632, Acc= 0.079127
[epoch 4]: Loss = 2.820852, Acc= 0.080429
[epoch 5]: Loss = 2.472266, Acc= 0.087740
[epoch 6]: Loss = 2.094413, Acc= 0.088442
[epoch 7]: Loss = 1.662074, Acc= 0.084135
[epoch 8]: Loss = 1.191983, Acc= 0.078125
[epoch 9]: Loss = 0.746314, Acc= 0.083333
[epoch 10]: Loss = 0.440392, Acc= 0.082833
[epoch 11]: Loss = 0.281548, Acc= 0.075020
pred <- torch_topk(pred, k = 5, dim = 2, TRUE, TRUE)[[2]]$add(1L)
should be
pred <- torch_topk(pred, k = 5, dim = 2, TRUE, TRUE)[[2]]
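A small check of why the `$add(1L)` was wrong (my understanding, consistent with the fix working: torch for R already converts returned indices to R's 1-based convention, so adding 1 shifts every predicted label off by one):

```r
library(torch)

# Scores for 4 classes in one row; class 3 has the highest score.
scores <- torch_tensor(matrix(c(0.1, 0.2, 0.9, 0.05), nrow = 1))

top <- torch_topk(scores, k = 1, dim = 2)
as.numeric(top[[2]])           # 3: already 1-based, matches R labels
as.numeric(top[[2]]$add(1L))   # 4: off by one
```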
Now the model predicts the right labels, but alexnet with optim_adam still doesn't learn anything, at least during the first several epochs, while resnet18 reaches top-5 accuracy 0.37 after the first epoch. optim_adagrad(model$parameters, lr = 0.005) works much better:
# alexnet with optim_adam(model$parameters)
[epoch 1]: Loss = 5.299603, Acc= 0.025040
[epoch 2]: Loss = 5.299326, Acc= 0.025040
# alexnet with optim_adagrad(model$parameters, lr = 0.005)
[epoch 1]: Loss = 7.493342, Acc= 0.112881
[epoch 2]: Loss = 4.772662, Acc= 0.175381
Hi @statist-bhfz,
I can confirm your improvements work better than the original setting!
[epoch 1]: Loss = 6.557040, Acc= 0.105569
[epoch 2]: Loss = 4.818115, Acc= 0.162760
[epoch 3]: Loss = 4.612732, Acc= 0.209936
[epoch 4]: Loss = 4.457637, Acc= 0.241587
[epoch 5]: Loss = 4.362794, Acc= 0.255909
Merging your PR, thanks for the contribution!
The validation dataloader from the tinyimagenet-alexnet example always returns 1. The same thing happens when iterating along the test dataset; only the train set contains correct labels.