Closed statist-bhfz closed 1 year ago
I think that's because the validation set is not shuffled (as per the dataloader creation). When you look at the complete set, you should get all 200 labels:
> valid_dl <- dataloader(valid_ds, batch_size = 10000, shuffle = FALSE, drop_last = TRUE)
> labels <- dataloader_next(dataloader_make_iter(valid_dl))[[2]]
> unique(as.numeric(labels))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
[39] 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
[58] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
[77] 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
[96] 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
[115] 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
[134] 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
[153] 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
[172] 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190
[191] 191 192 193 194 195 196 197 198 199 200
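For completeness, here is a sketch of the same check done by iterating over all batches instead of requesting one oversized batch (a minimal illustration using `coro::loop()`, assuming `valid_ds` from the example; untested here):

```r
library(torch)

# Unshuffled validation dataloader with an ordinary batch size.
valid_dl <- dataloader(valid_ds, batch_size = 128, shuffle = FALSE)

all_labels <- c()
coro::loop(for (batch in valid_dl) {
  # batch[[2]] holds the labels for this batch
  all_labels <- c(all_labels, as.numeric(batch[[2]]))
})

# With shuffle = FALSE the classes arrive in order, so a small first
# batch only ever sees the first few classes; a pass over the whole
# dataloader should see all 200.
length(unique(all_labels))
```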
Does this work for you?
@skeydan yes, it works. My mistake was to create a new iterator on every sequential call of dataloader_next().
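The pitfall above can be sketched like this (a minimal illustration, assuming a dataloader `dl` as in the example):

```r
# Wrong: making a fresh iterator on every call restarts iteration,
# so every call returns the first batch (and its labels) again.
b1 <- dataloader_next(dataloader_make_iter(dl))
b2 <- dataloader_next(dataloader_make_iter(dl))  # first batch again

# Right: create the iterator once, then advance it.
it <- dataloader_make_iter(dl)
b1 <- dataloader_next(it)
b2 <- dataloader_next(it)  # the second batch
```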
But I still can't understand why the example https://torchvision.mlverse.org/articles/examples/tinyimagenet-alexnet.html doesn't work:
[epoch 1]: Loss = 5.299575, Acc= 0.025040
[epoch 2]: Loss = 5.299323, Acc= 0.025040
[epoch 3]: Loss = 5.299293, Acc= 0.025040
[epoch 4]: Loss = 5.299235, Acc= 0.025040
With model_resnet18() it looks somewhat better, but is still confusing:
[epoch 1]: Loss = 4.568783, Acc= 0.064203
[epoch 2]: Loss = 3.636775, Acc= 0.065905
[epoch 3]: Loss = 3.179632, Acc= 0.079127
[epoch 4]: Loss = 2.820852, Acc= 0.080429
[epoch 5]: Loss = 2.472266, Acc= 0.087740
[epoch 6]: Loss = 2.094413, Acc= 0.088442
[epoch 7]: Loss = 1.662074, Acc= 0.084135
[epoch 8]: Loss = 1.191983, Acc= 0.078125
[epoch 9]: Loss = 0.746314, Acc= 0.083333
[epoch 10]: Loss = 0.440392, Acc= 0.082833
[epoch 11]: Loss = 0.281548, Acc= 0.075020
pred <- torch_topk(pred, k = 5, dim = 2, TRUE, TRUE)[[2]]$add(1L)
should be
pred <- torch_topk(pred, k = 5, dim = 2, TRUE, TRUE)[[2]]
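A small check of why the `$add(1L)` was wrong (my understanding, consistent with the fix working: torch for R already converts returned indices to R's 1-based convention, so adding 1 shifts every predicted label off by one):

```r
library(torch)

# Scores for 4 classes in one row; class 3 has the highest score.
scores <- torch_tensor(matrix(c(0.1, 0.2, 0.9, 0.05), nrow = 1))

top <- torch_topk(scores, k = 1, dim = 2)
as.numeric(top[[2]])           # 3: already 1-based, matches R labels
as.numeric(top[[2]]$add(1L))   # 4: off by one
```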
Now the model predicts the right labels, but alexnet with optim_adam still doesn't learn anything, at least during the first several epochs, while resnet18 reaches top-5 accuracy 0.37 after the first epoch. optim_adagrad(model$parameters, lr = 0.005) works much better:
# alexnet with optim_adam(model$parameters)
[epoch 1]: Loss = 5.299603, Acc= 0.025040
[epoch 2]: Loss = 5.299326, Acc= 0.025040
# alexnet with optim_adagrad(model$parameters, lr = 0.005)
[epoch 1]: Loss = 7.493342, Acc= 0.112881
[epoch 2]: Loss = 4.772662, Acc= 0.175381
Hi @statist-bhfz,
I can confirm your improvements work better than the original setting!
[epoch 1]: Loss = 6.557040, Acc= 0.105569
[epoch 2]: Loss = 4.818115, Acc= 0.162760
[epoch 3]: Loss = 4.612732, Acc= 0.209936
[epoch 4]: Loss = 4.457637, Acc= 0.241587
[epoch 5]: Loss = 4.362794, Acc= 0.255909
Merging your PR, thanks for the contribution!
The validation dataloader from the tinyimagenet-alexnet example always returns 1. The same thing happens when iterating along the test dataset; only the train set contains correct labels.