Closed — mlearning closed this issue 7 years ago
I can answer the second question here: one of the focuses of this paper is to do semi-supervised learning, which deals with the situation where we have a combination of labeled and unlabeled images (usually far fewer labeled than unlabeled).
Thank you for answering my 2nd question!
Over the past weekend I tried both the STL model and Inception v3 on the STL-10 dataset. I tried visit_weight between 0 and 0.25, sup_per_batch from 10 to 100, and unsup_batch_size from 100 to 2000, and got a maximum of 65% accuracy during evaluation. I used learning rate decay from 1e-4 to 1e-6 every 10,000 steps. Could you point me in a direction on which parameters I should focus on tuning?
This should converge to 81% test accuracy after 1M iterations (~12h) with visit_weight = 0.0 and augmentation = true.
I used 20 worker_replicas and 7 ps_tasks for asynchronous training.
The initial learning rate was 1e-5 and decayed to 6e-6 and 1e-6 after 300000 and 1000000 steps, respectively.
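For concreteness, the schedule described above can be sketched as a piecewise-constant function. This is only an illustration of the stated settings, not the repo's actual code (the function name and boundary handling are assumptions; in TensorFlow this would typically be wired up with something like `tf.train.piecewise_constant`):

```python
def learning_rate(step):
    # Piecewise-constant schedule matching the settings above:
    # 1e-5 until step 300,000, then 6e-6 until step 1,000,000, then 1e-6.
    if step < 300000:
        return 1e-5
    elif step < 1000000:
        return 6e-6
    return 1e-6
```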
Hi,
Would you mind sharing your full list of hyperparameters for reproducing the STL-10 result shown in the paper? I was far from reproducing it after tweaking a few parameters here and there, starting from the values used for the other datasets.
And one more question: the paper mentions that you used only 100 images per label. Is there a particular reason you don't take advantage of all the labeled images available?