Why

To figure out moderate hyperparameters and get some hints for future experiments.

What

This is the result of densenet121 hyperparameter tuning with our advanced options.

How

Things I considered in this experiment:
- Using advanced options: ASL and random augmentation. It was not possible to use the label smoothing technique because this experiment started before the conflict between ASL and label smoothing was fixed.
- As many trials as possible: I ran 30 trials. This cannot guarantee that the tuning is optimal, but even so the experiment took almost a week, and that was with only half of the CheXpert dataset.
- Enough epochs: empirically, I observed the best score still improving after 15 epochs, so I trained for 20 epochs; training longer was hard due to the time limit.
- ImageNet-pretrained weights.
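For reference, the search space described above could look roughly like the sketch below. This is purely illustrative: the parameter ranges are guesses for exposition (not the exact ones used here), Optuna stands in for whichever tuner actually ran these 30 trials, and train_and_eval is a stub for the real training loop.

```python
# Hypothetical sketch of the tuned search space (illustrative ranges only).
import optuna


def train_and_eval(params: dict) -> float:
    # Stub: the real training/validation loop (densenet121, 20 epochs,
    # half of CheXpert) would go here and return the best val score.
    return 0.0


def objective(trial: optuna.Trial) -> float:
    params = {
        # optimizer hyperparameters
        "lr": trial.suggest_float("lr", 1e-5, 1e-1, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64]),
        # ASL factors
        "asl_gamma_neg": trial.suggest_float("asl_gamma_neg", 0.0, 6.0),
        "ps_factor": trial.suggest_float("ps_factor", 0.0, 0.3),
        # RandAugment
        "ra_num_ops": trial.suggest_int("ra_num_ops", 1, 4),
        "ra_magnitude": trial.suggest_int("ra_magnitude", 1, 15),
    }
    return train_and_eval(params)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
```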
Experiment result Link: https://wandb.ai/snuh_interns/kdg_tune_densenet121/groups/trainval_2023-01-27_08-17-57/workspace?workspace=user-snuh_interns

[Figure: the top 9 trials highlighted in a parallel coordinates plot]
[Figure: the bottom 10 trials highlighted in a parallel coordinates plot]

Key observations

- The performance is not sensitive to lr unless lr is too high. Empirically, anything above 1e-2 could be too high, while values below 1e-3 seem adequate.
- weight_decay and batch_size do not look like good things to tune. Among the top 9 trials, weight_decay spans too wide a range to be informative (be cautious: the weight_decay axis is on a log scale), and batch_size shows no distinct pattern either.
- The results for the ASL factors are in line with intuition (see the ASL sketch after this list):
  - The gamma_neg values of the top 9 trials range from about 2.2 to 4.6, which looks neither too low nor too high. A fixed gamma_neg of 3-3.5 would probably be adequate; going above 4 seems unwise, since some low-ranked trials have gamma_neg between 4 and 4.5.
  - The ps_factor (probability shifting factor) values of the top 9 trials range from about 0.05 to 0.18. It is reassuring that the top 9 stay below 0.2, which would be suspiciously high. However, many low-ranked trials also sit around 0.12, so the signal here is less clear than for asl_gamma_neg. If there are not enough resources to tune, removing ps_factor from the search space might be better.
- It is hard to find a good combination of RandAugment's magnitude and number of operations, but at least avoiding too-strong augmentation seems wise (see the RandAugment sketch below).
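To make the two ASL knobs above concrete, here is a minimal sketch of the asymmetric loss for multi-label classification, following the public reference implementation by Ben-Baruch et al. (where ps_factor corresponds to the `clip` parameter). This mirrors the general technique, not this project's exact code, and the defaults in the signature only echo the ranges observed above.

```python
# Minimal ASL sketch showing where gamma_neg and the probability
# shifting factor (ps_factor) act. Illustrative, not the project's code.
import torch


def asymmetric_loss(logits: torch.Tensor,
                    targets: torch.Tensor,
                    gamma_pos: float = 0.0,
                    gamma_neg: float = 3.0,   # top-9 trials here: ~2.2-4.6
                    ps_factor: float = 0.1,   # top-9 trials here: ~0.05-0.18
                    eps: float = 1e-8) -> torch.Tensor:
    xs_pos = torch.sigmoid(logits)
    xs_neg = 1.0 - xs_pos

    # Probability shifting: raise negative probabilities by ps_factor so
    # very easy negatives contribute (near) zero loss.
    xs_neg = (xs_neg + ps_factor).clamp(max=1.0)

    los_pos = targets * torch.log(xs_pos.clamp(min=eps))
    los_neg = (1 - targets) * torch.log(xs_neg.clamp(min=eps))

    # Asymmetric focusing: down-weight easy examples, more aggressively
    # for negatives (gamma_neg > gamma_pos).
    pt = xs_pos * targets + xs_neg * (1 - targets)
    focus = (1 - pt) ** (gamma_pos * targets + gamma_neg * (1 - targets))

    return -(focus * (los_pos + los_neg)).sum()
```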
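Similarly, the two RandAugment knobs map directly onto torchvision's transforms.RandAugment. A minimal sketch, assuming the torchvision implementation is the one in use (the project's actual augmentation pipeline is not shown in this report):

```python
# Minimal sketch wiring RandAugment's two tuned knobs via torchvision.
from torchvision import transforms


def build_train_transform(num_ops: int = 2, magnitude: int = 9):
    # magnitude is an integer bin in [0, num_magnitude_bins - 1]
    # (31 bins by default); per the observation above, avoid values
    # near the top of that range.
    return transforms.Compose([
        transforms.RandAugment(num_ops=num_ops, magnitude=magnitude),
        transforms.ToTensor(),
    ])
```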