Closed — WhisperWS closed this issue 10 months ago
The value of the total loss depends on the chosen coefficients and parameters for the individual losses. However, to give a sense of the training process, I have attached the per-epoch L2 loss curves on the PH2 and ISIC2018 datasets (for both: batch size 32, Adam optimizer, lr = 0.0001).
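For clarity, the training settings above can be collected into a single config fragment. This is only an illustrative sketch; the key names (`batch_size`, `optimizer`, `lr`) are my assumption, not necessarily the project's actual config schema.

```python
# Hypothetical training-config fragment matching the settings stated
# above (key names are assumptions, not the repo's real schema).
train_cfg = {
    "batch_size": 32,      # BatchSize: 32
    "optimizer": "Adam",   # opt: Adam
    "lr": 1e-4,            # lr = 0.0001
}

print(train_cfg["optimizer"], train_cfg["lr"])  # Adam 0.0001
```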
The graph with magenta (train) and green (validation) lines over about 360 epochs is for ISIC2018, and the graph with blue (train) and red (validation) lines over 3K epochs is for PH2. HAM10000 behaves the same, just with fewer epochs.
To achieve good results with fewer parameters, you can also explore various configurations on your dataset. For instance, I have listed some alternative configurations below that you can evaluate; fine-tune them in conjunction with the training-loss parameters. In my experiments with an input size of 128, it is essential to have at least 128 channels for `x` in the primary layers.
| Configs (`cfg.model.params`) | scenario 1 | scenario 2 | scenario 3 | scenario 4 | scenario ... |
|---|---|---|---|---|---|
| `dim_x` | 128 | 128 | 128 | 128 | >= 128 |
| `dim_g` | 64 | 32 | 64 | 32 | >= 32 |
| `dim_x_mults` | [1,2,3,4,5,6] | [1,2,3,4,5,6] | [1,1,2,2,3,3] | [2,2,2,2,2,2] | ... |
| `dim_g_mults` | [1,2,4,8,16,32] | [1,2,4,8,16,32] | [1,2,3,4,5,6] | [1,2,4,6,10,16] | ... |
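To see what these multipliers imply for model width, here is a minimal sketch of how a base dimension and a multiplier list might expand into per-stage channel counts. The expansion rule (base times each multiplier) is a common convention in U-Net-style architectures and is my assumption here, not code taken from the repository; `dim_x`/`dim_g` and the multiplier lists are the config names from the table.

```python
# Hypothetical sketch: expanding a base channel count by per-stage
# multipliers (the multiply-per-stage rule is an assumption, not the
# project's actual implementation).
def stage_channels(base_dim, mults):
    """Return the channel count at each stage: base_dim * multiplier."""
    return [base_dim * m for m in mults]

# Scenario 1 from the table above:
x_channels = stage_channels(128, [1, 2, 3, 4, 5, 6])
g_channels = stage_channels(64, [1, 2, 4, 8, 16, 32])
print(x_channels)  # [128, 256, 384, 512, 640, 768]
print(g_channels)  # [64, 128, 256, 512, 1024, 2048]
```

Under this reading, scenario 4's flat `[2,2,2,2,2,2]` keeps `x` at a constant 256 channels across stages, which is one way to cut parameters while keeping the first layers at or above the 128-channel minimum mentioned above.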
How many times did you have to train to get good results, and how small should the loss be?