xuebinqin / BASNet

Code for the CVPR 2019 paper "BASNet: Boundary-Aware Salient Object Detection"
MIT License

how to define the "converge" of the training loss #58

Open clelouch opened 3 years ago

clelouch commented 3 years ago

Thanks for your code and paper. I notice that there is no validation set in the training stage, and that training is stopped when the loss converges. I am curious how to define "converge" and avoid overfitting, since the loss may fluctuate.

xuebinqin commented 3 years ago

We were using msra2500k as the validation set, but it didn't work well: different datasets have different distributions, and we found it hard to overfit on DUTS-TR. You can plot the curve of -log(loss) and observe the converging trend. Note: directly plotting the loss curve won't show the loss decrease in the late stage, because most of the late decrease comes from fine structures, which are usually very tiny and unobservable in a direct loss plot. Of course, if you are training the model on your own data, you can add a validation step and also plot -log(loss) to show the trend.
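A minimal sketch of the suggested -log(loss) plot, assuming per-epoch loss values have been collected in a list (the helper name and the toy numbers are hypothetical, not from the repo):

```python
# Hypothetical sketch: plot -log(loss) to magnify the small late-stage
# decreases that a raw loss curve hides.
import math
import matplotlib.pyplot as plt

def plot_neg_log_loss(losses):
    """losses: per-epoch training loss values (must be > 0)."""
    plt.plot([-math.log(l) for l in losses])
    plt.xlabel("epoch")
    plt.ylabel("-log(loss)")
    plt.show()

# Toy values: the raw loss looks nearly flat after epoch 4, but the
# -log(loss) curve keeps rising, showing training is still converging.
plot_neg_log_loss([0.9, 0.4, 0.2, 0.12, 0.10, 0.095, 0.091, 0.088])
```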

xuebinqin commented 3 years ago

Besides, a well-defined accuracy metric is suggested, since the loss doesn't always indicate the exact performance you want. Sometimes the validation loss may increase without the model actually overfitting; it also depends on your evaluation metrics.
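For salient object detection, two standard metrics that could serve as such a validation measure are MAE and the F-measure (with β² = 0.3, the usual SOD convention). The following is an illustrative NumPy sketch, not code from the repo:

```python
# Hedged sketch of common SOD validation metrics.
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a predicted saliency map and the
    ground-truth mask, both arrays scaled to [0, 1]."""
    return np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64)))

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F-beta measure (beta^2 = 0.3 by SOD convention) at a fixed
    binarization threshold."""
    p = (pred >= thresh).astype(np.float64)
    g = (gt >= 0.5).astype(np.float64)
    tp = (p * g).sum()
    precision = tp / (p.sum() + 1e-8)
    recall = tp / (g.sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```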

clelouch commented 3 years ago

Thanks for your kind help. It seems that the TPAMI version of BASNet reports much better performance than the CVPR version. I guess the improvement can be attributed to the larger input size. Am I right?

xuebinqin commented 3 years ago

Yes, for every model there are one or more optimal input resolutions, which influence the receptive fields of the different layers and lead to different performance. Since we are all somewhat limited by computational resources and time constraints, we usually set these configurations based on experience; it is hard to give theoretical explanations.
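To experiment with input resolution, the rescale/crop sizes in the training transform pipeline can be varied. A hedged sketch using the transform names assumed from the repo's data_loader.py (check your local copy for the exact signatures):

```python
# Assumed from the repo's data_loader.py: RescaleT resizes the sample,
# RandomCrop crops a training patch, ToTensorLab converts to tensors.
from torchvision import transforms
from data_loader import RescaleT, RandomCrop, ToTensorLab

# Larger rescale/crop sizes trade GPU memory (smaller feasible batch)
# for finer detail and different effective receptive fields.
train_transform = transforms.Compose([
    RescaleT(320),    # e.g. compare 256 vs. 320 vs. 512
    RandomCrop(288),  # crop a bit smaller than the rescaled size
    ToTensorLab(flag=0),
])
```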

clelouch commented 3 years ago

I guess using larger images preserves finer details but requires a deeper network to obtain a large enough receptive field. Consequently, we need a more powerful GPU to train the model. Maybe implementing a much deeper network with group normalization can solve the problem, as GN does not require a large batch size. A sketch of that idea follows.
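As an illustration of that idea (an assumption on my part, not something the repo does): BatchNorm2d layers can be swapped for GroupNorm, which normalizes over channel groups and is therefore independent of batch size:

```python
# Hypothetical helper: recursively replace every BatchNorm2d in a model
# with GroupNorm, which behaves the same for batch size 1 as for 32.
import torch.nn as nn

def bn_to_gn(module, num_groups=32):
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # GroupNorm needs num_channels divisible by num_groups;
            # fall back to a single group (LayerNorm-like) otherwise.
            groups = num_groups if child.num_features % num_groups == 0 else 1
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            bn_to_gn(child, num_groups)
    return module
```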