xuebinqin / U-2-Net

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
Apache License 2.0

Loss & accuracy #21

Open szerintedmi opened 4 years ago

szerintedmi commented 4 years ago

I'm trying to retrain your model for our specific use case. I'm training with images augmented from a 30k set. I also added accuracy calculations and validations.

The loss and accuracy seem to stall no matter how I change the learning rate. What would you recommend? Should I just train longer? Should I try to "freeze" or lower the LR on some of the layers (which layers? all encoders?)? Or is this as far as it can get? Have you experimented with different LR schedules (cyclic, etc.)?

I ran these with 120k training images (50 epochs, 200 iterations each, batch size 12). Validation: 600 images after each epoch.

Training from scratch:

[image: training curves, https://user-images.githubusercontent.com/7456451/82139227-4f97a300-981e-11ea-93fd-64109911391b.png]

Training on your pre-trained model (173.6 MB), LR=0.001 (as yours):

[image: training curves, https://user-images.githubusercontent.com/7456451/82139394-5541b880-981f-11ea-9f09-f795bf7e9bcc.png]

LR reduced to 0.0001 (on pre-trained model):

[image: training curves, https://user-images.githubusercontent.com/7456451/82139404-730f1d80-981f-11ea-9662-1f5a6324545c.png]

xuebinqin commented 4 years ago

What would you recommend? Should I just train longer? Or is this as far as it can get?

RES: You have 30k images (120k after augmentation), which is bigger than our training set (around 20k augmented from 10k). I guess it needs a larger-capacity model (I suggest training your model from scratch rather than starting from our pre-trained weights). You can think about increasing the filter numbers of the network. In our current model we use 6 side outputs to reduce overfitting; you can also try disabling some or all of the side-output supervision to increase the model capacity. You may first have to make sure the model is able to overfit (or approximately overfit) your training set. We trained our model for 600k iterations (batch size 12, 20k images), so I think you can also try training longer. BTW, I am not sure which exact accuracy measure you are using, but I suggest using IoU or F-measure to evaluate the segmentation performance. In addition, your validation set is too small compared with your training set; I think that's why your validation losses are even smaller than your training losses.
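For reference, IoU and F-measure on binary masks can be computed with something like the following minimal NumPy sketch (not the exact evaluation code we used; the beta^2 = 0.3 weighting is the usual convention in the salient object detection literature):

```python
import numpy as np

def iou(pred, gt, thresh=0.5):
    # Binarize the predicted probability map and the ground truth mask
    p, g = pred > thresh, gt > thresh
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else 1.0

def f_measure(pred, gt, thresh=0.5, beta2=0.3):
    # beta^2 = 0.3 weights precision over recall, as is common in SOD papers
    p, g = pred > thresh, gt > thresh
    tp = np.logical_and(p, g).sum()
    precision = tp / p.sum() if p.sum() > 0 else 0.0
    recall = tp / g.sum() if g.sum() > 0 else 0.0
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
```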

Have you experimented with different LR schedules (cyclic, etc.)?

RES: We didn't experiment much with the LR. We use the Adam optimizer with default settings.
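Concretely, the optimizer setup in u2net_train.py is roughly just PyTorch's Adam with its stock hyperparameters:

```python
import torch.optim as optim
from model import U2NET  # as imported in u2net_train.py

net = U2NET(3, 1)
# Adam with PyTorch's default betas, eps, and no weight decay
optimizer = optim.Adam(net.parameters(), lr=0.001,
                       betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
```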

Best of Luck!


szerintedmi commented 4 years ago

@NathanUA, thank you for taking the time to answer. Very useful tips, much appreciated.

You can think about increasing the filter numbers of the network.

Do you mean increasing "M" ( the channels in the internal layers of RSUs)? Or increase the number of layers ("L")?

In our current model we use 6 side outputs to reduce overfitting; you can also try disabling some or all of the side-output supervision to increase the model capacity.

Just to double-check my understanding: you suggest using only the BCE of the fused output as the loss function for training? (In your train code it's called loss2 or tar.) Are you using the combined BCE loss, instead of the last output's loss alone, to avoid overfitting? In that case it makes total sense to me to use only the d0 loss for training as long as we don't overfit. It might be trivial, but I can't get my head around why it would increase model capacity. Btw, what do you mean by "model capacity"? :-)

BTW, I am not sure which exact accuracy measure you are using, but I suggest using IoU or F-measure to evaluate the segmentation performance.

We used MAE, but good call; we will try IoU and F-measure.

(I suggest training your model from scratch rather than starting from our pre-trained weights.)

So far our results improve much more quickly when we start from your pretrained weights (as you can see from the graphs above). Do you think this trend would change if we trained from scratch for longer?

In addition, your validation set is too small compared with your training set. I think that's why your validation losses are even smaller than training losses.

Indeed, we've already increased to 2000 validation samples and will experiment to see whether we need more.

RES: We didn't experiment much with the LR. We use the Adam optimizer with default settings.

I tried lowering the LR because the loss seemed to be oscillating, but the training might not have been long enough to see the trend. Happy to share our findings if you are interested.

szerintedmi commented 4 years ago

Btw, here are the results from 100 epochs (~240k samples) with 2000 validation samples per epoch:

[image: training curves]

xuebinqin commented 4 years ago

RES: You can increase either M or the output filter numbers of each RSU, as well as the number of layers "L" when you use an input size bigger than 320x320.
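To make that concrete: each RSU block in u2net.py is constructed as RSU(in_ch, mid_ch, out_ch), where mid_ch corresponds to M. A hedged sketch of widening the first two encoder stages (the doubled channel counts are illustrative, not a tested configuration):

```python
from model.u2net import RSU7, RSU6  # RSU blocks defined in u2net.py

# Original first two encoder stages (approximately as in U2NET):
#   stage1 = RSU7(3, 32, 64)
#   stage2 = RSU6(64, 32, 128)
# Widened variant: double mid_ch (M) and out_ch to raise capacity.
stage1 = RSU7(3, 64, 128)    # illustrative numbers only
stage2 = RSU6(128, 64, 256)  # in_ch must match the new out_ch of stage1
```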

Just to double-check my understanding: you suggest using only the BCE of the fused output as the loss function for training? (In your train code it's called loss2 or tar.) Are you using the combined BCE loss, instead of the last output's loss alone, to avoid overfitting? In that case it makes total sense to me to use only the d0 loss for training as long as we don't overfit.

RES: I used the summation of all the side-output losses and the fusion loss. Yes, you can use only the fusion loss, since it seems the model is underfitting on your dataset in terms of the MAE measure (as you mentioned in your next email).
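In code, the summed loss in u2net_train.py looks roughly like the sketch below; training on the fusion loss only means back-propagating loss0 (the value unpacked as loss2/tar in the training loop) instead of the full sum:

```python
import torch.nn as nn

bce_loss = nn.BCELoss()  # mean reduction, equivalent to the repo's size_average=True

def muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v):
    # d0 is the fused output; d1..d6 are the six side outputs
    loss0 = bce_loss(d0, labels_v)
    side = sum(bce_loss(d, labels_v) for d in (d1, d2, d3, d4, d5, d6))
    return loss0, loss0 + side  # (fusion-only loss, full summed loss)

# Fusion-only training: back-propagate the first return value, e.g.
#   loss2, loss = muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v)
#   loss2.backward()   # instead of loss.backward()
```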

It might be trivial, but I can't get my head around why it would increase model capacity. Btw, what do you mean by "model capacity"? :-)

RES: For example, suppose you have just one image and its ground truth mask. You would be able to train a small model, say one with 5 layers and 5 filters per layer. This small model is likely to fit your data and drive your training error to 0, which is typically overfitting. But if you had 10M images and their ground truth masks, would this small model be able to fit them? Absolutely not. Capacity is the representational capability of a model; usually, deeper models with more filters have higher capacity.

So far our results improve much more quickly when we start from your pretrained weights (as you can see from the graphs above). Do you think this trend would change if we trained from scratch for longer?

RES: It depends. If your dataset is 'similar' to the DUTS-TR dataset, I think starting from our pre-trained model is fine. Otherwise, training from scratch may give you better results.

Best of luck!


szerintedmi commented 4 years ago

thanks @NathanUA !

We ran some trainings optimizing only the d0 loss (the fusion loss without the side-output losses). It didn't bring any significant improvement; qualitatively the results look the same as with your original loss. UPDATE: we just ran a small test on 300 samples to compare loss vs. loss2, and the correlation seems pretty strong, so it's no surprise that changing the loss function didn't make a noticeable difference.

[image: loss vs. loss2 correlation plot]

You mentioned you trained for 120 hours with 10k images augmented to 20k. We trained with 30k images augmented to 240k, and it finishes in 9-10 hours on a single-GPU Colab instance. Why is it so much faster for us? Did you feed the same images multiple times during training? If so, why?
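A rough comparison of the sample counts behind that question (my arithmetic from the numbers you quoted, assuming a single pass over our 240k augmented images):

```python
# 600k iterations at batch size 12 over a 20k-image training set
your_samples = 600_000 * 12            # 7,200,000 sample presentations
your_passes = your_samples / 20_000    # ~360 passes over each image
our_samples = 240_000                  # one pass over our augmented set
print(your_passes, your_samples / our_samples)  # 360.0 passes, ~30x more samples
```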

ohheysherry66 commented 4 years ago

I'm trying to retrain your model for our specific use case. I'm training with images augmented from a 30k set. I also added accuracy calculations and validations.


Hi, could you please tell me where to add the validation process? It seems to change a lot. Thank you.

xuebinqin commented 4 years ago

You can first define a validation dataloader just after the training dataloader, and then feed the validation data inside the `if ite_num % save_frq == 0:` block with a for loop, just like `for i, data in enumerate(salobj_dataloader):`. Before the validation loop you need to switch the net to evaluation mode with `net.eval()`, and switch it back to training mode with `net.train()` after validation.
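A minimal sketch of that change inside u2net_train.py (val_salobj_dataloader is a hypothetical name for the validation loader you would define next to salobj_dataloader):

```python
import torch

# ... inside the existing training loop of u2net_train.py:
if ite_num % save_frq == 0:
    net.eval()                         # switch to evaluation mode
    val_loss, n_batches = 0.0, 0
    with torch.no_grad():              # no gradients during validation
        for i, data in enumerate(val_salobj_dataloader):
            inputs = data['image'].type(torch.FloatTensor)
            labels = data['label'].type(torch.FloatTensor)
            if torch.cuda.is_available():
                inputs, labels = inputs.cuda(), labels.cuda()
            d0, d1, d2, d3, d4, d5, d6 = net(inputs)
            _, loss = muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels)
            val_loss += loss.item()
            n_batches += 1
    print("validation loss: %.5f" % (val_loss / max(n_batches, 1)))
    net.train()                        # back to training mode
```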


ohheysherry66 commented 4 years ago


Thank you, big help!