HAN training is unstable

smiler96 commented 3 years ago

When I trained your HAN model, I found that the loss exploded and the model collapsed! Have you met this? Or can you give me some guidance?

Senwang98 commented 3 years ago

Hi @smiler96, I used the command provided by the author (with the pre-trained RCAN model) and met the same problem. Did you use the pre-trained RCAN.pt? It seems that training the whole model from scratch works fine. Have you solved this problem? The owner seems to have abandoned this repo.

smiler96 commented 3 years ago

No pretrained model was used when I reimplemented HAN; you can find it in my GitHub repo. I solved this issue with a global residual connection. I think you can try it.

Senwang98 commented 3 years ago

Thanks @smiler96, but have you trained HAN? How are its final results on the benchmarks? I merged HAN into the EDSR-pytorch repo (because my GPU can't support CUDA 8), and the first 20 epochs did not show the instability.

HAN uses a long residual connection as well. I want to know the difference between your method and the model the owner provided, because they look the same except for another long residual connection.

smiler96 commented 3 years ago

I remember that the first several training epochs of the vanilla HAN were stable, as the figure above shows. But I have not figured out why the loss exploded. My repo contains the same HAN, apart from the extra long residual connection. I have not calculated the PSNR values of each method, sorry about that.

Senwang98 commented 3 years ago

OK, thanks for your reply anyway!

yudadabing commented 3 years ago

Hi @smiler96, I also ran into instability during training. You said that a global residual connection can resolve it. I want to know the difference between your repo and the model the owner provided; I cannot find the specific operation in your repo.

smiler96 commented 3 years ago

global_res=True
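For readers who cannot locate the operation, here is a minimal PyTorch sketch of what a global residual connection commonly looks like in super-resolution: the network output is added to an interpolated copy of the low-resolution input. This is an illustration under assumptions, not the code from either repo. `TinySRBody`, `GlobalResidualSR`, and the choice of bicubic upsampling are hypothetical; only the flag name `global_res` comes from this thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySRBody(nn.Module):
    """Hypothetical stand-in for an SR network such as HAN (not the real model)."""

    def __init__(self, channels=3, features=16, scale=2):
        super().__init__()
        self.head = nn.Conv2d(channels, features, 3, padding=1)
        self.tail = nn.Sequential(
            nn.Conv2d(features, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a larger image
        )

    def forward(self, x):
        return self.tail(torch.relu(self.head(x)))


class GlobalResidualSR(nn.Module):
    """Wraps a body network and optionally adds the upsampled input to its output."""

    def __init__(self, body, scale=2, global_res=True):
        super().__init__()
        self.body = body
        self.scale = scale
        self.global_res = global_res

    def forward(self, lr):
        out = self.body(lr)
        if self.global_res:
            # Assumption: the skip path is a bicubic upsampling of the LR input,
            # so the body only has to learn the missing high-frequency detail.
            base = F.interpolate(
                lr, scale_factor=self.scale, mode="bicubic", align_corners=False
            )
            out = out + base
        return out


model = GlobalResidualSR(TinySRBody(scale=2), scale=2, global_res=True)
lr = torch.rand(1, 3, 8, 8)
sr = model(lr)
```

With the skip path in place, the output starts close to a plain interpolation of the input, which is one plausible reason such a connection could make early training less prone to divergence.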

yudadabing commented 3 years ago

OK, thanks for your reply.

aatiqa-ghazali commented 2 years ago

I did not face that issue. Maybe that's because I have turned gradient clipping on in the 'options.py' file.
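As a rough illustration of what turning gradient clipping on does inside a training step, here is a small PyTorch sketch. It assumes that a threshold of 0 means "no clipping" and that clipping is applied element-wise with `torch.nn.utils.clip_grad_value_`; the thread does not say how the repo actually uses the option, and the variable `gclip`, the toy model, and the value 0.5 are all hypothetical.

```python
import torch
import torch.nn as nn

gclip = 0.5  # hypothetical threshold; 0 is taken here to mean "no clipping"

model = nn.Linear(4, 1)  # toy model standing in for the SR network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4) * 1e3  # deliberately large inputs to provoke large gradients
y = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.l1_loss(model(x), y)
loss.backward()

if gclip > 0:
    # Clamp every gradient element to the range [-gclip, gclip] before the update.
    nn.utils.clip_grad_value_(model.parameters(), gclip)

# Largest gradient magnitude actually used by the optimizer step.
max_grad = max(p.grad.abs().max().item() for p in model.parameters())
optimizer.step()
```

Clipping bounds the size of each update, so a single bad batch is less likely to push the weights far enough for the loss to explode. `torch.nn.utils.clip_grad_norm_` is an alternative that rescales the whole gradient by its norm instead.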

Dannyxu1031 commented 1 year ago

Hi, how did you set '--gclip' in 'options.py'?

WelcomeToWonderland commented 1 year ago

Hi, I am looking for the pre-trained RCAN model. Could you do me a favour and tell me how to find it (or just give me a link)? Thank you.

smiler96 commented 1 year ago

Hi, you can refer to the repo https://github.com/smiler96/Image-Super-Resolution

wwlCape / HAN