Question about EDSR architecture

I have a question about the EDSR network architecture.

I applied the EDSR residual block to a VDSR-style architecture.

In short, the network architecture is almost the same, except for the four points below.

The upsampling layer (e.g. sub-pixel) is removed.
The input is first upscaled to the output resolution with MATLAB bicubic interpolation.
The VDSR residual input is replaced with the EDSR one (i.e. EDSR uses the output of the first convolution layer as the residual input).
There are 10 residual blocks, giving 20 + 2 convolution layers; each convolution has 64 features, the same as VDSR or the EDSR baseline.
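
To make the setup concrete, here is a minimal PyTorch sketch of the network as I understand the four points above. It is not my actual training code: the class names `ResBlock` and `PreUpsampledEDSR`, the 3x3 kernel size, the 3-channel input, and the absence of batch norm or residual scaling are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """EDSR-style residual block (assumed form): conv -> ReLU -> conv, no batch norm."""

    def __init__(self, n_feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(n_feats, n_feats, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Local skip connection around the two convolutions.
        return x + self.body(x)


class PreUpsampledEDSR(nn.Module):
    """Hypothetical name: EDSR-style blocks on a bicubic-upscaled input, no upsampler."""

    def __init__(self, n_colors=3, n_feats=64, n_resblocks=10):
        super().__init__()
        # First convolution; its output is the residual input for the global skip.
        self.head = nn.Conv2d(n_colors, n_feats, kernel_size=3, padding=1)
        # 10 residual blocks = 20 convolutions.
        self.body = nn.Sequential(*[ResBlock(n_feats) for _ in range(n_resblocks)])
        # Last convolution maps features back to image channels (no sub-pixel layer).
        self.tail = nn.Conv2d(n_feats, n_colors, kernel_size=3, padding=1)

    def forward(self, x):
        # x is assumed to be already upscaled to the output resolution.
        feat = self.head(x)
        res = self.body(feat) + feat  # global skip from the first conv output
        return self.tail(res)
```

A quick shape check would be `PreUpsampledEDSR()(torch.randn(1, 3, 32, 32))`, which should return a tensor of the same size as the input since every convolution uses padding 1 and stride 1.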

However, it does not converge well and gives worse results than bicubic. I wonder whether the upsampling layer is important in an EDSR-style ResNet architecture.

I'm not sure whether there is a flaw in my code, but I would like to hear your opinion.

Summary: Do you think the EDSR ResNet architecture also works well in a VDSR-style setup, which does not use an upsampling layer?

Thank you!

sanghyun-son / EDSR-PyTorch