yalickj / load-forecasting-resnet

short-term load forecasting with deep residual networks
MIT License

Doubts between the code and the paper #3

Closed laubapires closed 4 years ago

laubapires commented 5 years ago

Dear Kunjin Chen,

I was reading your paper 'Short-term Load Forecasting with Deep Residual Networks' and aiming to reproduce your results for a university assignment, but some doubts came to mind when I was looking at your code and comparing it to what I was reading in the paper. I'm sorry if my doubts are a bit strange; I'm still learning about ANNs.

So, first of all, about the ResNet structure: the paper says that "a total of 30 residual blocks are stacked, forming a 60-layer deep residual network"; however, in the code there are only 4 blocks, and they are not truly stacked. So, where does the number 30 come from? And how did you arrive at 60 layers?

Secondly, the structure in Fig. 4 (ResNetPlus) is not represented by the code either. In the code, the green side blocks come from the second (blue) main residual block. Which configuration should I pick?

And, lastly, I understood that the result obtained by the code should match the average value in Table III, but this is not achievable.

Therefore, I have been rewriting your code, which I am willing to share with you, but a value equal to or less than 1.447 (as reported) is proving quite difficult to reach. Can you please help me?

Thanks!

yalickj commented 5 years ago
  1. Regarding the structure, you are right: the figure in the paper differs slightly from the code, but the code is the correct version. The side blocks come from one of the blue dots instead of the first green block on the main path.

  2. What result are you getting? I re-ran the experiment with Keras 2.1.6 and TensorFlow 1.6.0 and got 0.147, which is indeed higher than the result I obtained when writing the paper. I will try to roll back to the versions I used then and see if anything is causing this. Thank you so much for reporting this!

  3. As stated in the paper, the structure for ISO-NE has 10 residual layers instead of 30. The parallel paths added within the residual blocks were not included in the paper for simplicity, but this structure was used when running the experiments.
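For anyone trying to follow the description of a residual block with an added parallel path, here is a minimal framework-agnostic sketch in NumPy. This is my reading of the structure, not the repository's exact code: the layer widths, the ReLU activations, and the single-layer side path are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(0.0, x)


def residual_block(x, w1, w2, w_side):
    """Sketch of a residual block with an extra parallel path.

    Main path: two fully connected layers with ReLU.
    Side path: one fully connected layer with ReLU (an assumption;
    the actual side-path configuration may differ).
    Output: identity shortcut + main path + side path.
    """
    main = relu(relu(x @ w1) @ w2)
    side = relu(x @ w_side)
    return x + main + side


n = 20  # hidden width, chosen only for illustration
x = rng.standard_normal(n)
w1, w2, w_side = (rng.standard_normal((n, n)) * 0.1 for _ in range(3))
y = residual_block(x, w1, w2, w_side)
print(y.shape)  # (20,)
```

With all-zero weights both paths vanish and the block reduces to the identity shortcut, which is the defining property of a residual block.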

yalickj commented 4 years ago


I find it quite troublesome to roll back to previous versions, so I would like to provide some results that can be reproduced in more recent versions of TensorFlow and Keras. I changed the number of hidden units in the residual blocks from 20 to 72 and repeated the trial 5 times, obtaining 0.01436563, 0.01491382, 0.01476847, 0.01463064, and 0.01423384, whose average is 0.01458248.

My take is that you can certainly experiment with different network structures as well as the structure of the residual block, including hyper-parameters. For me, though, the importance of this work is to provide a better baseline than a simple feed-forward neural network, which can only achieve a metric greater than 2. Thanks again for pointing this out!
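For reference, the averaged figure quoted above is just the arithmetic mean of the five trial results:

```python
# Sanity check of the averaged result over the five reported trials.
trials = [0.01436563, 0.01491382, 0.01476847, 0.01463064, 0.01423384]
average = sum(trials) / len(trials)
print(round(average, 8))  # 0.01458248
```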