matroid / dlwithtf

TensorFlow for Deep Learning Book
http://shop.oreilly.com/product/0636920065869.do

fix bug in ch3/linear_regression_tf.py #9

Open discoverkl opened 6 years ago

discoverkl commented 6 years ago

I think there may be a bug due to the broadcasting rules. Please take a look.
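For concreteness, here is a minimal sketch of the kind of shape mismatch at issue (illustrative shapes, not the book's exact code), shown with NumPy, whose broadcasting rules TensorFlow follows:

```python
import numpy as np

N = 100
y_true = np.random.randn(N, 1)   # labels with shape (N, 1)
y_pred = np.random.randn(N)      # predictions with shape (N,)

# (N, 1) - (N,) broadcasts to (N, N): every label gets compared
# against every prediction, not just its own.
diff = y_true - y_pred
print(diff.shape)  # (N, N) -- a silently wrong loss

# The fix: make the shapes agree before subtracting.
diff_fixed = y_true - y_pred.reshape(N, 1)
print(diff_fixed.shape)  # (N, 1)
```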

rbharath commented 6 years ago

Thanks for the PR! I think you might be right about the broadcasting error... I'll try rerunning this code with the fix to double-check on my end.

RAvontuur commented 6 years ago

With the above fix, the system converges to the right solution. After setting the noise to zero and initializing the system with w=5 and b=2, the loss is now zero (as expected), rather than the high positive value it was before the fix.
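Roughly, that sanity check looks like this (a sketch in the book's TF 1.x style; the variable names and hyperparameters are illustrative, not the exact code from ch3):

```python
import numpy as np
import tensorflow as tf  # TF 1.x API, as used in the book

# Noise-free data generated from the true parameters w=5, b=2.
N = 100
x_np = np.random.rand(N, 1).astype(np.float32)
y_np = 5.0 * x_np + 2.0

x = tf.placeholder(tf.float32, (N, 1))
y = tf.placeholder(tf.float32, (N, 1))  # same shape as y_pred: no broadcasting
# Initialize at the true solution; the loss should start (and stay) near 0.
W = tf.Variable([[5.0]])
b = tf.Variable([2.0])
y_pred = tf.matmul(x, W) + b
l = tf.reduce_sum((y - y_pred) ** 2)
train_op = tf.train.GradientDescentOptimizer(0.001).minimize(l)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        _, loss = sess.run([train_op, l], {x: x_np, y: y_np})
    print(loss)  # ~0.0 with matching shapes; large and positive with the bug
```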

Because of this, the following explanation in the book is not correct and requires an update: 'What happened on this system? Why didn’t TensorFlow learn the correct function despite being trained to convergence? This example provides a good illustration of one of the weaknesses of gradient descent algorithms. There is no guarantee of finding the true solution! The gradient descent algorithm can get trapped in local minima. That is, it can find solutions that look good, but are not in fact the lowest minima of the loss function'

Please add a corrected version of this explanation as a comment in the code. That would make it easier for future readers to understand.

rbharath commented 6 years ago

Thanks for the feedback here. We'll make sure to fix this bug in a future printing of the book.

As a quick note, the explanation isn't wrong as such. It's entirely common to see instability when training more complex models. It turns out that the behavior of this linear system is in fact stable after the bugfix, but there are a number of unstable nonlinear systems you will encounter in practice. We will add a note to explain this.

hamelsmu commented 6 years ago

You can also fix this bug by merging https://github.com/matroid/dlwithtf/pull/17

I agree this is confusing/misleading for readers. When reading the book, I was very skeptical that this model wasn't converging and set out to debug it. I noticed that if you keep training the model, the learned slope goes to zero (a flat line), which I found quite odd. That gave me the intuition that the loss function was somehow ill-defined, because the loss kept decreasing even though the visualization of the learned model kept looking worse.

One idea: you could demonstrate in the book how to use TensorFlow eager execution to debug this situation, which is a useful thing to learn.
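For example, a sketch of what that could look like (assuming TF 1.x with eager enabled; the shapes and names are illustrative):

```python
import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x; eager is the default in TF 2.x

x = tf.random_normal((100, 1))
W = tf.Variable([[5.0]])
b = tf.Variable([2.0])
y = 5.0 * x + 2.0          # labels, shape (100, 1)
y_pred = tf.matmul(x, W) + b

# In eager mode tensors have concrete shapes you can inspect directly,
# so the (100, 100) blow-up from bad broadcasting is immediately visible.
print(y.shape, y_pred.shape)
print(((tf.squeeze(y) - y_pred) ** 2).shape)  # (100, 100): the bug
print(((y - y_pred) ** 2).shape)              # (100, 1): correct
```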

cc: @hohsiangwu @ankushagarwal

rbharath commented 6 years ago

@hamelsmu Good suggestion! We will add a section on debugging this model in the next edition of the book. Our apologies again for letting this error slip through review.