nfmcclure / tensorflow_cookbook

Code for Tensorflow Machine Learning Cookbook
https://www.packtpub.com/big-data-and-business-intelligence/tensorflow-machine-learning-cookbook-second-edition
MIT License

GradientDescentOptimizer example is very sensitive to initial seed for A and learning rate #76

Open TheQuant opened 7 years ago

TheQuant commented 7 years ago

I've been getting inconsistent results with the Deming regression example given in your text. I'm running Tensorflow locally on an iMac under macOS Sierra 10.12.4. I explored different initial values for A (rather than the random normal initialization in the text) combined with different learning rates. Negative starting values for A often led to poor fits: lines with negative slopes and large intercepts, suggesting gradients moving in the wrong direction and clearly diverging from what the raw data suggests on inspection, even though the loss function (the Deming distance) kept improving throughout the optimization given sufficient iterations.
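The sensitivity described above can be reproduced with a minimal pure-Python re-implementation of the Deming (perpendicular-distance) loss and plain sub-gradient descent — this is an illustrative sketch, not the book's TensorFlow code, and the data set and starting values are made up:

```python
import math

# Illustrative synthetic data on the exact line y = 2x + 1
# (not the data set used in the book).
xs = [i * 0.25 for i in range(13)]          # 0.0 .. 3.0
ys = [2.0 * x + 1.0 for x in xs]

def deming_loss(A, b):
    """Mean perpendicular (Deming) distance from the points to y = A*x + b."""
    s = math.sqrt(A * A + 1.0)
    return sum(abs(A * x + b - y) for x, y in zip(xs, ys)) / (len(xs) * s)

def fit(A, b, lr=0.01, steps=5000):
    """Plain sub-gradient descent on the Deming loss."""
    n = len(xs)
    for _ in range(steps):
        s = math.sqrt(A * A + 1.0)
        dA = db = 0.0
        for x, y in zip(xs, ys):
            r = A * x + b - y
            sg = 1.0 if r > 0 else (-1.0 if r < 0 else 0.0)
            # d/dA of |r|/s = sign(r)*x/s - |r|*A/s^3, d/db of |r|/s = sign(r)/s
            dA += sg * x / s - abs(r) * A / s ** 3
            db += sg / s
        A -= lr * dA / n
        b -= lr * db / n
    return A, b

A_good, b_good = fit(1.0, 0.0)    # positive initial slope: recovers the line
A_bad, b_bad = fit(-2.0, 0.0)     # negative initial slope: drifts the wrong way
```

With a positive initial slope the fit lands near the true line (A ≈ 2, b ≈ 1); with a negative one the slope stays negative and the final loss is far worse, even though each run decreases its own loss monotonically for most steps.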

I understand the impact that the learning rate has on convergence as well, but wondered whether you could suggest a good way to choose the initial values for the variables and the learning rate. Also, what is the best way to determine whether the optimization is achieving reasonable results? I've looked at examining the actual gradient calculations during the iterations, but thought there must be a better way...
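One common heuristic for the initialization question (not something the book itself suggests, so treat it as an assumption) is to seed the optimizer with the closed-form ordinary least-squares solution, which is cheap to compute and already has the right sign for the slope. A sketch with an illustrative helper:

```python
# Hypothetical helper (not from the book): pick starting values for A and b
# from the ordinary least-squares closed form, so gradient descent on the
# Deming loss begins near a line that already roughly fits the data.
def ols_init(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)            # Sum of (x - mean)^2
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    A0 = sxy / sxx                                   # OLS slope
    b0 = my - A0 * mx                                # OLS intercept
    return A0, b0

# Illustrative data on y = 2x + 1
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2.0 * x + 1.0 for x in xs]
A0, b0 = ols_init(xs, ys)
```

Starting the Deming optimization from (A0, b0) instead of a random normal draw avoids the negative-slope basin entirely for data like this, since the OLS slope always matches the sign of the sample covariance.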

nfmcclure commented 7 years ago

I'll take a look into this over the weekend. I'm sure there's a better way to estimate some starting points based on the data.