Nurator opened this issue 6 years ago
There are so few examples available of how to use addgradient and apply it to optimizers. Thanks very much for sharing!
I do not quite understand the question, as I do not know where you are getting stuck.
For your first question, I would need to know what it is that you tried, what did not work, and how it failed.
When you talk about "feed them one by one", I do not know what it is that you are feeding; I have to guess.
I do not know what "reduceSum is 16 times higher than reduceMean" means: you are asking me to go and debug a problem, but you are not giving me enough information.
And I could use a full example, not just a snippet.
Ok, sorry. I will try to make my question clearer. Here is the complete working code:
```csharp
using System;
using TensorFlow;

namespace XOR
{
    class Program
    {
        static void Main(string[] args)
        {
            XOR();
        }

        static void XOR()
        {
            // Define the input and output of the XOR example, see the Python example at
            // https://aimatters.wordpress.com/2016/01/16/solving-xor-with-a-neural-network-in-tensorflow/
            double[,] xData = new double[,]
            {
                { 0, 0 },
                { 0, 1 },
                { 1, 0 },
                { 1, 1 }
            };
            double[] yData = new double[]
            {
                0,
                1,
                1,
                0
            };

            // Create a new TFSession to do anything
            using (var session = new TFSession())
            {
                // Initialize a graph to build the neural network structure
                var graph = session.Graph;

                // Define the size of the input and output
                var x = graph.VariableV2(new TFShape(1, 2), TFDataType.Double);
                var y = graph.VariableV2(new TFShape(1, 1), TFDataType.Double);

                // Define the unknown weights Theta and the biases of both layers
                var Theta1 = graph.VariableV2(new TFShape(2, 2), TFDataType.Double);
                var Theta2 = graph.VariableV2(new TFShape(2, 1), TFDataType.Double);
                var Bias1 = graph.VariableV2(new TFShape(1, 2), TFDataType.Double);
                var Bias2 = graph.VariableV2(new TFShape(1, 1), TFDataType.Double);

                // Define the actual computation of the output prediction
                var A2 = graph.Sigmoid(graph.Add(graph.MatMul(x, Theta1), Bias1));
                var Prediction = graph.Sigmoid(graph.Add(graph.MatMul(A2, Theta2), Bias2));

                // Define initialization of the weights to random values and the biases to 0
                var initTheta1 = graph.Assign(Theta1, graph.RandomNormal(new TFShape(2, 2)));
                var initTheta2 = graph.Assign(Theta2, graph.RandomNormal(new TFShape(2, 1)));
                var initBias1 = graph.Assign(Bias1, graph.Const(new double[,] { { 0, 0 } }));
                var initBias2 = graph.Assign(Bias2, graph.Const(new double[,] { { 0 } }));

                // Define the cost function you want to minimize (cross entropy, MSE, etc.)
                var firstcost = graph.Mul(y, graph.Log(Prediction));
                var secondcost = graph.Mul(graph.Sub(graph.OnesLike(y), y), graph.Log(graph.Sub(graph.OnesLike(y), Prediction)));
                var cost = graph.ReduceMean(graph.Neg(graph.Add(firstcost, secondcost)));
                //var cost = graph.ReduceMean(graph.SquaredDifference(y, Prediction));
                //var cost = graph.ReduceMean(graph.Abs(graph.Sub(y, Prediction)));

                // Define the learning rate
                var learning_rate = graph.Const(0.01);

                // Define the computation of the gradients of the cost function
                // with respect to all learnable values in the network
                var grad = graph.AddGradients(new TFOutput[] { cost }, new TFOutput[] { Theta1, Theta2, Bias1, Bias2 });

                // Optimization works by applying gradient descent to all learnable values.
                // Make sure that the order matches with the AddGradients function!
                var optimize = new[]
                {
                    graph.ApplyGradientDescent(Theta1, learning_rate, grad[0]).Operation,
                    graph.ApplyGradientDescent(Theta2, learning_rate, grad[1]).Operation,
                    graph.ApplyGradientDescent(Bias1, learning_rate, grad[2]).Operation,
                    graph.ApplyGradientDescent(Bias2, learning_rate, grad[3]).Operation,
                };

                // After defining the graph, we actually initialize the values
                session.GetRunner().AddTarget(initTheta1.Operation, initTheta2.Operation, initBias1.Operation, initBias2.Operation).Run();

                // Run for enough epochs to get a good performance
                for (var i = 0; i < 100000; i++)
                {
                    for (var j = 0; j < 4; j++)
                    {
                        // Get each row of xData in Tensor form
                        var xDataFeed = new TFTensor(GetRowFrom2DArray(xData, j));

                        // Feed the input and output data to the network one row at a time.
                        // Calling Run() executes "optimize" and thus one gradient descent step
                        session.GetRunner()
                               .AddInput(x, xDataFeed)
                               .AddInput(y, yData[j])
                               .AddTarget(optimize).Run();
                    }
                    if (i % 10000 == 0)
                    {
                        // Every 10000 epochs, display the current prediction for all x values.
                        // Fetch gets the current value of Prediction and stores it in result[0]
                        var result = session.GetRunner()
                                            .AddInput(x, xData)
                                            .AddInput(y, yData)
                                            .Fetch(Prediction).Run();
                        double[,] PredictArray = (double[,])result[0].GetValue();
                        Console.WriteLine("Prediction after {0} iterations:", i);
                        for (int j = 0; j < 4; j++)
                        {
                            // Display ground truth and prediction
                            Console.WriteLine("Expected: {0} Prediction: {1}", yData[j], PredictArray[j, 0]);
                        }
                    }
                }
            }
        }

        static double[,] GetRowFrom2DArray(double[,] sliceArray, int rowindex)
        {
            // Helper function to get one row (data slice) out of xData
            double[,] returnArray = new double[1, sliceArray.GetLength(1)];
            for (int i = 0; i < sliceArray.GetLength(1); i++)
            {
                returnArray[0, i] = sliceArray[rowindex, i];
            }
            return returnArray;
        }
    }
}
```
As you can see, in the `for (i < 100000)` loop I feed each row of xData and yData one after the other, with the help of the helper function `GetRowFrom2DArray`. In the `if (i % 10000 == 0)` case, however, I directly feed the whole xData and yData and get the correct values back. Done as described, everything works as intended. BUT, if I use
```csharp
session.GetRunner()
       .AddInput(x, xData)
       .AddInput(y, yData)
       .AddTarget(optimize).Run();
```
directly, the network does not learn anything. In fact, the cost still decreases, but the prediction becomes 0.5 for every input. Thus, it seems like AddInput treats the complete xData in the wrong way, perhaps using it element-wise rather than row-wise.
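One plausible explanation is shape broadcasting in the cost rather than AddInput itself: if y is fed as a flat vector of shape (4,) while the prediction has shape (4, 1), the elementwise cross-entropy broadcasts to a (4, 4) matrix, pairing every prediction with every label, and minimizing the mean of that pushes each prediction toward 0.5. The same shape semantics can be sketched in NumPy (a stand-in for what the TensorFlow graph computes; the values here are illustrative):

```python
import numpy as np

y = np.array([0.0, 1.0, 1.0, 0.0])             # shape (4,)  -- like double[] yData
pred = np.array([[0.1], [0.9], [0.8], [0.2]])  # shape (4, 1) -- like the network output

# Elementwise cross-entropy: broadcasting pairs every prediction with every label
loss = -(y * np.log(pred) + (1.0 - y) * np.log(1.0 - pred))
print(loss.shape)  # -> (4, 4): 16 loss terms instead of 4
```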
Also, using

```csharp
var cost = graph.ReduceSum(graph.Neg(graph.Add(firstcost, secondcost)));
```

instead of

```csharp
var cost = graph.ReduceMean(graph.Neg(graph.Add(firstcost, secondcost)));
```

gives a cost that is 16 times as high, although there are only 4 samples to learn from.
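The factor of 16 also fits the broadcasting picture: a sum over a tensor always equals the element count times the mean, so 16 loss terms (instead of the expected 4, as a (4,) label broadcast against a (4, 1) prediction would produce) would make ReduceSum exactly 16 times ReduceMean. A quick NumPy check of that identity, with illustrative random values:

```python
import numpy as np

loss = np.random.rand(4, 4)     # 16 loss terms, as the broadcast case would produce
assert np.isclose(loss.sum(), 16 * loss.mean())

loss_ok = np.random.rand(4, 1)  # 4 loss terms, one per training row
assert np.isclose(loss_ok.sum(), 4 * loss_ok.mean())
```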
I hope this made my problem clearer. Thank you for your awesome work!
I notice that if I change your definition of yData from

```csharp
double[] yData = new double[] { 0, 1, 1, 0 };
```

to

```csharp
double[,] yData = new double[,] { { 0 }, { 1 }, { 1 }, { 0 } };
```

it seems to work as expected, without having to feed the optimizer a row at a time.
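That fix is consistent with a broadcasting reading of the problem: with yData shaped (4, 1), the labels line up elementwise with the (4, 1) prediction and the loss keeps one term per training row. Sketched in NumPy, with illustrative values:

```python
import numpy as np

y = np.array([[0.0], [1.0], [1.0], [0.0]])     # shape (4, 1) -- like double[,] yData
pred = np.array([[0.1], [0.9], [0.8], [0.2]])  # shape (4, 1)

# Shapes match, so no broadcasting across rows takes place
loss = -(y * np.log(pred) + (1.0 - y) * np.log(1.0 - pred))
print(loss.shape)  # -> (4, 1): one loss term per training row
```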
I noticed it runs a lot faster, since "optimize" executes once per iteration rather than 4 times, but it also converges more slowly. Comparing "cost" against seconds spent optimizing for the row-at-a-time and batch versions, it doesn't look like extracting rows is costing you much (this is running on CPU only; each sample point is 10,000 iterations).
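The slower convergence per epoch is expected: the row-at-a-time loop takes four gradient steps per epoch, while the batch version takes one averaged step, so with the same learning rate it moves less far per epoch. A toy sketch of that effect on a one-parameter least-squares problem (hypothetical numbers, not the XOR network):

```python
import numpy as np

targets = np.array([0.0, 1.0, 1.0, 0.0])
lr = 0.01

w_row, w_batch = 5.0, 5.0
for _ in range(100):                       # 100 "epochs"
    for t in targets:                      # row-at-a-time: 4 steps per epoch
        w_row -= lr * 2.0 * (w_row - t)
    # batch: 1 step per epoch on the averaged gradient
    w_batch -= lr * 2.0 * np.mean(w_batch - targets)

# After the same number of epochs, the row-at-a-time weight is much
# closer to the optimum (the mean target, 0.5) than the batch weight
print(abs(w_row - 0.5) < abs(w_batch - 0.5))  # -> True
```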
Thanks for posting such a great example - I learned a lot from going through this! I didn't know the gradient methods were even exposed in the C interface.
Hi!
For the last few days I have struggled to implement a working XOR tutorial example as described in https://aimatters.wordpress.com/2016/01/16/solving-xor-with-a-neural-network-in-tensorflow/
Fortunately, I got it to work properly, so if anyone is interested, here is the code (feel free to use it as an example file in the repository):
However, I struggled for a long time with feeding the input as a batch. Why is it not possible to just use `.AddInput(x, xData).AddInput(y, yData)` for training? If I don't feed the rows one by one, the XOR output becomes 0.5 for every input, so the network learns nothing. Also, the ReduceSum cost is 16 times higher than the ReduceMean cost, suggesting that maybe all entries of the 2D xData are used one by one rather than row-wise. Any idea why this is not working? Or is this intended?