jdermody / brightwire

Bright Wire is an open source machine learning library for .NET with GPU support (via CUDA)
https://github.com/jdermody/brightwire/wiki
MIT License

Object reference not set to an instance of an object #18

Closed: Jorvdhoeven closed this issue 5 years ago

Jorvdhoeven commented 6 years ago

Hi,

I'm fairly new to working with machine learning and I am attempting to train a network on price data. However, I keep getting an error that I cannot manage to solve. I used most of the code from the "Sequence to sequence with LSTM" tutorial and replaced the data source with a .csv file containing some price info.

My code is as follows:

`// parse the CSV into a data table
            var dataTable = new StreamReader(@"C:\Resources\XBTEUR.csv").ParseCSV(';');

            var targetColumnIndex = 0;

            //Find the "Close" column index in the data 
            for (int i = 0; i < dataTable.ColumnCount; i++)
            {
                if (dataTable.Columns[i].Name == "Close")
                {
                    targetColumnIndex = i;
                }
            }

            dataTable.TargetColumnIndex = targetColumnIndex;
            var data = dataTable.Split(0);

            using (var lap = BrightWireProvider.CreateLinearAlgebra(false))
            {
                var graph = new GraphFactory(lap);
                var errorMetric = graph.ErrorMetric.BinaryClassification;

                // create the property set
                var propertySet = graph.CurrentPropertySet
                    .Use(graph.GradientDescent.RmsProp)
                    .Use(graph.WeightInitialisation.Xavier)
                ;

                // create the engine
                const float TRAINING_RATE = 0.1f;
                var trainingData = graph.CreateDataSource(data.Training);
                var testData = trainingData.CloneWith(data.Test);
                var engine = graph.CreateTrainingEngine(trainingData, TRAINING_RATE, 8);

                // build the network
                const int HIDDEN_LAYER_SIZE = 1024;
                var memory = new float[HIDDEN_LAYER_SIZE];
                var network = graph.Connect(engine)
                    .AddLstm(memory)
                    .AddFeedForward(engine.DataSource.OutputSize)
                    .Add(graph.SigmoidActivation())
                    .AddBackpropagation(errorMetric)
                ;

                engine.Train(40, testData, errorMetric);

                var networkGraph = engine.Graph;
                var executionEngine = graph.CreateEngine(networkGraph);

                var output = executionEngine.Execute(testData);
                Console.WriteLine(output.Average(o => o.CalculateError(errorMetric)));

            }`

All the data seems to be loading up fine, but when I get to the line engine.Train(40, testData, errorMetric);

I get the error "Object reference not set to an instance of an object.".

With the following StackTrace

   at BrightWire.ExecutionGraph.Node.Layer.FeedForward.ExecuteForward(IContext context)
   at BrightWire.ExecutionGraph.Node.NodeBase.ExecuteForward(IContext context, Int32 channel)
   at BrightWire.ExecutionGraph.Engine.Helper.TrainingEngineContext.ExecuteNext()
   at BrightWire.ExecutionGraph.Engine.TrainingEngine._Train(IExecutionContext executionContext, ILearningContext learningContext, IMiniBatchSequence sequence)
   at BrightWire.ExecutionGraph.Engine.TrainingEngine._Train(IExecutionContext executionContext, ILearningContext learningContext, IMiniBatch batch)
   at BrightWire.ExecutionGraph.Engine.TrainingEngine._Execute(IExecutionContext executionContext, IMiniBatch batch)
   at BrightWire.ExecutionGraph.Engine.TrainingEngine.<>c__DisplayClass10_0.<Execute>b__0(IMiniBatch mb)
   at BrightWire.ExecutionGraph.Helper.MiniBatchProvider.MiniBatchOperation.Execute(IExecutionContext executionContext)
   at BrightWire.ExecutionGraph.Engine.TrainingEngine.Execute(IDataSource dataSource, Int32 batchSize, Action1 batchCompleteCallback)
   at BrightWire.ExecutionGraph.Engine.TrainingEngine.Test(IDataSource testDataSource, IErrorMetric errorMetric, Int32 batchSize, Action1 batchCompleteCallback)
   at BrightWire.ExtensionMethods.Train(IGraphTrainingEngine engine, Int32 numIterations, IDataSource testData, IErrorMetric errorMetric, Action1 onImprovement, Int32 testCadence)
   at BrightWire_Test.LSTM_NeuralNet.LSTM_OHLCData() in c:\users\jordy\source\repos\BrightWire_Test\BrightWire_Test\LSTM_NeuralNet.cs:line 65

I hope someone can point me in the right direction on this.

jdermody commented 6 years ago

Try commenting out the AddLstm line and see what happens. LSTM layers expect a sequence of data that you will need to create. Loading data directly from a CSV will not give you such sequential data. The error message could be improved ; )

Jorvdhoeven commented 6 years ago

Hi Jack, Thanks for the pointer! I figured out how to cast the data with the builder, but now I'm at a loss again.

In your example for a many to one LSTM network, your dataset features a matrix and a vector. You fill the matrix with the whole encoded sequence and fill the vector with only the last vector of the matrix:

builder.Add(FloatMatrix.Create(list.ToArray()), list.Last());

What is the reason for doing this? Basically, the "output" column (list.Last()) is featured twice in your dataset.

If I apply the same principle to my data, it would look something like this:

FloatVector[] InputVectors = { OpenVector, HighVector, LowVector, VolumeVector, CloseVector };
builder.Add(FloatMatrix.Create(InputVectors), CloseVector);

So, each row of the dataset contains the following: Matrix(Opens, Highs, Lows, Volume, Close) and a Vector(Close)

The only reason I could see in supplying the Close vector twice would be that the Vector(Close) would contain the shifted Close vector (by which I mean the vector of future closes).

If I feed the above dataset to a neural net with the following parameters:

'

            using (var lap = BrightWireProvider.CreateLinearAlgebra(false))
            {
                var graph = new GraphFactory(lap);
                var errorMetric = graph.ErrorMetric.BinaryClassification;

                // create the property set
                var propertySet = graph.CurrentPropertySet
                    .Use(graph.GradientDescent.RmsProp)
                    .Use(graph.WeightInitialisation.Xavier)
                ;

                // create the engine
                var trainingData = graph.CreateDataSource(TrainingData);
                var testData = trainingData.CloneWith(TestData);
                var engine = graph.CreateTrainingEngine(trainingData, learningRate: 0.03f, batchSize: 8);

                // build the network
                const int HIDDEN_LAYER_SIZE = 128;
                var memory = new float[HIDDEN_LAYER_SIZE];
                var network = graph.Connect(engine)
                    .AddFeedForward(HIDDEN_LAYER_SIZE)
                    .AddLstm(memory)
                    //.AddFeedForward(HIDDEN_LAYER_SIZE)
                    //.AddLstm(memory)
                    .Add(graph.ReluActivation())
                    .AddFeedForward(engine.DataSource.OutputSize)
                    .AddBackpropagationThroughTime(errorMetric)
                ;

                engine.Train(10, testData, errorMetric);

                var networkGraph = engine.Graph;
                var executionEngine = graph.CreateEngine(networkGraph);

                var output = executionEngine.Execute(testData);
                Console.WriteLine(output.Where(o => o.Target != null).Average(o => o.CalculateError(errorMetric)));
            }

'

And then

engine.Train(10, testData, errorMetric);

The network gets trained but doesn't learn anything: the training error and the test score are 0 on every epoch.

Also, upon running

var output = executionEngine.Execute(testData);

The output contains 5 FloatVectors, whereas I would expect only one, since the data was set up as many-to-one.

I hope you can help me on this, I think I'm applying a logical workflow here (looking at examples from tensorflow) but I cannot grasp why I don't get the results I'm expecting.

So many questions :p, but it's really fun to be working with this stuff and your library is very intuitive to work with!

jdermody commented 6 years ago

Hey

Sorry for the delay in replying. You've found a problem with the sample code, so thanks for pointing it out!

As you say, the sample code doesn't make much sense. It should be as follows:

var grammar = new SequenceClassification(dictionarySize: 10, minSize: 5, maxSize: 5, noRepeat: true, isStochastic: false);
var sequences = grammar.GenerateSequences().Take(1000).ToList();
var builder = BrightWireProvider.CreateDataTableBuilder();
builder.AddColumn(ColumnType.Matrix, "Sequence");
builder.AddColumn(ColumnType.Vector, "Summary");

foreach (var sequence in sequences) {
    var list = new List<FloatVector>();
    var charSet = new HashSet<char>();
    foreach (var ch in sequence) {
        charSet.Add(ch);
        list.Add(grammar.Encode(ch));
    }

    var target = grammar.Encode(charSet.Select(ch2 => (ch2, 1f)));
    builder.Add(FloatMatrix.Create(list.ToArray()), target);
}
var data = builder.Build().Split(0);

In other words, this sample code creates a sequence of random characters, adds each character to the input sequence as a one-hot encoded vector, and then trains the network on the union set of those characters. So the network learns to remember which characters are in the sequence and to output that union set. (Obviously a fairly simple exercise.)
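To make the target concrete, here is a minimal sketch of that encoding in plain C#, with no BrightWire calls. The helper names `OneHot` and `UnionTarget` and the mapping of 'A' to index 0 are assumptions for illustration; the library's own `Encode` method may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int DictionarySize = 10; // matches dictionarySize: 10 in the sample above

// One-hot encode a single character: 'A' -> index 0, 'B' -> index 1, ... (assumed mapping)
float[] OneHot(char ch)
{
    var vector = new float[DictionarySize];
    vector[ch - 'A'] = 1f;
    return vector;
}

// Target vector: 1 at the index of every character that occurs in the sequence
float[] UnionTarget(IEnumerable<char> chars)
{
    var vector = new float[DictionarySize];
    foreach (var ch in chars.Distinct())
        vector[ch - 'A'] = 1f;
    return vector;
}

var sequence = "CADBE";                                       // made-up example sequence
var inputRows = sequence.Select(ch => OneHot(ch)).ToArray();  // one row per time step
var target = UnionTarget(sequence);                           // one vector for the whole sequence

Console.WriteLine($"input rows: {inputRows.Length}, row width: {inputRows[0].Length}");
Console.WriteLine("target: " + string.Join(" ", target));
```

The input is a matrix with one row per character, while the target is a single vector that summarises the whole sequence, which is the many-to-one shape discussed in this thread.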

Your data will likely be more interesting, with a different target vector. A simple case would be a target vector of size 1, where the network learns to output 1 or 0 (good or bad, or whatever) based on observing the input sequence.

Good luck and sorry for the confusion!

Jorvdhoeven commented 6 years ago

Hi Jack,

Alright, I understand now how to fabricate the data to generate the correct input data.

There are still two things I don't understand:

I have tried many learning rates and varied the number of epochs, but I always get a 0% test score. Does this mean the network is not learning?

https://imgur.com/a/ewXQpUg

The content of the training data was as follows:

var OpenVector = FloatVector.Create(Opens);
var HighVector = FloatVector.Create(Highs);
var LowVector = FloatVector.Create(Lows);
var VolumeVector = FloatVector.Create(Volumes);
var CloseVector = FloatVector.Create(Closes);
var FutureCloseVector = FloatVector.Create(FutureCloses);

FloatVector[] InputVectors = { OpenVector, HighVector, LowVector, VolumeVector, CloseVector };

builder.Add(FloatMatrix.Create(InputVectors), FutureCloseVector);

and I set the target column as follows:

var data = builder.Build().Split(trainingPercentage: 0.1);

var TestData = data.Training;
var TrainingData = data.Test;

TestData.TargetColumnIndex = 1;
TrainingData.TargetColumnIndex = 1;

(I know I have swapped the training and test data, but I want the network to train on the most recent data so that it gives the best predictions.)

I would expect the network to output only a single vector with its predictions.

However, the network outputs 5 execution results even if I feed it only a single data frame.

Sorry for all these questions :(, if you'd like I can share my dataset and code, perhaps it would make a good example for other people?

jdermody commented 6 years ago

Yes, please do share your code and dataset - it might make it easier to see what's happening!

Jorvdhoeven commented 6 years ago

I placed the code and dataset here:

https://wetransfer.com/downloads/e65da6872143d7b792e6a3023d92507920180606191636/51a0b14af2bb0bffa7c11ec69b8dbd2d20180606191637/18d877

You'll need to update the location of the .csv file in the code; it currently refers to a folder on my C drive.

jdermody commented 6 years ago

Hey, I looked through your code and found a couple of small problems.

Firstly, the way you were initializing the input matrix didn't seem quite right. Each row in the matrix should be a feature vector in the sequence - I think you might have had the rows and columns reversed?

The second problem was that you were using the binary classification error metric. This special prebuilt error metric should only be used for binary classification - an output of 1 or 0. The reason you were seeing zero learning is that it was rounding everything to 1 (both the expected value and the prediction).
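A minimal sketch in plain C# of why that rounding hides the error. The 0.5 threshold and the `BinaryError` helper are assumptions for illustration, not BrightWire's actual implementation; the point is only that once both values are reduced to 0 or 1, unnormalised prices all collapse to 1 and every comparison looks perfect.

```csharp
using System;

// Hypothetical stand-in for a binary classification metric:
// both values are thresholded at 0.5 before being compared.
float BinaryError(float prediction, float expected)
{
    var predictedClass = prediction >= 0.5f ? 1f : 0f;
    var expectedClass = expected >= 0.5f ? 1f : 0f;
    return predictedClass == expectedClass ? 0f : 1f;
}

// Plain absolute difference, for comparison
float AbsoluteError(float prediction, float expected) => Math.Abs(prediction - expected);

var expectedClose = 6500f; // made-up price, far above the threshold
var badPrediction = 3f;    // a useless prediction that is also above the threshold

Console.WriteLine(BinaryError(badPrediction, expectedClose));   // thresholded: no error is reported
Console.WriteLine(AbsoluteError(badPrediction, expectedClose)); // the real gap is large
```

With an error of zero on every sample, the reported score never changes between epochs, which is consistent with the flat results described earlier in the thread.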

Below is some code that I adapted from your project. I used cross entropy error and bumped up the test percentage as 10% seemed a bit low. Tweaking the parameters and changing the normalization might give you better results.

What's the source of your dataset? Is it freely available/public domain? It might make a good addition to the sample code...

// load and normalise the data
var dataSet = new StreamReader(@"XBTEUR.csv").ParseCSV(';', true);
var columnNames = new[] {"Open", "High", "Low", "Volume", "Close"};
var columnsOfInterest = new HashSet<string>(columnNames);
var columnIndices = dataSet.Columns.Select((c, i) => new {Column = c, Index = i})
    .Where(c => columnsOfInterest.Contains(c.Column.Name))
    .OrderBy(c => Array.FindIndex(columnNames, n => c.Column.Name == n))
    .Select(c => c.Index)
    .ToList();
var analysis = dataSet.GetAnalysis();
var columnsMax = columnIndices.Select(ind => ((INumericColumnInfo)analysis[ind]).Max).ToList();
var rows = dataSet.Map(row => columnIndices.Select(row.GetField<float>).ToList());
var normalisedRows = rows.Select(row => row.Select((v, i) => Convert.ToSingle(v / columnsMax[i])).ToArray()).ToList();

// build the data table with a window of input data and the prediction as the following value
var builder = BrightWireProvider.CreateDataTableBuilder();
builder.AddColumn(ColumnType.Matrix, "Past");
builder.AddColumn(ColumnType.Vector, "Future");
const int PREDICTION_LENGTH = 30;
for (var i = PREDICTION_LENGTH + 1; i < rows.Count; i++)
{
    var inputVector = new List<FloatVector>();
    for (var j = i - PREDICTION_LENGTH - 1; j < i-1; j++)
        inputVector.Add(FloatVector.Create(normalisedRows[j]));
    var input = FloatMatrix.Create(inputVector.ToArray());
    var target = FloatVector.Create(normalisedRows[i]);
    builder.Add(input, target);
}

var data = builder.Build().Split(trainingPercentage: 0.2);
var TestData = data.Training;
var TrainingData = data.Test;

using (var lap = BrightWireProvider.CreateLinearAlgebra(false))
{
    var graph = new GraphFactory(lap);
    var errorMetric = graph.ErrorMetric.CrossEntropy;

    // create the property set
    var propertySet = graph.CurrentPropertySet
        .Use(graph.GradientDescent.Adam)
        .Use(graph.WeightInitialisation.Xavier);

    // create the engine
    var trainingData = graph.CreateDataSource(TrainingData);
    var testData = trainingData.CloneWith(TestData);
    var engine = graph.CreateTrainingEngine(trainingData, learningRate: 0.03f, batchSize: 128);

    // build the network
    const int HIDDEN_LAYER_SIZE = 128;
    var memory = new float[HIDDEN_LAYER_SIZE];
    var network = graph.Connect(engine)
        .AddLstm(memory)
        .AddFeedForward(engine.DataSource.OutputSize)
        .Add(graph.SigmoidActivation())
        .AddBackpropagationThroughTime(errorMetric);

    //Train the network
    engine.Train(20, testData, errorMetric);
}
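The windowing step can be checked on toy data without the library. The sketch below, in plain C#, builds the same kind of (window, target) pairs from a list of rows; `BuildWindows` is a hypothetical helper. It pairs each window with the row immediately after it, which is an assumption about the intent: as written, the loop above ends each window at row i-2 and uses row i as the target, so row i-1 is not used for that sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Pair each run of windowLength consecutive rows with the row that follows it.
List<(float[][] Window, float[] Target)> BuildWindows(IReadOnlyList<float[]> source, int windowLength)
{
    var pairs = new List<(float[][] Window, float[] Target)>();
    for (var i = windowLength; i < source.Count; i++)
    {
        // rows i-windowLength .. i-1 form the input sequence
        var slice = Enumerable.Range(i - windowLength, windowLength)
            .Select(j => source[j])
            .ToArray();
        pairs.Add((slice, source[i])); // row i is the value to predict
    }
    return pairs;
}

// Toy data: 6 rows with a single feature each, values 0..5
var rows = Enumerable.Range(0, 6).Select(v => new[] { (float)v }).ToList();
var samples = BuildWindows(rows, windowLength: 3);

foreach (var (window, target) in samples)
    Console.WriteLine($"[{string.Join(", ", window.Select(r => r[0]))}] -> {target[0]}");
```

On the toy data this yields three samples: rows 0 to 2 with row 3 as the target, rows 1 to 3 with row 4, and rows 2 to 4 with row 5. Each window would then become one matrix column value and each target one vector column value in the data table.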


Jorvdhoeven commented 6 years ago

Thanks for the help Jack!!

It's quite difficult to imagine the shape of the input matrix as it's like a matrix inside of a matrix so it's easy to switch columns and rows around I guess haha.

I generate the data from the public section of the Kraken API. I generated CSV files with a bit more data and a couple more currency pairs and uploaded them here, so I think the data is freely available.

If you turn this into an example, could you add an example of how to use the trained network to make predictions?

jdermody commented 5 years ago

I created a stock data sample that uses a similar (but publicly available) dataset. This sample also shows how to train an LSTM recurrent neural network to predict stock prices and how to create an execution engine from the best model for subsequent classification.
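One practical detail when reading predictions from a model trained as in this thread: the adapted code divides every column by its maximum, so outputs come back on that scale and need to be multiplied by the same maximum to read as prices. A minimal sketch in plain C#; `Normalise` and `Denormalise` are hypothetical helpers, the numbers are made up, and no BrightWire calls are involved.

```csharp
using System;
using System.Linq;

// Divide each column by its maximum (mirrors the v / columnsMax[i] step in the thread)
float[][] Normalise(float[][] table, float[] columnMax) =>
    table.Select(row => row.Select((v, i) => v / columnMax[i]).ToArray()).ToArray();

// Map a normalised vector (for example a prediction) back to the original units
float[] Denormalise(float[] normalised, float[] columnMax) =>
    normalised.Select((v, i) => v * columnMax[i]).ToArray();

// Made-up rows: { Open, Close }
var prices = new[]
{
    new[] { 100f, 200f },
    new[] { 400f, 800f },
};

// Maximum of each column, computed over all rows
var maxPerColumn = Enumerable.Range(0, 2)
    .Select(c => prices.Max(row => row[c]))
    .ToArray();

var scaled = Normalise(prices, maxPerColumn);
var restored = Denormalise(scaled[0], maxPerColumn);

Console.WriteLine(string.Join(", ", scaled[0]));  // first row on the normalised scale
Console.WriteLine(string.Join(", ", restored));   // back in the original units
```

The same maxima must be kept from training time and reused at prediction time; recomputing them on new data would change the scale the network was trained on.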