jeffheaton / encog-dotnet-core

http://www.heatonresearch.com/encog

Using a list as a datasource to train the NN #120

Closed JakobMoser closed 6 years ago

JakobMoser commented 6 years ago

I am trying to write a generic ML API that uses Encog, but instead of a CSV file I want to use a list of arrays as the data source. I implemented it as follows:

    class CustomDataSet : IVersatileDataSource
    {

        private readonly IDictionary<string, int> _headerIndex = new Dictionary<string, int>();
        private readonly List<Object[]> data;
        private int lineCount = 0;

        public CustomDataSet(List<Object[]> list, string[] headers)
        {
            transformHeaders(headers);
            data = list;
        }

        private void transformHeaders(string[] head)
        {
            for(int i = 0; i < head.Length; i++)
            {
                _headerIndex.Add(head[i].ToLower(), i);
            }
        }

        public void Close()
        {
        }

        public int ColumnIndex(string name)
        {
            string name2 = name.ToLower();
            if (!_headerIndex.ContainsKey(name2))
            {
                return -1;
            }
            Console.WriteLine("requested: " + name + " | returned -> " + name2 + " " + _headerIndex[name2]);
            return -1; //_headerIndex[name2];
        }

        public string[] ReadLine()
        {
            if(data != null && lineCount < data.Count)
            {
                var result = new string[data.ElementAt(lineCount).Length];
                for (int j = 0; j < data.ElementAt(lineCount).Length; j++)
                {
                    result[j] = "" + data.ElementAt(lineCount)[j];
                    result[j] = result[j].Replace(",", ".");
                }
                Console.WriteLine(lineCount + ".  " + result[0] + " " + result[1] + " " + result[2] + " " + result[3] + " " + result[4] + " " + result[5] + " " + result[6] + " " + result[7]);
                lineCount++;
                return result;
            }

            return null;
        }

        public void Rewind()
        {
        }
    }

I set up my data as follows:

        var dataList = new List<Object[]>
        {
            new Object[]{ 18.0,8, 307.0, 130.0, 3504, 12.0,70,1 },
            new Object[]{ 15.0,8, 350.0, 165.0, 3693.0, 11.5,70,1 },
            new Object[]{ 18.0,8, 318.0, 150.0, 3436.0, 11.0,70,1 },
            new Object[]{ 16.0,8, 304.0, 150.0, 3433.0, 12.0,70,1 },
            ...
        };

        var headers = new String[] { "mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model_year", "origin" };

        CustomDataSet dataset = new CustomDataSet(dataList, headers);

        mlapi.SetDataSource(dataset);

However, my NN's predictions are completely wrong. If I use the IVersatileDataSource with a CSV file, as in all the examples, I get the right predictions, so I know the problem must be in the data source.

Has anybody had this problem, or does anyone have a solution?

jeffheaton commented 6 years ago

The core Encog neural network functions have nothing to do with CSV. An Encog dataset must implement the IMLDataSet interface. Usually this is done through a BasicMLDataSet, which is really just a wrapper around a list, very similar in format to what you have above.

Every user has a different way they want to represent data, so I provided helper classes for CSV, because it is very common. You just need to go a bit lower level if you want to deal with the data directly.

This example shows the basics of how to use these classes and send data directly to Encog:

https://github.com/encog/encog-dotnet-core/blob/master/ConsoleExamples/Examples/XOR/XORHelloWorld.cs
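For the list-of-arrays data in the question, the conversion step might look roughly like the sketch below. It only splits each `Object[]` row into an input array and an ideal array. `ListToArrays`, `Split`, and `targetColumn` are hypothetical names, and treating column 0 (`mpg`) as the value to predict is an assumption. No normalization is done here, so the raw values would still need scaling before training.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ListToArrays
{
    // Splits each row into an input vector and a one-element ideal (target) vector.
    // targetColumn is a hypothetical parameter; 0 would be "mpg" in the question's header order.
    public static (double[][] Input, double[][] Ideal) Split(IList<object[]> rows, int targetColumn)
    {
        var input = new double[rows.Count][];
        var ideal = new double[rows.Count][];

        for (int r = 0; r < rows.Count; r++)
        {
            // Convert boxed ints and doubles directly, so no string round trip
            // (and no decimal-comma replacement) is needed.
            double[] values = rows[r]
                .Select(v => Convert.ToDouble(v, CultureInfo.InvariantCulture))
                .ToArray();

            ideal[r] = new[] { values[targetColumn] };
            input[r] = values.Where((v, i) => i != targetColumn).ToArray();
        }

        return (input, ideal);
    }
}

// The two arrays can then be wrapped the same way XORHelloWorld.cs wraps its arrays:
//   IMLDataSet trainingSet = new BasicMLDataSet(input, ideal);
```

Converting with `Convert.ToDouble` avoids depending on the current culture's decimal separator, which the `Replace(",", ".")` call in the question works around.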

If you prefer using one of the CSV helper functions, you can have a look at its source code and see how it is communicating with the actual neural network. They are just thin wrappers.
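When comparing the class in the question against the CSV helper, two things stand out: `Rewind()` is empty, so `lineCount` is never reset, and `ColumnIndex` returns -1 for every name. If the versatile helpers read the source more than once or look columns up by name (both are assumptions worth checking in their source code), either could explain the bad predictions. Below is a minimal sketch of an in-memory source that handles both. `IRowSource` is a hypothetical stand-in that mirrors only the members the question's class implements, so the sketch compiles without Encog; a real version would implement `IVersatileDataSource` instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for Encog's IVersatileDataSource, limited to the
// members that the class in the question implements.
public interface IRowSource
{
    string[] ReadLine();
    void Rewind();
    int ColumnIndex(string name);
    void Close();
}

public sealed class InMemoryRowSource : IRowSource
{
    // Case-insensitive lookup replaces the manual ToLower() calls.
    private readonly Dictionary<string, int> _headerIndex =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    private readonly IReadOnlyList<object[]> _rows;
    private int _position;

    public InMemoryRowSource(IReadOnlyList<object[]> rows, string[] headers)
    {
        _rows = rows ?? throw new ArgumentNullException(nameof(rows));
        for (int i = 0; i < headers.Length; i++)
        {
            _headerIndex[headers[i]] = i;
        }
    }

    // Returns the next row as invariant-culture strings, or null at the end.
    public string[] ReadLine()
    {
        if (_position >= _rows.Count)
        {
            return null;
        }
        object[] row = _rows[_position++];
        return row.Select(v => Convert.ToString(v, CultureInfo.InvariantCulture)).ToArray();
    }

    // Resets the cursor so the rows can be read again from the start.
    public void Rewind()
    {
        _position = 0;
    }

    // Returns the column's index, or -1 only when the name is unknown.
    public int ColumnIndex(string name)
    {
        return _headerIndex.TryGetValue(name, out int index) ? index : -1;
    }

    public void Close()
    {
    }
}
```

Formatting with `CultureInfo.InvariantCulture` always produces a decimal point, so the comma replacement from the question is not needed.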