microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

C# LSTM with Sparse Data Evaluation #3641

Closed: GuntaButya closed this issue 5 years ago.

GuntaButya commented 5 years ago

I am having a problem, and I'm not sure whether it's a bug or just my learning curve.

I am using the ATIS Dataset, provided here: https://github.com/Microsoft/CNTK/tree/master/Examples/LanguageUnderstanding/ATIS/Data

My code is trying to copy the Model arrangement from the Python example here: https://github.com/Microsoft/CNTK/blob/master/Examples/LanguageUnderstanding/ATIS/Python/LanguageUnderstanding.py

My Code, mostly copied from the examples:

    /// <summary>
    /// The Epochs, number of times the Training will run.
    /// </summary>
    public int Epochs { get; set; }

    /// <summary>
    /// The Word To Index Dictionary.
    /// </summary>
    Dictionary<string, int> WordToIndex { get; set; }

    /// <summary>
    /// The Index To Word Dictionary.
    /// </summary>
    Dictionary<int, string> IndexToWord { get; set; }

    /// <summary>
    /// The Labels.
    /// </summary>
    Dictionary<int, string> Label { get; set; }

Loading the dataset into dictionaries:

    /// <summary>
    /// Loads the Brainscript Data for Input to Output Mapping.
    /// </summary>
    void LoadData()
    {

        // Load the Scripts:
        var vocab = File.ReadAllLines("Data/Brainscripts/query.wl"); // ATIS.vocab");
        var labels = File.ReadAllLines("Data/Brainscripts/slots.wl"); // ATIS.label");

        // Init the dictionaries:
        WordToIndex = new Dictionary<string, int>();
        IndexToWord = new Dictionary<int, string>();
        Label = new Dictionary<int, string>();

        // Populate the word/index dictionaries:
        for (int i = 0; i < vocab.Length; i++)
        {
            WordToIndex.Add(vocab[i], i);
            IndexToWord.Add(i, vocab[i]);
        }

        // Populate the label dictionary:
        for (int i = 0; i < labels.Length; i++)
        {
            Label.Add(i, labels[i]);
        }
    }

Build and Train the Model:

    public async void BuildModel()
    {

        // Model and Dataset Parameters:
        const int inputDim = 943;
        const int cellDim = 125;
        const int hiddenDim = 300;
        const int embeddingDim = 150;
        const int numOutputClasses = 129;

        // Set Variable Names:
        string OutputLabelsName = "OutputLabels";
        string InputFeaturesName = "InputFeatures";

        // Init Model Feature/Label Variables:
        Variable InputFeatures = Variable.InputVariable(new int[] { inputDim }, DataType.Float, InputFeaturesName, null, false /*isSparse*/);
        Variable OutputLabels = Variable.InputVariable(new int[] { numOutputClasses }, DataType.Float, OutputLabelsName, null, true /*isSparse*/);

        // Init a LSTM Sequence Model:
        Function Model = Layers.LSTMLayer(InputFeatures, numOutputClasses, embeddingDim, hiddenDim, cellDim, Device, "model");

        // Configure Loss Functions:
        Function loss = CNTKLib.CrossEntropyWithSoftmax(Model, OutputLabels, "lossFunction");
        Function prediction = CNTKLib.ClassificationError(Model, OutputLabels, "classificationError");

        // Configure Data Streams to Variables:
        IList<StreamConfiguration> streamConfigurations = new StreamConfiguration[]
        {
            new StreamConfiguration(InputFeaturesName, inputDim, true, "S0"),
            // new StreamConfiguration(InputFeaturesName, inputDim, true, "S1"), NOT USED.
            new StreamConfiguration(OutputLabelsName, numOutputClasses, true, "S2")
        };

        // Set Data Paths to the Static Data Files:
        string FileName = "atis.train.ctf";
        string DataPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Data");
        DataPath = Path.Combine(DataPath, FileName);

        // Ensure we have a valid Path:
        if (!File.Exists(DataPath))
            throw new FileNotFoundException("Can't find the file: " + DataPath);

        // Configure the MiniBatch Text Format Reader and Streams:
        MinibatchSource minibatchSource = MinibatchSource.TextFormatMinibatchSource(DataPath, streamConfigurations, MinibatchSource.InfinitelyRepeat, true);
        StreamInformation featureStreamInfo = minibatchSource.StreamInfo(InputFeaturesName);
        StreamInformation labelStreamInfo = minibatchSource.StreamInfo(OutputLabelsName);

        // Hyperparameters:
        Epochs = 1000;
        uint minibatchSize = 200;

        // Training Hyperparameters:
        TrainingParameterScheduleDouble learningRate = new TrainingParameterScheduleDouble(0.001, minibatchSize);
        TrainingParameterScheduleDouble momentum = CNTKLib.MomentumAsTimeConstantSchedule(256);

        // Configure Learners:
        IList<Learner> learners = new List<Learner>() { Learner.MomentumSGDLearner(Model.Parameters(), learningRate, momentum, true /*unitGainMomentum = */) };

        // Configure a Trainer:
        Trainer trainer = Trainer.CreateTrainer(Model, loss, prediction, learners);

        // The Thread on which Training can be done:
        await Task.Run(() =>
        {

            // Train over X iterations (each iteration is one minibatch, not a full pass over the data):
            for (int i = 0; i < Epochs; i++)
            {

                // Get the next MiniBatch. This must happen inside the loop; fetching it once
                // outside the loop trains on the same 200 samples over and over.
                UnorderedMapStreamInformationMinibatchData minibatchData = minibatchSource.GetNextMinibatch(minibatchSize, Device);

                // Stream each Source into the Features or Labels Variables as Arguments:
                Dictionary<Variable, MinibatchData> arguments = new Dictionary<Variable, MinibatchData>
                {
                    { InputFeatures, minibatchData[featureStreamInfo] },
                    { OutputLabels, minibatchData[labelStreamInfo] }
                };

                // Do the Training:
                trainer.TrainMinibatch(arguments, Device);

                // Invoke the Event with Training Progress:
                if ((i % 7) == 0 && trainer.PreviousMinibatchSampleCount() != 0)
                    OnOutputReveivedEvent?.Invoke($"Minibatch: {i} CrossEntropyLoss = {trainer.PreviousMinibatchLossAverage()}, EvaluationCriterion = {trainer.PreviousMinibatchEvaluationAverage()}");
            }
        });

        // The Text to Evaluate:
        string text = "BOS is there a delta flight from denver to san francisco EOS";

        // Split the text:
        string[] Split = text.Split(' ');

        foreach (string Word in Split)
        {
            // Run Evaluation on the Word:
            EvaluateModel(Model, Word);
        }
    }
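The StreamConfiguration entries above bind the CTF aliases S0 and S2 in atis.train.ctf to InputFeatures and OutputLabels, and the dimensions they declare (943 and 129) must be larger than every index that appears under those aliases. As a quick sanity check, here is a minimal C# sketch that parses one CTF line into sparse (index, value) pairs per alias. It assumes the usual CTF layout of a sequence id followed by `|alias index:value` fields, with `|#` marking comments; ParseCtfLine is a hypothetical helper, and the sample line is constructed for illustration, not copied from the dataset.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Parse one CNTK Text Format (CTF) line into alias -> sparse (index, value) entries.
// Fields start with '|'; a "|#" field is a comment (e.g. the original word) and is skipped.
static Dictionary<string, List<(int Index, float Value)>> ParseCtfLine(string line)
{
    var result = new Dictionary<string, List<(int Index, float Value)>>();
    string[] fields = line.Split('|');
    // fields[0] is the sequence id; every later field is "<alias> <index>:<value> ...".
    for (int f = 1; f < fields.Length; f++)
    {
        string[] tokens = fields[f].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0 || tokens[0][0] == '#') continue; // empty or comment field
        if (!result.TryGetValue(tokens[0], out var entries))
        {
            entries = new List<(int Index, float Value)>();
            result[tokens[0]] = entries;
        }
        for (int k = 1; k < tokens.Length; k++)
        {
            string[] pair = tokens[k].Split(':');
            entries.Add((int.Parse(pair[0], CultureInfo.InvariantCulture),
                         float.Parse(pair[1], CultureInfo.InvariantCulture)));
        }
    }
    return result;
}

// Illustrative line in the ATIS CTF style (constructed here, not copied from atis.train.ctf).
var parsed = ParseCtfLine("0\t|S0 178:1\t|# BOS\t|S2 128:1\t|# O");
Console.WriteLine(parsed["S0"][0].Index); // 178
Console.WriteLine(parsed["S2"][0].Index); // 128
```

Running this over the real file and finding S0 indices of 943 or more (or S2 indices of 129 or more) would point to a mismatch between the vocabulary files and the declared dimensions.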

At Evaluation time:

    void EvaluateModel(Function model, string text)
    {

        // Create the Input Value:
        Value InputValue = Value.CreateBatch(model.Arguments.Single().Shape, GetSparse(text), Device);

        // Create Input Dictionary Pair:
        Dictionary<Variable, Value> ModelInput = new Dictionary<Variable, Value>
        {
            { model.Arguments.Single(), InputValue }
        };

        // Get the Model's Output Variable.
        // You can also use the following way to get Output Variable by name: model.Outputs.Where(variable => string.Equals(variable.Name, outputName)).Single();
        Variable OutputVariable = model.Output;

        // Create Output Dictionary Pair:
        Dictionary<Variable, Value> ModelOutput = new Dictionary<Variable, Value>
        {
            { OutputVariable, null }
        };

        // Evaluate the Model using the Device:
        model.Evaluate(inputs: ModelInput, outputs: ModelOutput, computeDevice: Device);

        // Get evaluate result as dense output
        IList<IList<float>> OutputValue = ModelOutput[OutputVariable].GetDenseData<float>(OutputVariable);

        IList<float> t = OutputValue[0];
        int index = t.IndexOf(t.Max());

        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.Append($" Word: {text}  Label: {Label[index]} ({index})\r\n");

        // Invoke Event:
        OnOutputReveivedEvent?.Invoke(stringBuilder.ToString());
    }
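EvaluateModel takes `t.IndexOf(t.Max())` over the first output list, which yields one label for the whole input. If a full sentence is fed as a single sequence, that list should, to my understanding of GetDenseData, hold numOutputClasses scores per word, stored as one contiguous block per word. The argmax then has to be taken word by word. Here is a minimal C# sketch under that layout assumption. ArgmaxPerStep is a hypothetical helper and is not part of CNTK.

```csharp
using System;
using System.Collections.Generic;

// Split a flattened sequence output (numSteps * numClasses scores, one
// contiguous block per step) into one argmax class index per step.
static int[] ArgmaxPerStep(IList<float> flat, int numClasses)
{
    if (numClasses <= 0 || flat.Count % numClasses != 0)
        throw new ArgumentException("Output length must be a positive multiple of numClasses.");
    int numSteps = flat.Count / numClasses;
    var best = new int[numSteps];
    for (int step = 0; step < numSteps; step++)
    {
        int offset = step * numClasses;
        int bestIndex = 0;
        for (int c = 1; c < numClasses; c++)
            if (flat[offset + c] > flat[offset + bestIndex]) bestIndex = c;
        best[step] = bestIndex;
    }
    return best;
}

// Two steps, three classes: step 0 peaks at class 2, step 1 at class 0.
var labels = ArgmaxPerStep(new float[] { 0.1f, 0.2f, 0.7f, 0.9f, 0.05f, 0.05f }, 3);
Console.WriteLine(string.Join(",", labels)); // 2,0
```

Each resulting index can then be looked up in the Label dictionary, as EvaluateModel already does for a single index.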

The Sparse Tensor:

    System.Numerics.Tensors.SparseTensor<float> GetSparse(string text)
    {

        // System.Numerics
        System.Numerics.Tensors.SparseTensor<float> SparseTensor = new System.Numerics.Tensors.SparseTensor<float>(new int[] { 943 }, true, 1);

        // Assuming text is a single word and its in the Dictionary:
        int index = WordToIndex[text];
        SparseTensor[index] = 1;

        return SparseTensor;
    }

And the Layers class, modeled on the Python Layers API:

public class Layers
{

    public static Function Dense(Variable input, int outputDim, DeviceDescriptor device, Activation activation = Activation.None, string outputName = "")
    {

        if (input.Shape.Rank != 1)
        {
            // Flatten a multi-dimensional input into a single vector:
            int newDim = input.Shape.Dimensions.Aggregate((d1, d2) => d1 * d2);
            input = CNTKLib.Reshape(input, new int[] { newDim });
        }

        Function fullyConnected = FullyConnectedLinearLayer(input, outputDim, device, outputName);

        switch (activation)
        {
            case Activation.ELU:
                return CNTKLib.ELU(fullyConnected);
            case Activation.Hardmax:
                return CNTKLib.Hardmax(fullyConnected);
            case Activation.LogSoftmax:
                return CNTKLib.LogSoftmax(fullyConnected);
            case Activation.ReLU:
                return CNTKLib.ReLU(fullyConnected);
            case Activation.SELU:
                return CNTKLib.SELU(fullyConnected);
            case Activation.Sigmoid:
                return CNTKLib.Sigmoid(fullyConnected);
            case Activation.Softmax:
                return CNTKLib.Softmax(fullyConnected);
            case Activation.Tanh:
                return CNTKLib.Tanh(fullyConnected);
        }

        return fullyConnected;
    }

    /// <summary>
    /// The Embedding Layer.
    /// </summary>
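    /// <param name="input">The rank-1 input variable (e.g. one-hot word vectors).</param>
    /// <param name="embeddingDim">The size of each embedding vector.</param>
    /// <param name="device">The device on which the embedding parameters are created.</param>
    /// <returns>The product embeddingParameters x input.</returns>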
    public static Function Embedding(Variable input, int embeddingDim, DeviceDescriptor device)
    {

        System.Diagnostics.Debug.Assert(input.Shape.Rank == 1);

        int inputDim = input.Shape[0];

        var embeddingParameters = new Parameter(new int[] { embeddingDim, inputDim }, DataType.Float, CNTKLib.GlorotUniformInitializer(), device);

        return CNTKLib.Times(embeddingParameters, input);
    }

    public static Function FullyConnectedLinearLayer(Variable input, int outputDim, DeviceDescriptor device, string outputName = "")
    {

        // Check the Rank:
        System.Diagnostics.Debug.Assert(input.Shape.Rank == 1);

        int inputDim = input.Shape[0];

        // Configure Dimensions:
        int[] s1 = { outputDim, inputDim };
        int[] s2 = { outputDim };

        // Configure the Initialiser:
        var Init = CNTKLib.GlorotUniformInitializer(CNTKLib.DefaultParamInitScale, CNTKLib.SentinelValueForInferParamInitRank, CNTKLib.SentinelValueForInferParamInitRank, 1);

        var W = new Parameter(s1, DataType.Float, Init, device, "w");
        var B = new Parameter(s2, 0.0f, device, "b");

        var times = CNTKLib.Times(W, input, "times");
        var plus = CNTKLib.Plus(B, times, outputName);

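        // Return Input x Weights + Bias as a linear output. CrossEntropyWithSoftmax already
        // applies softmax and ClassificationError only needs the argmax, so no Softmax here.
        return plus;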
    }

    public static Function Label(string name)
    {
        return Variable.InputVariable(new int[] { NDShape.InferredDimension }, DataType.Float, name);
    }

    public static Function LSTMLayer(Variable input, int numOutputClasses, int embeddingDim, int hiddenDim, int cellDim, DeviceDescriptor device, string outputName)
    {

        // Example from: 
        // https://github.com/Microsoft/CNTK/blob/master/Examples/TrainingCSharp/Common/LSTMSequenceClassifier.cs
        //const int inputDim = 2000;
        //const int cellDim = 25;
        //const int hiddenDim = 25;
        //const int embeddingDim = 50;
        //const int numOutputClasses = 5;
        // var classifierOutput = LSTMSequenceClassifierNet(features, numOutputClasses, embeddingDim, hiddenDim, cellDim, device, "classifierOutput");

        return LSTM.LSTMSequenceClassifierNet(input, numOutputClasses, embeddingDim, hiddenDim, cellDim, device, outputName);
    }

    public static Function Placeholder(string name)
    {
        return Variable.PlaceholderVariable(NDShape.Unknown(), new List<Axis>(){ Axis.NewUniqueDynamicAxis("dynamicAxis", true) } );
    }
}

internal class LSTM
{

    static Function Stabilize<ElementType>(Variable x, DeviceDescriptor device)
    {

        bool isFloatType = typeof(ElementType).Equals(typeof(float));

        Constant f, fInv;

        if (isFloatType)
        {

            f = Constant.Scalar(4.0f, device);
            fInv = Constant.Scalar(f.DataType, 1.0 / 4.0f);
        }
        else
        {

            f = Constant.Scalar(4.0, device);
            fInv = Constant.Scalar(f.DataType, 1.0 / 4.0f);
        }

        var beta = CNTKLib.ElementTimes(fInv, CNTKLib.Log(Constant.Scalar(f.DataType, 1.0) + CNTKLib.Exp(CNTKLib.ElementTimes(f, new Parameter(new NDShape(), f.DataType, 0.99537863 /* 1/f*ln (e^f-1) */, device)))));

        return CNTKLib.ElementTimes(beta, x);
    }

    static Tuple<Function, Function> LSTMPCellWithSelfStabilization<ElementType>(Variable input, Variable prevOutput, Variable prevCellState, DeviceDescriptor device)
    {

        int outputDim = prevOutput.Shape[0];
        int cellDim = prevCellState.Shape[0];

        bool isFloatType = typeof(ElementType).Equals(typeof(float));
        DataType dataType = isFloatType ? DataType.Float : DataType.Double;

        Func<int, Parameter> createBiasParam;
        if (isFloatType)
            createBiasParam = (dim) => new Parameter(new int[] { dim }, 0.01f, device, "");
        else
            createBiasParam = (dim) => new Parameter(new int[] { dim }, 0.01, device, "");

        uint seed2 = 1;
        Func<int, Parameter> createProjectionParam = (oDim) => new Parameter(new int[] { oDim, NDShape.InferredDimension }, dataType, CNTKLib.GlorotUniformInitializer(1.0, 1, 0, seed2++), device);

        Func<int, Parameter> createDiagWeightParam = (dim) => new Parameter(new int[] { dim }, dataType, CNTKLib.GlorotUniformInitializer(1.0, 1, 0, seed2++), device);

        Function stabilizedPrevOutput = Stabilize<ElementType>(prevOutput, device);
        Function stabilizedPrevCellState = Stabilize<ElementType>(prevCellState, device);

        Func<Variable> projectInput = () => createBiasParam(cellDim) + (createProjectionParam(cellDim) * input);

        // Input gate
        Function it = CNTKLib.Sigmoid((Variable)(projectInput() + (createProjectionParam(cellDim) * stabilizedPrevOutput)) + CNTKLib.ElementTimes(createDiagWeightParam(cellDim), stabilizedPrevCellState));
        Function bit = CNTKLib.ElementTimes(it, CNTKLib.Tanh(projectInput() + (createProjectionParam(cellDim) * stabilizedPrevOutput)));

        // Forget-me-not gate
        Function ft = CNTKLib.Sigmoid((Variable)(projectInput() + (createProjectionParam(cellDim) * stabilizedPrevOutput)) + CNTKLib.ElementTimes(createDiagWeightParam(cellDim), stabilizedPrevCellState));
        Function bft = CNTKLib.ElementTimes(ft, prevCellState);

        Function ct = (Variable)bft + bit;

        // Output gate
        Function ot = CNTKLib.Sigmoid((Variable)(projectInput() + (createProjectionParam(cellDim) * stabilizedPrevOutput)) + CNTKLib.ElementTimes(createDiagWeightParam(cellDim), Stabilize<ElementType>(ct, device)));
        Function ht = CNTKLib.ElementTimes(ot, CNTKLib.Tanh(ct));

        Function c = ct;
        Function h = (outputDim != cellDim) ? (createProjectionParam(outputDim) * Stabilize<ElementType>(ht, device)) : ht;

        return new Tuple<Function, Function>(h, c);
    }

    static Tuple<Function, Function> LSTMPComponentWithSelfStabilization<ElementType>(Variable input, NDShape outputShape, NDShape cellShape, Func<Variable, Function> recurrenceHookH, Func<Variable, Function> recurrenceHookC, DeviceDescriptor device)
    {

        var dh = Variable.PlaceholderVariable(outputShape, input.DynamicAxes);
        var dc = Variable.PlaceholderVariable(cellShape, input.DynamicAxes);

        var LSTMCell = LSTMPCellWithSelfStabilization<ElementType>(input, dh, dc, device);
        var actualDh = recurrenceHookH(LSTMCell.Item1);
        var actualDc = recurrenceHookC(LSTMCell.Item2);

        // Form the recurrence loop by replacing the dh and dc placeholders with the actualDh and actualDc
        (LSTMCell.Item1).ReplacePlaceholders(new Dictionary<Variable, Variable> { { dh, actualDh }, { dc, actualDc } });

        return new Tuple<Function, Function>(LSTMCell.Item1, LSTMCell.Item2);
    }

    private static Function Embedding(Variable input, int embeddingDim, DeviceDescriptor device)
    {

        System.Diagnostics.Debug.Assert(input.Shape.Rank == 1);

        int inputDim = input.Shape[0];
        var embeddingParameters = new Parameter(new int[] { embeddingDim, inputDim }, DataType.Float, CNTKLib.GlorotUniformInitializer(), device);

        return CNTKLib.Times(embeddingParameters, input);
    }

    /// <summary>
    /// Build a one direction recurrent neural network (RNN) with long-short-term-memory (LSTM) cells.
    /// http://colah.github.io/posts/2015-08-Understanding-LSTMs/
    /// </summary>
    /// <param name="input">the input variable</param>
    /// <param name="numOutputClasses">number of output classes</param>
    /// <param name="embeddingDim">dimension of the embedding layer</param>
    /// <param name="LSTMDim">LSTM output dimension</param>
    /// <param name="cellDim">cell dimension</param>
    /// <param name="device">CPU or GPU device to run the model</param>
    /// <param name="outputName">name of the model output</param>
    /// <returns>the RNN model</returns>
    internal static Function LSTMSequenceClassifierNet(Variable input, int numOutputClasses, int embeddingDim, int LSTMDim, int cellDim, DeviceDescriptor device, string outputName)
    {

        Function embeddingFunction = Embedding(input, embeddingDim, device);
        Func<Variable, Function> pastValueRecurrenceHook = (x) => CNTKLib.PastValue(x);
        Function LSTMFunction = LSTMPComponentWithSelfStabilization<float>(embeddingFunction, new int[] { LSTMDim }, new int[] { cellDim }, pastValueRecurrenceHook, pastValueRecurrenceHook, device).Item1;
        // Function thoughtVectorFunction = CNTKLib.SequenceLast(LSTMFunction);

        return Layers.FullyConnectedLinearLayer(LSTMFunction, numOutputClasses, device, outputName);
    }
}
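Stabilize scales its input by a learned scalar beta = (1/f) * ln(1 + e^(f * p)) with f = 4. The parameter p starts at 0.99537863, which the inline comment gives as (1/f) * ln(e^f - 1). Substituting that value gives beta = (1/f) * ln(e^f) = 1, so self-stabilization starts out as the identity and only moves away from it as p is trained. The plain C# check below needs no CNTK and confirms the constant numerically.

```csharp
using System;

// beta(p) = (1/f) * ln(1 + e^(f * p)): the scalar that Stabilize multiplies its input by.
static double Beta(double p, double scale) => Math.Log(1.0 + Math.Exp(scale * p)) / scale;

const double F = 4.0;                                  // f in Stabilize
double p0 = Math.Log(Math.Exp(F) - 1.0) / F;           // (1/f) * ln(e^f - 1)

Console.WriteLine($"p0             = {p0:F8}");
Console.WriteLine($"beta(p0)       = {Beta(p0, F):F6}");
Console.WriteLine($"beta(constant) = {Beta(0.99537863, F):F6}"); // the literal used in Stabilize
```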

I can train fine, or at least it appears to be training fine. My problem is that the model has learnt nothing! My output at evaluation is completely wrong.

I suspect the problem is here:

Variable InputFeatures = Variable.InputVariable(new int[] { inputDim }, DataType.Float, InputFeaturesName, null, true/*isSparse*/);

When I set isSparse to true, as I believe I should, I get this error:

 System.ArgumentOutOfRangeException: 'Dense input data supplied for sparse input Variable 'Input('InputFeatures', [943], [*, #])'.

Please help...

P.S. Awesome product, thank you! BIG LEARNING CURVE THOUGH! It is difficult to use.

fwaris commented 5 years ago

I have created a sample of something similar using an F# wrapper over the .Net API. The F# wrapper mimics the Python API. Hopefully you can glean something from it.

https://github.com/fwaris/FsCNTK/blob/master/FsCNTK/Scripts/cntk_202_lang_understanding.fsx

GuntaButya commented 5 years ago

@fwaris - Very nice! Thank you for sharing!

@All Readers: After several days of testing, I am convinced there is a problem with the C# API implementation of CNTK, or with the C# examples.

I took the LSTMSequenceClassifier example exactly as provided and trained it on the training data that comes with it.

Note: that training data does not include the feature or label vocabularies, i.e. there is no mapping from the one-hot indices back to words or labels, so there is no way to know what the one-hot vectors represent.

I established a baseline by training and running the example as provided.

Then I switched the dataset to ATIS, changed the feature and label names and streams accordingly, and loaded the feature and label vocabularies.

I trained the model many times on this data and got basically the same result each time: the data was not learned properly.

The labels do not match the output; it's a mess. Nothing of any value is learned. To be clear, the results do not match the Python example in any way. The programming language should not make any difference, right?

[Image: Python example output, LU Epoch 1020]

I have even tried different Learners:

// Configure Learners:
IList<Learner> learners = new List<Learner>() 
{ 
    CNTKLib.FSAdaGradLearner(new ParameterVector(Model.Parameters().ToArray()), learningRate, momentum)
};

Something to note: the output collapses. After too many epochs every output has the same value, 128 = O. I can only assume the model is overfitted.

[Image: LU Epoch 4230]

Again, I believe the problem is in how the sparse tensors/vectors are passed in at evaluation, or in the model input, somewhere in the streams.

I am a newbie learning CNTK, and I may be completely wrong or doing something wrong, but I have followed the examples to the letter. I do have some experience with ML, so in that area I am not a newbie.

I can only guess that Microsoft is no longer supporting CNTK? The newer ML.NET platform seems to be the new baby? I really don't like it!

CNTK is WAY Better!

fwaris commented 5 years ago

@GuntaButya This is the output from the sample linked above (.Net API with sparse data):

[|("BOS", "O"); ("flights", "O"); ("from", "O"); ("new", "B-fromloc.city_name"); ("york", "I-fromloc.city_name"); ("to", "O"); ("seattle", "B-toloc.city_name"); ("EOS", "O")|]

So the API works but I agree that samples and documentation for .Net are lacking. To get around that I created a wrapper that mimics the Python API. Creating the wrapper was a painful process as I had to go through the Python source line-by-line in some cases (but it was a good learning experience).

Both the Python and .NET APIs eventually create similar models (computational graphs); however, the Python API is much better documented, so it's easier to follow.

F# is a 'functional' language first, and hence translating the Python API to F# was easier (than it would be for C#). For example, the model code for the above problem is simply the following:

let create_model() =
  let cell = L.LSTM(D hidden_dim,enable_self_stabilization=false)
  L.Embedding(D emb_dim, name="embed")
  >> L.Recurrence(cell, initial_states=[cVal;cVal], go_backwards=false) 
  >> O.getOutput 0
  >> L.Dense(D num_labels, name="classify")

Note that you can compose the layers with the built-in '>>' function composition operator to assemble a complete model.

F# learning resources are here: https://fsharp.org/

GuntaButya commented 5 years ago

With respect, learning F# is not going to resolve the C# Problems.

I think it is important to get to the bottom of this issue, so others can also benefit from it.

GuntaButya commented 5 years ago

@msftdata, @frankseide,

I really would like a little help on this one.

Please?

elevir commented 5 years ago

@GuntaButya hello! I think the problem is that Value.CreateBatch creates a dense NDArrayView; try explicitly creating the NDArrayView in sparse mode. Also, Value.CreateBatch is very, very slow, so I don't recommend using it. To create a sparse NDArrayView, use the following constructor:

public NDArrayView(NDShape viewShape, int[] colStarts, int[] rowIndices, double[] nonZeroValues, DeviceDescriptor device, bool readOnly = false)

You can see an example of passing data to a Value through an NDArrayView in #3384.

P.S. You can simplify your task by using CNTKReaders, and if you need it, you can look at CNTKBinaryWriter, which supports writing both dense and sparse data.
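The constructor quoted above takes the sparse data in compressed sparse column (CSC) form. When a sentence is encoded as one-hot words, each column is one word: colStarts is simply 0, 1, ..., n, rowIndices holds the word indices, and every non-zero value is 1. Here is a minimal C# sketch of building those arrays. OneHotSequenceToCsc is a hypothetical helper, and the sketch assumes the standard CSC convention that colStarts has one entry per column plus a final end marker.

```csharp
using System;
using System.Collections.Generic;

// Build CSC (compressed sparse column) arrays for a sequence of one-hot
// vectors: column t holds a single 1 at row wordIndices[t].
static (int[] ColStarts, int[] RowIndices, double[] NonZeroValues) OneHotSequenceToCsc(
    IReadOnlyList<int> wordIndices, int inputDim)
{
    int n = wordIndices.Count;
    var colStarts = new int[n + 1];   // colStarts[t] = offset of column t's first non-zero
    var rowIndices = new int[n];      // row (vocabulary index) of each non-zero
    var values = new double[n];       // every non-zero is 1.0
    for (int t = 0; t < n; t++)
    {
        int idx = wordIndices[t];
        if (idx < 0 || idx >= inputDim)
            throw new ArgumentOutOfRangeException(nameof(wordIndices), $"Index {idx} is outside [0, {inputDim}).");
        colStarts[t] = t;
        rowIndices[t] = idx;
        values[t] = 1.0;
    }
    colStarts[n] = n;                 // end marker: total number of non-zeros
    return (colStarts, rowIndices, values);
}

// Hypothetical word indices for a three-word sequence in a 943-word vocabulary.
var csc = OneHotSequenceToCsc(new[] { 178, 14, 482 }, 943);
Console.WriteLine(string.Join(",", csc.ColStarts));  // 0,1,2,3
Console.WriteLine(string.Join(",", csc.RowIndices)); // 178,14,482
```

The three arrays would then be passed to the NDArrayView constructor above, together with a shape whose first axis is the vocabulary size (943 here) and whose last axis is the sequence length. Check the exact shape your model expects.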

GuntaButya commented 5 years ago

@elevir - Thank You!

With respect, and please forgive my newness to CNTK, but I do not implement any Value.CreateBatch instances. Would you mind explaining a little more what you meant and where I should implement your suggestions?

Thank You!

GuntaButya commented 5 years ago

After reviewing the model, axes and NDShapes a little, I now get this error:

System.ApplicationException: 'TensorOp: Tensor operations are currently not supported for sparse matrices.

Starting to think CNTK ( 2.6 ) with C# is a bad choice...

@Microsoft - perhaps you could come to the rescue?

elevir commented 5 years ago

@GuntaButya

I do not implement any: Value.CreateBatch instances

You are using it in the following code snippet:

void EvaluateModel(Function model, string text) {

    // Create the Input Value:
    Value InputValue = Value.CreateBatch(model.Arguments.Single().Shape, GetSparse(text), Device);

I suspect that this exception:

System.ArgumentOutOfRangeException: 'Dense input data supplied for sparse input Variable 'Input('InputFeatures', [943], [*, #])'.

arose because of Value.CreateBatch.

About

System.ApplicationException: 'TensorOp: Tensor operations are currently not supported for sparse matrices.

Look at #3115. It seems the Embedding layer leaves the data in sparse format, which leads to this error.

fwaris commented 5 years ago

@GuntaButya

Starting to think CNTK ( 2.6 ) with C# is a bad choice...

Deep Learning fits naturally with functional languages. At the end of the day you are constructing a computation graph where the components of the graph are composable functions. Function composition is the bread-and-butter of functional languages.

Yes you can wire up the graph with an OO language but the task is harder as you lose the power of abstraction.

Python is the default language for DL, now and at least for the near future. However, while Python is flexible, it lacks type safety (I always feel uneasy using Python).

And so you are starting to see TensorFlow used with strongly-typed functional languages, e.g. Swift, Scala and F# (not an exhaustive list).

PyTorch is a very interesting framework. Alas, the PyTorch creators have deeply intertwined Python and C++, so it's really only usable from Python for now. Maybe some brave souls can create APIs in strongly-typed functional languages as well.

GuntaButya commented 5 years ago

@elevir - You're right, I'm sorry, I overlooked the obvious.

I always find struggling with problems like this is a great Learning Curve. Because of the nature of CNTK, there is a lot to learn and help like yours is greatly appreciated, so Thank You!

I will implement your suggestion today and let you know how it went.

GuntaButya commented 5 years ago

@elevir, following your suggestion to create the NDArrayView explicitly instead of using Value.CreateBatch, I implemented:

NDShape inputShape = model.Arguments.Single().Shape;
NDArrayView sequence = new NDArrayView(inputShape, GetSparse(word).ToArray(), DeviceDescriptor.CPUDevice);
Value InputValue = Value.Create(inputShape, new[] { sequence }, new bool[] { }, DeviceDescriptor.UseDefaultDevice(), false, false);

// Create Input Dictionary Pair:
Dictionary<Variable, Value> ModelInput = new Dictionary<Variable, Value>
{
    { model.Arguments.Single(), InputValue }
};

Early testing shows the same results:

[Image: Test 01 - with code implemented]

Without the code implemented:

[Image: Test 01 - with code not implemented]

Thank you, and please note: I could not implement the .AppendShape(new[] {<sequenceLengthInSamples>}) code due to a mismatch in dimensions.

GuntaButya commented 5 years ago

Since I started this, I have seen that there are a lot of problems with CNTK.

The .NET API may work for F#, but the C# developer API is almost unusable for sparse data, which is a real shame! I see C# as being just as simple as any other language (Python, F# or anything else), and I also see that support is zero!

Microsoft do not want C# users to have access to CNTK; that's as clear as I can see it! Why? Are they worried about super users taking over the world?

Very disappointed! I have done so much work only to get to a point where I see too many issues to go further. CNTK 2.7 also doesn't support older GPUs.

GuntaButya commented 5 years ago

Special note to C# devs:

NDArrayView xSequence = new NDArrayView(new NDShape(0, 1).AppendShape(new[] { inputDim }), set.Features.ToArray(), DeviceDescriptor.CPUDevice);
Value xValues = new Value(xSequence);

NDArrayView ySequence = new NDArrayView(new NDShape(0, 1).AppendShape(new[] { numOutputClasses }), set.Labels.ToArray(), DeviceDescriptor.CPUDevice);
Value yValues = new Value(ySequence);

This method is the result of @elevir's help, thanks!

Here set is just a class with float[] arrays for Features and Labels.

It is a much faster and much simpler way to get your data to look sparse, even though it is not marked as sparse.

IMPORTANTLY: the dimensions are correct!

Does the network train? It says it does, but the results are very poor!

elevir commented 5 years ago

@GuntaButya you are welcome! :) I think the shape passed into NDArrayView depends on the model; otherwise I don't understand why, in my case, I had to represent the shape as <shapeOfSample>.AppendShape(new[] {<sequenceLengthInSamples>}). But you do not load any row-major model, so there should not be any difference. Well, it is very strange :) Anyway, the order of the axes must be WHC in C++/C#, and dynamic axes must be placed after the static axes, unlike in Python CNTK.
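To my understanding, CNTK tensors in C#/C++ are stored column-major, meaning the first axis varies fastest. With the sample axis first and the sequence axis appended, a dense sequence of shape [inputDim, seqLen] keeps each word's vector contiguous, and word t starts at offset t * inputDim. A dense float[] handed to NDArrayView, like set.Features in the special note above, needs that layout. Here is a minimal sketch using a hypothetical helper and a toy 4-word vocabulary.

```csharp
using System;

// Dense one-hot sequence in column-major order for shape [inputDim, seqLen]:
// the first (static) axis varies fastest, so word t occupies
// buffer[t * inputDim] .. buffer[t * inputDim + inputDim - 1].
static float[] DenseOneHotSequence(int[] wordIndices, int inputDim)
{
    var buffer = new float[wordIndices.Length * inputDim];
    for (int t = 0; t < wordIndices.Length; t++)
    {
        if (wordIndices[t] < 0 || wordIndices[t] >= inputDim)
            throw new ArgumentOutOfRangeException(nameof(wordIndices));
        buffer[t * inputDim + wordIndices[t]] = 1f; // element (row = word index, column = step t)
    }
    return buffer;
}

// Toy example: vocabulary of 4 words, sequence of word 2 followed by word 0.
float[] dense = DenseOneHotSequence(new[] { 2, 0 }, 4);
Console.WriteLine(string.Join(" ", dense)); // 0 0 1 0 1 0 0 0
```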

GuntaButya commented 5 years ago

After a long struggle, much frustration and ZERO supporting documentation, I finally got this working:

// The Model's Input Dimensions:
int inputDim = 38912;
int[] Input = new int[] { 7 }; //** = 7:1 The Index of the Input Array Vector or Tensor or what ever you want to call it.

// The Public Evaluate Method:
Variable XVariable = model.Arguments.Single();

// The Sparse Array is parsed into the Value:
Value XValue = Value.CreateBatch<float>(inputDim, Input, DeviceDescriptor.CPUDevice, false);

// Create Input Dictionary Pair:
Dictionary<Variable, Value> ModelInput = new Dictionary<Variable, Value>
{
    { XVariable, XValue }
};

// Get the Model's Output Variable.
Variable OutputVariable = model.Output;

// Create Output Dictionary Pair:
Dictionary<Variable, Value> ModelOutput = new Dictionary<Variable, Value>
{
    { OutputVariable, null }
};

// Evaluate the Model on the same device the input Value was created on:
model.Evaluate(ModelInput, ModelOutput, DeviceDescriptor.CPUDevice);

And now, C# does work with CNTK and Sparse Input Vectors!
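To my understanding of the C# API, Value.CreateBatch<float>(dimension, indices, device) treats each index as a separate one-hot sample, which amounts to a batch of length-1 sequences. That matches how the original EvaluateModel was called once per word, but it means the LSTM never sees the words before the current one. For slot tagging, the whole sentence would normally be passed as one sequence, for example through Value.CreateSequence<float>(inputDim, indices, device). Please verify that overload against the CNTK version you use. The sentence-to-indices step is plain C#. SentenceToIndices and the toy vocabulary below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Map a space-separated sentence to vocabulary indices, one per word, so the
// whole sentence can be fed to the model as a single sequence.
static List<int> SentenceToIndices(string sentence, IReadOnlyDictionary<string, int> wordToIndex)
{
    var indices = new List<int>();
    foreach (string word in sentence.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (!wordToIndex.TryGetValue(word, out int index))
            throw new KeyNotFoundException($"'{word}' is not in the vocabulary.");
        indices.Add(index);
    }
    return indices;
}

// Hypothetical toy vocabulary; in the thread this would be WordToIndex built from query.wl.
var vocab = new Dictionary<string, int> { ["BOS"] = 0, ["flights"] = 1, ["to"] = 2, ["seattle"] = 3, ["EOS"] = 4 };
List<int> seq = SentenceToIndices("BOS flights to seattle EOS", vocab);
Console.WriteLine(string.Join(",", seq)); // 0,1,2,3,4
```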

CNTK is so COOL; it just needs a C# dev's touch in the next version. Streamline the basic things!

Thank You!