microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Evaluation of LSTMSequenceClassifier #3794

Closed MihaiHoriaPopescu closed 4 years ago

MihaiHoriaPopescu commented 4 years ago

Hello, I am having trouble evaluating my model. The "LSTM Sequence Classifier" example does not show how to evaluate the model, or how to feed model.Evaluate a full sequence for classification. I tried to implement an evaluation function similar to the one in TestHelper, but the results are not what I expected: I get an evaluation for each line of the sequence rather than for the full sequence. Can someone show me how to use model.Evaluate with a sequence, or how to evaluate a file containing multiple sequences?

`public static float ValidateModelWithMinibatchSource(string modelFile, MinibatchSource testMinibatchSource, string featureInputName, string labelInputName, string outputName, DeviceDescriptor device, int maxCount = 1000)
    {
        Function model = Function.Load(modelFile, device);
        var featureInput = model.Arguments[0];
        Console.WriteLine(featureInput);

        var labelOutput = model.Outputs.Single(o => o.Name == outputName);

        var featureStreamInfo = testMinibatchSource.StreamInfo(featureInputName);
        var labelStreamInfo = testMinibatchSource.StreamInfo(labelInputName);

        uint batchSize = LSTMSequenceClassifier.minibatchSize;
        int miscountTotal = 0, totalCount = 0;
        while (true)
        {
            var minibatchData = testMinibatchSource.GetNextMinibatch(batchSize, device);
            if (minibatchData == null || minibatchData.Count == 0)
                break;
            totalCount += (int)minibatchData[featureStreamInfo].numberOfSamples;

            // expected labels are in the minibatch data.
            var labelData = minibatchData[labelStreamInfo].data.GetDenseData<float>(labelOutput);
            var expectedLabels = labelData.Select(l => l.IndexOf(l.Max())).ToList();

            var inputDataMap = new Dictionary<Variable, Value>() {
                { featureInput, minibatchData[featureStreamInfo].data }
            };

            var outputDataMap = new Dictionary<Variable, Value>() {
                { labelOutput, null }
            };

            model.Evaluate(inputDataMap, outputDataMap, device);
            var outputData = outputDataMap[labelOutput].GetDenseData<float>(labelOutput);
            var actualLabels = outputData.Select(l => l.IndexOf(l.Max())).ToList();

            int misMatches = actualLabels.Zip(expectedLabels, (a, b) => a.Equals(b) ? 0 : 1).Sum();

            miscountTotal += misMatches;
            Console.WriteLine($"Validating Model: Total Samples = {totalCount}, Misclassify Count = {miscountTotal}");

            if (totalCount > maxCount)
                break;
        }

        float errorRate = 1.0F * miscountTotal / totalCount;
        Console.WriteLine($"Model Validation Error = {errorRate}");
        return errorRate;
    }`
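
For the other half of the question (feeding model.Evaluate one full sequence directly, without a MinibatchSource), a minimal sketch could look like the following. It assumes the model takes a single dense feature input and reduces the sequence to one output vector, as the LSTM Sequence Classifier example does. `SequenceEvaluationSketch` and `ClassifySequence` are hypothetical names, and the block needs the CNTK package and a trained model to run.

```csharp
using System.Collections.Generic;
using System.Linq;
using CNTK;

public static class SequenceEvaluationSketch
{
    // Classifies ONE complete sequence and returns the index of the highest-scoring class.
    // `sequence` holds all samples of the sequence as dense floats, laid out back to back.
    // Assumption: the model has one input and one output, and the output is produced
    // once per sequence (not once per sample).
    public static int ClassifySequence(Function model, IEnumerable<float> sequence, DeviceDescriptor device)
    {
        Variable featureInput = model.Arguments[0];
        Variable output = model.Output;

        // All samples go into a single Value, so they are treated as one sequence.
        Value inputValue = Value.CreateSequence<float>(featureInput.Shape, sequence, device);

        var inputs = new Dictionary<Variable, Value> { { featureInput, inputValue } };
        var outputs = new Dictionary<Variable, Value> { { output, null } };
        model.Evaluate(inputs, outputs, device);

        // GetDenseData returns one inner list per sequence; only one sequence was fed.
        IList<float> scores = outputs[output].GetDenseData<float>(output)[0];
        return scores.IndexOf(scores.Max());
    }
}
```

For several sequences at once, Value.CreateBatchOfSequences takes a collection of such sequences, and the result then holds one inner list per sequence.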
MihaiHoriaPopescu commented 4 years ago

The function actually works. The problem was that I was using a dense label holding a continuous value, while I was actually trying to classify nominal values. After changing the format of the label, it works properly.
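
To illustrate why the label format matters for the validation function above, here is a small sketch. It assumes, since the thread does not show the data file, that the original label stored the class as a single number per sequence and that the fix was a one-hot vector. `LabelFormatSketch` and the example values are hypothetical. The arg-max expression is the same one the function applies to `labelData`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelFormatSketch
{
    // Same arg-max expression as in ValidateModelWithMinibatchSource.
    public static int ArgMax(IList<float> values) => values.IndexOf(values.Max());

    public static void Demo()
    {
        // Hypothetical label for class 3 stored as one continuous value.
        IList<float> scalarLabel = new List<float> { 3f };
        // The same class as a one-hot vector over 5 hypothetical classes.
        IList<float> oneHotLabel = new List<float> { 0f, 0f, 0f, 1f, 0f };

        // A one-element vector always yields index 0, whatever the value is.
        Console.WriteLine(ArgMax(scalarLabel)); // prints 0
        // The one-hot vector yields the class index.
        Console.WriteLine(ArgMax(oneHotLabel)); // prints 3
    }
}
```

Under this assumption every expected label would come out as class 0, so the mismatch count would not be meaningful until the label is a vector with one entry per class.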