microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

cntk c# evaluating about fast-rcnn model #2269

Open wendy7707 opened 7 years ago

wendy7707 commented 7 years ago

I have already done the model-making part, which also generates the ROIs during A1. The only part I'm still struggling with is how to run the evaluation now that I have a trained model and the ROIs for an image.

Here's the C# code I'm using, based on the @pkranen branch, which doesn't actually give me any ROI labels besides 0 (background). So I'm wondering if there's a mistake in the code, or at least an alternative way to do the evaluation for ROI labeling.

    public static void EvaluateObjectDetectionModel()
    {
        try
        {
            // This example requires the Fast-RCNN_grocery100 model.
            // The model can be downloaded from https://www.cntk.ai/Models/FRCN_Grocery/Fast-RCNN_grocery100.model
            // The model is assumed to be located at: <cntkroot>\Examples\Image\PretrainedModels\
            // It further requires the grocery image data set.
            // Please run 'python install_fastrcnn.py' from <cntkroot>\Examples\Image\Detection\FastRCNN to get the data.
            string imageDirectory = Path.Combine(initialDirectory, @"..\..\Examples\Image\DataSets\grocery\testImages");
            string modelDirectory = Path.Combine(initialDirectory, @"..\..\Examples\Image\PretrainedModels");
            Environment.CurrentDirectory = initialDirectory;

            List<float> outputs;

            using (var model = new IEvaluateModelManagedF())
            {
                string modelFilePath = Path.Combine(modelDirectory, "Fast-RCNN_grocery100.model");
                ThrowIfFileNotExist(modelFilePath,
                    string.Format("Error: The model '{0}' does not exist. Please download the model from https://www.cntk.ai/Models/FRCN_Grocery/Fast-RCNN_grocery100.model " +
                                  "and save it under ..\\..\\Examples\\Image\\PretrainedModels.", modelFilePath));

                model.CreateNetwork(string.Format("modelPath=\"{0}\"", modelFilePath), deviceId: -1);

                // Prepare input value in the appropriate structure and size
                var inDims = model.GetNodeDimensions(NodeGroup.Input);
                if (inDims.First().Value != 1000 * 1000 * 3)
                {
                    throw new CNTKRuntimeException(string.Format("The input dimension for {0} is {1} which is not the expected size of {2}.", inDims.First(), inDims.First().Value, 1000 * 1000 * 3), string.Empty);
                }

                // Transform the image
                string imageFileName = Path.Combine(imageDirectory, "WIN_20160803_11_28_42_Pro.jpg");
                ThrowIfFileNotExist(imageFileName, string.Format("Error: The test image file '{0}' does not exist.", imageFileName));

                Bitmap bmp = new Bitmap(Bitmap.FromFile(imageFileName));
                // TODO: preserve aspect ratio while scaling and pad the remaining pixels with (114, 114, 114)
                var resized = bmp.Resize(1000, 1000, true);
                var resizedCHW = resized.ParallelExtractCHW();

                // TODO: generate ROI proposals using an external library, e.g. selective search, 
                // TODO: project them to the 1000 x 1000 image size and compute (x, y, w, h) relative to the image dimensions.
                // TODO: Alternative workaround: run script 'A1_GenerateInputROIs.py' from <cntkroot>\Examples\Image\Detection\FastRCNN and read rois from file.

                // parse rois: groups of 4 floats corresponding to (x, y, w, h) for an ROI
                string roiCoordinates = "0.219 0.0 0.165 0.29 0.329 0.025 0.07 0.115 0.364 0.0 0.21 0.13 0.484 0.0 0.075 0.06 0.354 0.045 0.055 0.09 0.359 0.075 0.095 0.07 0.434 0.155 0.04 0.085 0.459 0.165 0.145 0.08 0.404 0.12 0.055 0.06 0.714 0.235 0.06 0.12 0.659 0.31 0.065 0.075 0.299 0.16 0.1 0.07 0.449 0.18 0.19 0.15 0.284 0.21 0.135 0.115 0.254 0.205 0.07 0.055 0.234 0.225 0.075 0.095 0.239 0.23 0.07 0.085 0.529 0.235 0.075 0.13 0.229 0.24 0.09 0.085 0.604 0.285 0.12 0.105 0.514 0.335 0.1 0.045 0.519 0.335 0.08 0.045 0.654 0.205 0.08 0.055 0.614 0.215 0.115 0.065 0.609 0.205 0.115 0.075 0.604 0.225 0.115 0.055 0.524 0.23 0.06 0.095 0.219 0.315 0.065 0.075 0.629 0.31 0.095 0.08 0.639 0.325 0.085 0.06 0.219 0.41 0.25 0.11 0.354 0.46 0.185 0.11 0.439 0.515 0.09 0.075 0.359 0.455 0.175 0.125 0.449 0.525 0.08 0.07 0.574 0.46 0.06 0.105 0.579 0.46 0.105 0.1 0.529 0.47 0.15 0.145 0.584 0.475 0.085 0.09 0.354 0.52 0.08 0.06 0.219 0.52 0.115 0.1 0.229 0.53 0.1 0.08 0.229 0.575 0.105 0.045 0.339 0.56 0.085 0.045 0.354 0.535 0.075 0.06 0.299 0.59 0.145 0.05 0.304 0.58 0.12 0.045 0.594 0.555 0.075 0.05 0.534 0.58 0.14 0.06 0.504 0.66 0.07 0.06 0.494 0.73 0.075 0.09 0.504 0.695 0.07 0.095 0.219 0.665 0.075 0.145 0.494 0.755 0.085 0.075 0.704 0.665 0.07 0.21 0.434 0.72 0.055 0.1 0.569 0.695 0.205 0.185 0.219 0.73 0.29 0.13 0.574 0.665 0.08 0.055 0.634 0.665 0.095 0.045 0.499 0.725 0.08 0.135 0.314 0.71 0.155 0.065 0.264 0.72 0.19 0.105 0.264 0.725 0.185 0.095 0.249 0.725 0.12 0.11 0.379 0.77 0.08 0.055 0.509 0.785 0.055 0.06 0.644 0.875 0.13 0.085 0.664 0.875 0.11 0.075 0.329 0.025 0.08 0.115 0.639 0.235 0.135 0.15 0.354 0.46 0.185 0.12 0.354 0.46 0.185 0.135 0.229 0.225 0.08 0.095 0.219 0.72 0.29 0.14 0.569 0.67 0.205 0.21 0.219 0.315 0.1 0.075 0.219 0.23 0.09 0.085 0.219 0.41 0.295 0.11 0.219 0.665 0.27 0.145 0.219 0.225 0.09 0.14 0.294 0.665 0.2 0.05 0.579 0.46 0.105 0.145 0.549 0.46 0.14 0.145 0.219 0.41 0.295 0.125 0.219 0.59 0.11 0.05 0.639 0.235 0.135 0.155 0.629 0.235 0.145 0.155 0.314 0.71 0.155 0.115 0.334 0.56 0.09 0.045 0.264 0.72 0.225 0.1 0.264 0.72 0.225 0.105 0.219 0.71 0.29 0.15 0.249 0.725 0.125 0.11 0.219 0.665 0.27 0.17 0.494 0.73 0.075 0.115 0.494 0.73 0.085 0.115 0.219 0.0 0.14 0.14 0.219 0.07 0.14 0.14 0.219 0.14 0.14 0.14";
                var rois = roiCoordinates.Split(' ').Select(x => float.Parse(x)).ToList();

                // inputs are the image itself and the ROI coordinates
                var inputs = new Dictionary<string, List<float>>() { { inDims.First().Key, resizedCHW }, { inDims.Last().Key, rois } };

                // We can call the evaluate method and get back the results (predictions per ROI and per class (no softmax applied yet!)...
                var outDims = model.GetNodeDimensions(NodeGroup.Output);
                outputs = model.Evaluate(inputs, outDims.First().Key);
            }

            // the object classes used in the grocery example
            var labels = new[] {"__background__",  
               "avocado", "orange", "butter", "champagne", "eggBox", "gerkin", "joghurt", "ketchup",
               "orangeJuice", "onion", "pepper", "tomato", "water", "milk", "tabasco", "mustard"};
            int numLabels = labels.Length;
            int numRois = outputs.Count / numLabels;

            Console.WriteLine("Only showing predictions for non-background ROIs...");
            int numBackgroundRois = 0;
            for (int i = 0; i < numRois; i++)
            {
                var outputForRoi = outputs.Skip(i * numLabels).Take(numLabels).ToList();

                // Retrieve the predicted label as the argmax over all predictions for the current ROI
                var max = outputForRoi.Select((value, index) => new { Value = value, Index = index })
                    .Aggregate((a, b) => (a.Value > b.Value) ? a : b)
                    .Index;

                if (max > 0)
                {
                    Console.WriteLine("Outcome for ROI {0}: {1} \t({2})", i, max, labels[max]);
                }
                else
                {
                    numBackgroundRois++;
                }
            }

            Console.WriteLine("Number of background ROIs: {0}", numBackgroundRois);
        }
        catch (CNTKException ex)
        {
            OnCNTKException(ex);
        }
        catch (Exception ex)
        {
            OnGeneralException(ex);
        }
    }
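
As a side note on the "no softmax applied yet" comment in the code above: if per-class probabilities are wanted rather than just the argmax, a minimal softmax over each ROI's scores could look like the sketch below (the Softmax helper is illustrative only, not part of the CNTK EvalDll API, and needs using System; and using System.Linq;).

    // Illustrative helper (not a CNTK API): softmax over the raw scores of one ROI.
    private static List<float> Softmax(IList<float> scores)
    {
        float max = scores.Max();   // subtract the max for numerical stability
        var exps = scores.Select(s => (float)Math.Exp(s - max)).ToList();
        float sum = exps.Sum();
        return exps.Select(e => e / sum).ToList();
    }

Applied per ROI (e.g. var probs = Softmax(outputForRoi);) the argmax, and therefore the predicted label, is unchanged, since softmax is monotonic; it only matters for reading off confidence values or thresholding low-scoring ROIs.
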
mortengryning commented 7 years ago

I think you are using the old ROI coordinate system (prior to CNTK 2.1). Try to specify the ROI like 219 0 165 29 etc. - that is, not like 0.XX but just XX (X1 Y1 X2 Y2). I had the same problem as you and solved it that way :-)

pkranen commented 7 years ago

Correct, ROI pooling now takes absolute pixel coordinates (w.r.t. the input image that is given to the network, i.e. if you apply scaling or padding you need to apply the same to the ROI candidates as well). The expected coordinates are (x_min, y_min, x_max, y_max).

wendy7707 commented 7 years ago

@mortengryning sorry, one more question: does 0.07 convert to 7?

pkranen commented 7 years ago

Hi Wendy. Previously the coordinates for CNTK Fast R-CNN were (x, y, w, h), all relative w.r.t. the input image dimensions (see the tutorial). Now we use the same type of coordinates as other toolkits, e.g. Caffe. They are in absolute pixel coordinates w.r.t. the input image. Example: if your input image is scaled to 800x800 and your ROI coordinates were (0.1, 0.3, 0.2, 0.5), then the new coordinates are (0.1 * 800, 0.3 * 800, (0.1 + 0.2) * 800, (0.3 + 0.5) * 800) = (80, 240, 240, 640).
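
As a rough sketch of that conversion in C#, assuming the image is scaled to 1000 x 1000 as in the code above (the helper name ToAbsoluteCorners is made up for illustration, not a CNTK API):

    // Illustrative helper: converts a relative (x, y, w, h) ROI into absolute
    // (x_min, y_min, x_max, y_max) pixel coordinates on the scaled input image.
    private static float[] ToAbsoluteCorners(float x, float y, float w, float h,
                                             int scaledWidth, int scaledHeight)
    {
        return new[]
        {
            x * scaledWidth,          // x_min
            y * scaledHeight,         // y_min
            (x + w) * scaledWidth,    // x_max
            (y + h) * scaledHeight    // y_max
        };
    }

For example, (0.219, 0.0, 0.165, 0.29) on a 1000 x 1000 input becomes (219, 0, 384, 290).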

wendy7707 commented 7 years ago

@pkranen thank you so much! I just followed your suggestion and calculated like the following: 0.219 0.0 0.165 0.29 = (219, 0, 384, 290), but the results are not very good: only two ROIs output a target, which is not as good as the test.z result from Python. Would you please tell me why? Thanks!

wendy7707 commented 7 years ago

@mortengryning @pkranen thank you so much! I just followed your suggestion and calculated like the following: 0.219 0.0 0.165 0.29 = (219, 0, 384, 290), but the results are not good. I think the output after "outputs = model.Evaluate(inputs, outDims.First().Key);" should be the same as "test.z" in Python, but it is different, and the final results are not the same either. Would you please tell me how to solve this problem? Did you get the same result? Thanks!