microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

development of a neural network for object search #3879

Open Worldmasters opened 1 year ago

Worldmasters commented 1 year ago

Hello. I have a task: build a neural network that finds circles and ellipses in an image. The target object will be black. Clearly there are very few features to work with, but still.

I am doing this in C# because the network then has to be integrated into an existing project. I tried both the TensorFlow and CNTK libraries, but since TensorFlow is terribly slow both when building networks and when evaluating them, I chose CNTK. There is probably not much difference between them otherwise.

So, the most important question: how do I configure the network so that it works correctly? I generated a set of images with various ellipses and circles. I also generated the answers as a roundedRect structure that contains the coordinates of the object's center (X, Y), the width and height of the object's rectangle (W, H), and the rotation angle (Angle). That is 5 values in total.

Before I feed the 5 values to the network, I scale them to the range 0 to 1: the coordinates and sizes are divided by the frame width and height, respectively, and the angle is divided by 360.
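A minimal sketch of this normalization, assuming X and W are divided by the frame width, Y and H by the frame height, and the angle is given in degrees. The `LabelNormalizer` class and its parameter names are hypothetical and not part of CNTK:

```csharp
using System;

// Hypothetical helper, not part of CNTK: packs one rotated-rectangle label
// into the 5-element vector of values in [0, 1] described above.
public static class LabelNormalizer
{
    // cx, cy: object center in pixels; w, h: rectangle size in pixels;
    // angleDeg: rotation in degrees, assumed to lie in [0, 360].
    public static float[] Normalize(
        float cx, float cy, float w, float h, float angleDeg,
        int frameWidth, int frameHeight)
    {
        if (frameWidth <= 0 || frameHeight <= 0)
            throw new ArgumentException("Frame width and height must be positive.");

        return new[]
        {
            cx / frameWidth,   // X relative to the frame width
            cy / frameHeight,  // Y relative to the frame height
            w / frameWidth,    // W relative to the frame width
            h / frameHeight,   // H relative to the frame height
            angleDeg / 360f    // angle as a fraction of a full turn
        };
    }
}
```

For a 320 x 240 frame, `LabelNormalizer.Normalize(160f, 120f, 80f, 60f, 90f, 320, 240)` returns `{ 0.5, 0.5, 0.25, 0.25, 0.25 }`.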

`int labels_count = 5;
NDShape inputDim = NDShape.CreateNDShape(new int[] { 320, 240, 1 }); // input array for the grayscale image
NDShape outputDim = NDShape.CreateNDShape(new int[] { labels_count }); // output array of the object's rectangle parameters X Y W H

// input data layer
Variable input_shape = CNTKLib.InputVariable(inputDim, DataType.Float, "features");
Variable output_shape = CNTKLib.InputVariable(outputDim, DataType.Float, "labels");

// create the layers
double convWScale = 0.26;

var view = new NDArrayView(NDShape.CreateNDShape(new int[] { 3, 3, 1 }), new double[] { -1, 0, 1, -2, 0, 2, -1, 0, 1 }, DeviceDescriptor.CPUDevice, false);

var scaledInput = CNTKLib.ElementTimes(Constant.Scalar<float>(1.0f / 255.0f, DeviceDescriptor.CPUDevice), input_shape); // scaling layer

int kernelWidth1 = 5, kernelHeight1 = 5, numInputChannels1 = 1, outFeatureMapCount1 = 4;
var conv1 = CNTKHelper.ConvolutionWithMaxPooling(scaledInput, DeviceDescriptor.CPUDevice, kernelWidth1, kernelHeight1, numInputChannels1, outFeatureMapCount1, 2, 2, 3, 3);

var layer2 = CNTKHelper.Dense(conv1, 64, DeviceDescriptor.CPUDevice, CNTKHelper.Activation.ReLU, "layer2");
var classifierOutput = CNTKHelper.Dense(layer2, labels_count, DeviceDescriptor.CPUDevice, CNTKHelper.Activation.Sigmoid, "classifierOutput"); // final network`


I trained the network for a day, but the value of PreviousMinibatchLossAverage does not fall below 30. Does anyone know how to choose the layers correctly? Is it possible to solve this problem at all?