onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format
http://onnx.ai/models/
Apache License 2.0

Help tiny yolo v2 #234

Open SiR0N opened 4 years ago

SiR0N commented 4 years ago

Hello

I want to use the ONNX Tiny YOLO v2 model on Android: https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/tiny_yolov2

My implementation is based on this one: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/TensorFlowYoloDetector.java

I got it running, but I always get (almost) the same output for different inputs (I use the VOC dataset):

[20975] tvmonitor (24.3%) RectF(0.0, 0.0, 415.0, 415.0)
[6600] horse (23.7%) RectF(0.0, 0.0, 401.00543, 399.01813)
[21075] bicycle (22.9%) RectF(0.0, 0.0, 415.0, 415.0)
[14550] aeroplane (21.4%) RectF(0.0, 0.0, 415.0, 415.0)

That's the output of this picture: 009667

I am not sure where the problem is: either my pre/post-processing is wrong or the model is not correct. I suspect the model, because the one from the original website works well with the same images.

This is my preprocessing:

public float[] getFloatArrayFromResizedImage(Bitmap bm,int reqCH, int reqW, int reqH ) {
        int w = bm.getWidth();
        int h = bm.getHeight();
        if (reqW != w || reqH != h  ){

            bm = resizeImage(bm, reqW, reqH);

            w = bm.getWidth();
            h = bm.getHeight();
        }

        float[] data =  new float[reqCH * w * h];
        int[] intValues =  new int[w * h];
        bm.getPixels(intValues, 0, bm.getWidth(), 0, 0, bm.getWidth(), bm.getHeight());

        for (int i = 0; i < intValues.length; ++i) {
            float r = ((intValues[i] >> 16) & 0xFF) / 255.0f;
            float g = ((intValues[i] >> 8) & 0xFF) / 255.0f;
            float b =  (intValues[i] & 0xFF) / 255.0f;

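            if (reqCH == 1) {
                // r, g, b are already in [0, 1]; keep the weighted sum as a float
                // (casting it to int would truncate nearly every pixel to 0).
                data[i] = r * 0.3f + g * 0.59f + b * 0.11f;
            } else {
                // Interleaved layout: R, G, B stored together for each pixel.
                data[i * 3 + 0] = r;
                data[i * 3 + 1] = g;
                data[i * 3 + 2] = b;
            }
        }
        return data;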
    }

I know that the input format is NCHW, in this case 1x3x416x416, but I am not sure what that means in practice. Should I feed the model a 1D array of size 3x416x416 in interleaved order [R,G,B,R,G,B,R,G,B,...] (which I use right now) or in planar order [R,R,R,...,G,G,G,...,B,B,B,...]?
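For reference, NCHW means the flat buffer holds the whole R plane, then the whole G plane, then the whole B plane, so the second layout in the question is the channel-first one. Below is a minimal sketch of that repacking, written in Python for brevity. `pixels_to_chw` and its `scale` parameter are hypothetical names, and whether this model wants values in 0..1 or 0..255 is an assumption to check against the model's documentation.

```python
def pixels_to_chw(pixels, width, height, scale=1.0):
    """Repack packed 0xAARRGGBB pixels (row-major) into a flat CHW float list.

    `scale` is a hypothetical knob: 1.0 keeps 0..255, 1/255 maps to 0..1.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    plane = width * height
    data = [0.0] * (3 * plane)
    for i, argb in enumerate(pixels):
        data[0 * plane + i] = ((argb >> 16) & 0xFF) * scale  # R plane
        data[1 * plane + i] = ((argb >> 8) & 0xFF) * scale   # G plane
        data[2 * plane + i] = (argb & 0xFF) * scale          # B plane
    return data


# Two pixels: pure red, then pure blue.
chw = pixels_to_chw([0xFFFF0000, 0xFF0000FF], width=2, height=1)
print(chw)
```

In the Java loop this would correspond to writing `data[i]`, `data[w * h + i]` and `data[2 * w * h + i]` instead of `data[i * 3 + c]`.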

Post-processing:

public List<Classifier.Recognition> postProcess(float[] output1) {

        int MAX_RESULTS = 5;

         int NUM_CLASSES = 20;

         int NUM_BOXES_PER_BLOCK = 5;
        final float[] output = output1;

           double[] ANCHORS = {
                1.08, 1.19,
                3.42, 4.41,
                6.63, 11.38,
                9.42, 5.11,
                16.62, 10.52
        };

           String[] LABELS = {
                "aeroplane",
                "bicycle",
                "bird",
                "boat",
                "bottle",
                "bus",
                "car",
                "cat",
                "chair",
                "cow",
                "diningtable",
                "dog",
                "horse",
                "motorbike",
                "person",
                "pottedplant",
                "sheep",
                "sofa",
                "train",
                "tvmonitor"
        };

             int blockSize = 32;

        final int gridWidth = 416 / blockSize;
        final int gridHeight = 416 / blockSize;

       //https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/Classifier.java
        final PriorityQueue<Classifier.Recognition> pq =
                new PriorityQueue<Classifier.Recognition>(
                        1,
                        new Comparator<Classifier.Recognition>() {
                            @Override
                            public int compare(final Classifier.Recognition lhs, final Classifier.Recognition rhs) {
                                // Intentionally reversed to put high confidence at the head of the queue.
                                return Float.compare(rhs.getConfidence(), lhs.getConfidence());
                            }
                        });

        for (int y = 0; y < gridHeight; ++y) {
            for (int x = 0; x < gridWidth; ++x) {
                for (int b = 0; b < NUM_BOXES_PER_BLOCK; ++b) {
                    final int offset =
                            (gridWidth * (NUM_BOXES_PER_BLOCK * (NUM_CLASSES + 5))) * y
                                    + (NUM_BOXES_PER_BLOCK * (NUM_CLASSES + 5)) * x
                                    + (NUM_CLASSES + 5) * b;

                    final float xPos = (x + expit(output[offset + 0])) * blockSize;
                    final float yPos = (y + expit(output[offset + 1])) * blockSize;

                    final float w = (float) (Math.exp(output[offset + 2]) * ANCHORS[2 * b + 0]) * blockSize;
                    final float h = (float) (Math.exp(output[offset + 3]) * ANCHORS[2 * b + 1]) * blockSize;

                    final float confidence = expit(output[offset + 4]);

                    int detectedClass = -1;
                    float maxClass = 0;

                    final float[] classes = new float[NUM_CLASSES];
                    for (int c = 0; c < NUM_CLASSES; ++c) {
                        classes[c] = output[offset + 5 + c];
                    }
                    softmax(classes);

                    for (int c = 0; c < NUM_CLASSES; ++c) {
                        if (classes[c] > maxClass) {
                            detectedClass = c;
                            maxClass = classes[c];
                        }
                    }

                    final float confidenceInClass = maxClass * confidence;
                    if (confidenceInClass > 0.2) {
                        final RectF rect =
                                new RectF(
                                        Math.max(0, xPos - w / 2),
                                        Math.max(0, yPos - h / 2),
                                        Math.min(416 - 1, xPos + w / 2),
                                        Math.min(416 - 1, yPos + h / 2));
                        System.out.println(LABELS[detectedClass] +", "+ detectedClass +", "+  confidenceInClass+", "+ rect);

                        pq.add(new Classifier.Recognition("" + offset, LABELS[detectedClass], confidenceInClass, rect));
                    }
                }
            }
        }

        final ArrayList<Classifier.Recognition> recognitions = new ArrayList<Classifier.Recognition>();
        // Fix the count up front: pq.size() shrinks on every poll().
        final int numResults = Math.min(pq.size(), MAX_RESULTS);
        for (int i = 0; i < numResults; ++i) {
            recognitions.add(pq.poll());
        }

        return recognitions;
    }

void softmax(final float[] vals) {
        float max = Float.NEGATIVE_INFINITY;
        for (final float val : vals) {
            max = Math.max(max, val);
        }
        float sum = 0.0f;
        for (int i = 0; i < vals.length; ++i) {
            vals[i] = (float) Math.exp(vals[i] - max);
            sum += vals[i];
        }
        for (int i = 0; i < vals.length; ++i) {
            vals[i] = vals[i] / sum;
        }
    }

    private float expit(final float x) {
        return (float) (1. / (1. + Math.exp(-x)));
    }

Am I doing anything wrong? It seems to me that something is wrong with the model weights, because no matter the input I get almost the same output.

prasanthpul commented 4 years ago

can you clarify what you are using to run the ONNX model?

SiR0N commented 4 years ago

Hi, sorry I forgot to mention it:

I use the C API of ONNX Runtime (through JNI). I have used the MNIST and FER models this way without problems.

The flow is:

1) (Java) getFloatArrayFromResizedImage -> array
2) (Java) "send" array to C
3) (C) run the ONNX model with array -> array2
4) (C) "send" array2 to Java
5) (Java) postProcess(array2)

This is the input and output info for the YOLO model that I got through the C API:

INPUTs INFO: 
    Number of Inputs 1 
    Input 0 Name: image
    Input 0 : type = 1
    Input 0 : num_dims = 4
    Input 0 : dim 0 = 1  // unless I set input_node_dims[0] = 1, the value is -1 and I get the wrong final size
    Input 0 : dim 1 = 3
    Input 0 : dim 2 = 416
    Input 0 : dim 3 = 416
    INPUT TENSOR (0) Size = 519168
    TENSOR image (0) is a TENSOR
    ALL TENSORS HAVE BEEN CREATED, Let's RUN!!!!
    OUTPUTs INFO: 
    Number of Outputs = 1
    Output 0 Name: grid
    Output 0 : type = 1
    Output 0 : num_dims = 4
    Output 0 : dim 0 = 1  // unless I set output_node_dims[0] = 1, the value is -1 and I get the wrong final size
    Output 0 : dim 1 = 125
    Output 0 : dim 2 = 13
    Output 0 : dim 3 = 13
 OUTPUT TENSOR (0) Size: 21125
    RUN!!!!
OUTPUT TENSOR 0 is a TENSOR
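
The reported output shape, 1x125x13x13, is channel-first: the 125 values per cell (5 boxes x 25) are spread across 125 planes of 13x13 rather than stored contiguously per cell. A post-processing offset written for a cell-major 13x13x125 buffer, as in the TensorFlow demo, would therefore read the wrong elements. Below is a minimal sketch of the index arithmetic, assuming the buffer is a flat row-major copy of the tensor and that channels are ordered box by box. `chw_index` and `box_values` are hypothetical helpers.

```python
GRID = 13            # grid width and height, from the output dims above
NUM_BOXES = 5
NUM_CLASSES = 20
VALUES_PER_BOX = NUM_CLASSES + 5   # x, y, w, h, confidence, then class scores


def chw_index(box, value, y, x):
    """Flat index of one value in a row-major 125x13x13 buffer.

    Assumes channel = box * 25 + value (box-major channel order).
    """
    channel = box * VALUES_PER_BOX + value
    return (channel * GRID + y) * GRID + x


def box_values(output, box, y, x):
    """Collect the 25 raw values of one box at grid cell (y, x)."""
    return [output[chw_index(box, v, y, x)] for v in range(VALUES_PER_BOX)]


# Sanity check on a dummy buffer whose value equals its own index.
dummy = list(range(NUM_BOXES * VALUES_PER_BOX * GRID * GRID))
print(len(dummy))                      # same element count as the output tensor
print(box_values(dummy, 0, 0, 1)[:3])  # consecutive values are one 13x13 plane apart
```

With this layout, consecutive values of the same box are 169 elements apart, not adjacent.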

How to get input/output sizes:

input_node_dims[0] = 1; // the model reports -1 for this dimension

size_t input_tensor_size = 1;
for (size_t j = 0; j < num_dims; j++) {
    input_tensor_size = input_tensor_size * input_node_dims[j];
    __android_log_print(android_LogPriority::ANDROID_LOG_ERROR,
                        logid,
                        "Input %zu : dim %zu = %jd\n", i, j, input_node_dims[j]);
}
// Wrong size when using: OrtGetTensorShapeElementCount(tensor_info, &input_tensor_size);
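
A dimension of -1 marks a size that the model leaves unspecified (here the batch), so multiplying the raw dims cannot give a usable element count. Below is a small sketch of the workaround described above, written in Python for brevity. `element_count` and its `dynamic_size` default are hypothetical.

```python
def element_count(dims, dynamic_size=1):
    """Multiply tensor dims, substituting `dynamic_size` for unspecified (-1) dims."""
    count = 1
    for d in dims:
        count *= dynamic_size if d < 0 else d
    return count


print(element_count([-1, 3, 416, 416]))    # input tensor
print(element_count([-1, 125, 13, 13]))    # output tensor
```

The two results can be compared against the tensor sizes printed in the log.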

How to create the tensor:

OrtValue* input_tensor = nullptr;
        ost = OrtCreateTensorWithDataAsOrtValue(
                ort_info,
                img,//output of: getFloatArrayFromResizedImage
                input_tensor_size * sizeof(type),
                reinterpret_cast<const int64_t *>(input_node_dims.data()),
                input_node_dims.size(), //num_dim
                type,
                &input_tensor);

How to run:

std::vector<OrtValue*> ortOutput(num_output_nodes);
ost = OrtRun(ort_ses, nullptr,
             input_node_names.data(),
             input_tensor_list.data(),
             num_input_nodes,
             output_node_names.data(),
             num_output_nodes,
             ortOutput.data());

// Get the output values.
float* data = nullptr; // postProcess runs on this array after I "send" it to Java
ost = OrtGetTensorMutableData(ortOutput[i], reinterpret_cast<void **>(&data));

SiR0N commented 4 years ago

Hi, I checked the model in Python, and it seems to me that the model is wrong and has no weights.

I tried Python because I was not sure whether my pre/post-processing was right. I got the same results as with the previous code, so I think that something is wrong in the model.

I did more or less the same as here

but I use onnx runtime:

import onnxruntime as rt

def inference(sess, preprocessed_image):
    input_name = sess.get_inputs()[0].name
    output_name = sess.get_outputs()[0].name
    predictions = sess.run([output_name], {input_name: preprocessed_image})
    return predictions

sess = rt.InferenceSession("yolo-Model.onnx")
predictions = inference(sess, preprocessed_image)

Can anyone check the model to see if it is right?
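
One way to test the "no weights" hypothesis without any pre/post-processing is to feed two clearly different inputs and compare the raw outputs: a model whose output ignores its input would give a difference of (almost) zero. Below is a minimal sketch. `run` is a hypothetical callable standing in for something like `lambda x: sess.run(None, {input_name: x})[0].ravel().tolist()`, and the stand-in models are made up purely to exercise the check.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max(abs(x - y) for x, y in zip(a, b))


def responds_to_input(run, input_a, input_b, tol=1e-6):
    """True if `run` produces noticeably different outputs for the two inputs."""
    return max_abs_diff(run(input_a), run(input_b)) > tol


# Made-up stand-ins for a real session, just to exercise the check.
constant_model = lambda x: [0.5, 0.5, 0.5]        # ignores its input
scaling_model = lambda x: [2.0 * v for v in x]    # depends on its input

print(responds_to_input(constant_model, [0.0] * 3, [255.0] * 3))  # False
print(responds_to_input(scaling_model, [0.0] * 3, [255.0] * 3))   # True
```

If the real model responds to its input here, the weights are probably fine and the input or output layout becomes the more likely suspect.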

EmmaNingMS commented 4 years ago

@jiafatom could you help take a look?

jiafatom commented 4 years ago

Some info here: This model was converted from a Core ML version of Tiny YOLO, around 1.5 years ago.

prasanthpul commented 4 years ago

@jiafatom should we update it to be directly converted instead?

EmmaNingMS commented 4 years ago

> @jiafatom should we update it to be directly converted instead?

Tiny YOLOv3 was recently added to the model zoo. @SiR0N, could you try that one?

Should we always update a model to the latest version? I am not sure the old version is still needed if the latest one is ready to use.

jiafatom commented 4 years ago

I just tested the Tiny YOLOv2 ONNX model (opset 8) with the test data, and it works well for me. The model has weights, so is it possible that something is wrong with your script? You reported a similar problem for Tiny YOLOv3 in another issue; can you check my example there?

SiR0N commented 4 years ago

Hi @EmmaNingMS, @jiafatom

I tried the code that @jiafatom shared with me and it worked (YOLOv3), but I would like to use Tiny YOLOv2 because I already have the data processing done in Java (as I described before).

@jiafatom Can you share your Tiny YOLOv2 implementation with me? There is a (high) chance that something is wrong in my script, but I cannot find it. I would guess that the problem is in the preprocessing.

I am no longer sure which version of Tiny YOLOv2 I use, so I will take the opset 8 one and try again.

SiR0N commented 4 years ago

I just checked Tiny YOLOv2 (opset 8) and it did not work :( (neither in the Java/C code nor in Python)

hoaquocphan commented 4 years ago

Hi @SiR0N

I completed Tiny YOLOv2 here; could you share the Tiny YOLOv3 one with me for reference?

Thanks