tzutalin / dlib-android

🐉 Port dlib to Android
MIT License

Flickery landmarks when converting raw bytes to grayscale #56

Isadorable opened this issue 7 years ago (status: open)

Isadorable commented 7 years ago

Hi, I've created a new function in jni_face_detect that takes the camera preview as raw bytes plus the face box coordinates computed with the Android API, because I only need dlib to detect the landmarks of a single face (I want to skip dlib's face detection phase). The raw bytes are converted from YUV to grayscale and then rotated, scaled and flipped to match the size of my UI's TextureView. The landmarks are found very quickly, but once displayed on the device screen they flicker a lot: even when the face is still and the face box coordinates stay the same, there is always a "micro" variation in the landmark coordinates. This doesn't happen with the function jniBitmapDetect provided in the original .cpp file, which uses bitmaps instead; there the landmarks are always very stable. I guess I'm messing something up during the conversions, or maybe I'm not passing the right pointers...

So, taking inspiration from https://github.com/tzutalin/dlib-android/issues/39, my code looks like this:

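    JNIEXPORT jobject JNICALL DLIB_FACE_JNI_METHOD(jniNewLandmarksDetection)(
            JNIEnv* env, jobject thiz, jbyteArray rawBytes, jfloatArray rect,
            jint width, jint height, jint widthB, jint heightB, jint rotation, jint scale) {
        jobject jDetRet = JNI_VisionDetRet::createJObject(env);
        g_pJNI_VisionDetRet->setLabel(env, jDetRet, "face");

        // NV21 preview frame: width*height Y bytes followed by width*height/2 interleaved V/U bytes
        jbyte* b_data = env->GetByteArrayElements(rawBytes, nullptr);
        cv::Mat yuvMat(height + height / 2, width, CV_8UC1, reinterpret_cast<unsigned char*>(b_data));
        cv::Mat grayMat;
        cv::cvtColor(yuvMat, grayMat, cv::COLOR_YUV2GRAY_NV21);  // grayMat gets its own copy of the Y plane
        // Release the Java array (JNI_ABORT: nothing to copy back); without a matching
        // release the copied or pinned buffer is never freed, once per frame
        env->ReleaseByteArrayElements(rawBytes, b_data, JNI_ABORT);

        // cv::resize only rescales: passing (heightB, widthB) sets the output size but
        // does not rotate the image, and `rotation` is not used anywhere in this function
        cv::Mat scaledGrayscale;
        cv::resize(grayMat, scaledGrayscale, cv::Size(heightB / scale, widthB / scale), 0, 0, cv::INTER_LINEAR);
        cv::flip(scaledGrayscale, scaledGrayscale, 0);  // flipCode 0: flip around the x-axis

        // Face box from the Android API, element order as sent from the Java side
        jfloat* r = env->GetFloatArrayElements(rect, nullptr);
        dlib::rectangle rec(r[3], r[0], r[1], r[2]);  // (left, top, right, bottom), converted to long
        g_pJNI_VisionDetRet->setRect(env, jDetRet, r[3], r[0], r[1], r[2]);
        env->ReleaseFloatArrayElements(rect, r, JNI_ABORT);

        // Run only the shape predictor on the supplied box (dlib face detection is skipped)
        DetectorPtr detPtr = getDetectorPtr(env, thiz);
        dlib::cv_image<unsigned char> img(scaledGrayscale);
        dlib::full_object_detection shape = detPtr->msp(img, rec);

        for (unsigned long k = 0; k < shape.num_parts(); k++) {
            int x = shape.part(k).x();
            int y = shape.part(k).y();
            g_pJNI_VisionDetRet->addLandmark(env, jDetRet, x, y);
        }

        // Test: these two pictures look good and they're identical.
        // Debug only: writing two files on every preview frame is slow.
        dlib::save_bmp(img, "/mnt/sdcard/DCIM/input.bmp");
        cv::imwrite("/mnt/sdcard/DCIM/det.jpg", scaledGrayscale);
        return jDetRet;
    }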

Can anyone see the possible cause of the flickering?
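The description mentions rotating the frame, while the listing only resizes (with width and height swapped) and flips. If the frame really does need a 90° turn, for example a landscape sensor frame shown in a portrait TextureView, that is a pixel remap rather than a resize. In OpenCV 3.2+ `cv::rotate(src, dst, cv::ROTATE_90_CLOCKWISE)` does this; the sketch below shows the same remap without OpenCV, assuming a tightly packed row-major 8-bit buffer. `rotate90Clockwise` is a hypothetical helper, and whether any rotation is needed at all depends on the device's sensor orientation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rotate a row-major 8-bit grayscale image 90 degrees clockwise.
// Input is srcW x srcH (width x height); output is srcH x srcW.
// Source pixel (x, y) lands at row x, column (srcH - 1 - y) of the output.
std::vector<std::uint8_t> rotate90Clockwise(const std::vector<std::uint8_t>& src,
                                            int srcW, int srcH) {
    assert(srcW > 0 && srcH > 0);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);
    std::vector<std::uint8_t> dst(src.size());
    const int dstW = srcH;  // the rotated image is srcH pixels wide
    for (int y = 0; y < srcH; ++y) {
        for (int x = 0; x < srcW; ++x) {
            dst[static_cast<std::size_t>(x) * dstW + (srcH - 1 - y)] =
                src[static_cast<std::size_t>(y) * srcW + x];
        }
    }
    return dst;
}
```

For a 3x2 image with rows `1 2 3` and `4 5 6`, the result has rows `4 1`, `5 2` and `6 3`.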

EzequielAdrianM commented 7 years ago

You say you "just need to detect the landmarks of one single face", but then you say "I want to skip the face detection phase". You have to understand that it is not possible to extract landmarks without performing face detection first. What I did in my implementation was to skip some preview frames so that the detector runs faster on some devices, because landmark detection is fast and face detection is the bottleneck. But skipping frames can lead to the face moving out of the lazily updated face box, and then the landmarks will flicker or disappear. So if your preview runs at 12 fps or less, don't skip frames.
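The frame-skipping idea above can be sketched as a small scheduler that decides on which frames to re-run the slow face detector, while the fast landmark step runs on every frame with the most recent box. `DetectionScheduler` and the skip interval are hypothetical, illustrative choices, not values from this project; only the 12 fps cutoff comes from the comment.

```cpp
#include <cassert>

// Hypothetical helper: face detection runs on every `interval()`-th frame,
// landmarks run on every frame using the last detected box.
class DetectionScheduler {
public:
    // At 12 fps or less, detect on every frame (per the advice above).
    DetectionScheduler(double previewFps, int skipInterval)
        : interval_(previewFps <= 12.0 ? 1 : (skipInterval < 1 ? 1 : skipInterval)) {
        assert(skipInterval >= 1);  // precondition; the initializer also clamps it
    }

    // Returns true if face detection should run on the current frame.
    bool shouldDetect() {
        const bool run = (frameIndex_ % interval_) == 0;
        ++frameIndex_;
        return run;
    }

    int interval() const { return interval_; }

private:
    int interval_;
    long frameIndex_ = 0;
};
```

The larger the interval, the longer a stale box is reused, which is exactly when a moving face can drift out of it.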

EzequielAdrianM commented 7 years ago

Another thing to take into account is the image resolution. Keep in mind that the minimum face size that Dlib will detect is 90x90 pixels. So don't downscale your preview too much, or Dlib will start losing all faces, even the big ones, causing a lot of flickering!
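Applied to the `scale` parameter in the listing above, this advice bounds the downscale divisor by the face size. A minimal sketch, assuming the face box side is known in full-resolution pixels and using the 90 px threshold quoted in this comment; `maxDownscaleDivisor` is a hypothetical helper.

```cpp
#include <cassert>

// Largest integer divisor d such that a face of faceSidePx pixels is still
// at least minFacePx pixels after the frame is resized by 1/d.
int maxDownscaleDivisor(int faceSidePx, int minFacePx = 90) {
    assert(minFacePx > 0);
    if (faceSidePx < minFacePx) {
        return 1;  // already below the threshold: do not downscale at all
    }
    // Integer division rounds down, so faceSidePx / d >= minFacePx holds.
    return faceSidePx / minFacePx;
}
```

For example, a 400 px face allows a divisor of at most 4 (400 / 4 = 100 px), while a 180 px face allows only 2.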

Isadorable commented 7 years ago

I do perform face detection; I just don't do it with dlib, and I'm not interested in doing it with dlib because I can do it much faster with other solutions. I just pass the face box coordinates, as you can see in my code, and believe me, it works perfectly: the landmarks are detected very quickly and precisely without skipping any frame. They're just slightly flickery, that's all. I have the same problem even with 1920x1080 frames with my face right in front of the camera, so I'm afraid resolution is not the cause. My main concern is that this flickering might be related to the series of transformations, which perhaps create some kind of noise that isn't visible to the naked eye, and I was wondering if anyone has ever experienced this problem.

EzequielAdrianM commented 7 years ago

Okay, detecting all the faces in every frame you give to the detector is difficult, and some flickering is expected. But if you really noticed no flickering at all in the original implementation (with bitmaps), maybe something is adding noise, just as you suspect. If there is noise in the image, you should be able to see it when you save the Mat frames to internal storage. There's no invisible noise.

Isadorable commented 7 years ago

I would argue against "there's no invisible noise": noise invisible to the naked eye is a common way to fool machine learning algorithms. Anyway, thank you for your reply; I'll try to focus more on that aspect. I was just hoping for a more straightforward solution, like an error in my part of the code :D
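One way to settle whether there is noise the eye can't see is to compare saved frames numerically instead of visually, for example the grayscale input of the bitmap path against the NV21 path for the same still scene, or two consecutive frames of a still face. A minimal sketch, assuming both images are already loaded as equally sized 8-bit grayscale buffers; `frameDiff` and `DiffStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct DiffStats {
    double meanAbsDiff;  // average |a - b| over all pixels
    int maxAbsDiff;      // largest single-pixel |a - b|
};

// Compare two 8-bit grayscale images of identical size pixel by pixel.
DiffStats frameDiff(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size() && !a.empty());
    long long sum = 0;
    int maxDiff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int d = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        sum += d;
        if (d > maxDiff) maxDiff = d;
    }
    return {static_cast<double>(sum) / static_cast<double>(a.size()), maxDiff};
}
```

Comparing lossless dumps (the BMP, or PNG) rather than the `det.jpg` written by the code above avoids mistaking JPEG compression artifacts for pipeline noise.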

EzequielAdrianM commented 7 years ago

Your code is fine. It receives YUV420SP bytes, converts them to a Mat, then to grayscale, and then you perform rotate/scale/flip. But you are not using the original Dlib code to detect the faces, so I can't help, sorry.
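For reference, NV21 (the YUV420SP variant used by default for Android camera previews) stores a full-resolution Y (luma) plane of width*height bytes followed by an interleaved V/U plane at quarter resolution, which is why the Mat in the listing has height + height/2 rows. Converting NV21 to grayscale therefore amounts to copying the Y plane, with no arithmetic on the pixel values. A minimal sketch without OpenCV, assuming a tightly packed buffer with no row padding; `nv21ToGray` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Extract the luma (Y) plane from an NV21 frame.
// Assumes width*height Y bytes followed by width*height/2 interleaved V/U bytes.
std::vector<std::uint8_t> nv21ToGray(const std::uint8_t* nv21, int width, int height) {
    assert(nv21 != nullptr && width > 0 && height > 0);
    const std::size_t ySize = static_cast<std::size_t>(width) * height;
    // The first width*height bytes already are the grayscale image.
    return std::vector<std::uint8_t>(nv21, nv21 + ySize);
}
```

Because this step only copies bytes, it is unlikely to be where frame-to-frame differences are introduced; the later resize and flip, or the box passed to the shape predictor, are the more plausible places to look.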