zweigraf / face-landmarking-ios

👦 Basic face landmarking on iPhone with Dlib via Swift & ObjC++
481 stars 126 forks source link

Rotate camera orientation #5

Open hoangdado opened 8 years ago

hoangdado commented 8 years ago

I tried setting the camera orientation to Landscape or Portrait, but the code below (in DlibWrapper.mm) still returns width = 640 and height = 480 (with the preset AVCaptureSessionPreset640x480):

    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);

Because of that I couldn't do landmark detection in portrait view. Could you fix it?

stanchiang commented 8 years ago

^bump

Same issue, more or less. I updated my plist to show portrait with the home button at the bottom (normal orientation) and attached the result. The "face landmark mask" is drawn upright, but the camera feed is rotated to landscape.

@zweigraf maybe you could find some time to document how the orientation gets determined/configured? I've rotated AVCaptureVideoPreviewLayer before, but rotating AVSampleBufferDisplayLayer doesn't seem possible, since I couldn't find any examples on Google.

img_3615

hoangdado commented 8 years ago

@stanchiang I found that if you change the camera orientation to portrait or landscape, the values are still width = 640 and height = 480. I think the camera hardware may always output the buffer at this size. What you can do is rotate the image while copying pixel values from the CVPixelBuffer into the dlib::array2d<dlib::bgr_pixel>. In addition, you need to rotate the input face rect. I fixed the issue by doing that, but I ran into a performance problem: the doWorkOnSampleBuffer method consumes too much CPU.

stanchiang commented 8 years ago

Can you add some sample code for your implementation? I was trying to do something like that, but it wasn't working right.

hoangdado commented 8 years ago

For copying pixel values:

    img.set_size(width, height);  // rows = width, cols = height: the transposed (portrait) frame
    img.reset();
    long position = 0;

    while (img.move_next()) {
        dlib::bgr_pixel& pixel = img.element();

        // position walks the rotated image in row-major order;
        // map its (row, col) back into the original landscape buffer
        size_t row = position / height;
        size_t col = position % height;

        long bufferLocation = (col * width + row) * 4;

        char b = baseBuffer[bufferLocation];
        char g = baseBuffer[bufferLocation + 1];
        char r = baseBuffer[bufferLocation + 2];

        dlib::bgr_pixel newpixel(b, g, r);
        pixel = newpixel;

        position++;
    }

For rotating the face rect, I think it is easiest to code it yourself. You only need to change oneFaceRect.

realmosn commented 8 years ago

@hoangdado this code doesn't work for me. Have you tested it?

teresakozera commented 8 years ago

Well, I also faced this problem. I didn't solve it completely, but I moved forward a little bit. Firstly, in Target -> General: screen shot 2016-08-08 at 16 27 43

Next thing, in SessionHandler.swift:

    func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
        connection.videoOrientation = AVCaptureVideoOrientation.Portrait
        // ...
    }

And the outcome is: img_0130

As you can see the landmarks are very distorted.

I hope someone will find it helpful and share the solution. :)

hoangdado commented 8 years ago

@teresakozera I solved your problem. You only need to update the convertScaleCGRect method as below:

    long right = (1.0 - rect.origin.y ) * size.width;
    long left = right - rect.size.height * size.width;
    long top = rect.origin.x * size.height;
    long bottom = top + rect.size.width * size.height;

@stanchiang You can follow this solution. It is much easier than the one I recommended before. See my fork for the source code: https://github.com/hoangdado/face-landmarking-ios

Note: with my fix, I don't know why the mouth landmarks are not exactly correct while the others are perfect!

realmosn commented 8 years ago

@hoangdado That fixed the issue, thank you!
My issue now is that the landmarks are not so accurate for the mouth and around the face. Were you able to fix that?

stanchiang commented 8 years ago

@hoangdado thanks that helped a lot!

I also had to apply an affine transformation on the layer so that the output isn't mirrored the opposite way:

    layer.setAffineTransform(CGAffineTransformMakeRotation(CGFloat(M_PI)))
    layer.setAffineTransform(CGAffineTransformScale(layer.affineTransform(), 1, -1))

teresakozera commented 8 years ago

@hoangdado, thank you! It works perfectly, no distortion, even in the mouth region. :)

Previously I tried to manipulate the convertScaleCGRect method, but I was scaling and changing the parameters instead of thinking of any kind of subtraction...

stanchiang commented 8 years ago

@teresakozera for me the distortion is more of a stability issue when trying to maintain tracking: the tolerance for faces at different angles seems to have gone down a bit, and when I move my face the mask gets jittery.

Am I facing a different issue than you guys?

teresakozera commented 8 years ago

@stanchiang I also observed this problem, but it also existed previously, with the landscape orientation. In my case it's not that big of an issue, as I mostly need the head facing the camera directly. But I will also try to fix it; if I succeed I will certainly let you know. :)

realmosn commented 8 years ago

@teresakozera something off topic: could you please tell me how you got the landmarking lines working? All I see in the app is the dots.

thanks

stanchiang commented 8 years ago

@ArtSebus probably just used the function dlib::draw_line(img, <#const point &p1#>, <#const point &p2#>, <#const pixel_type &val#>);

realmosn commented 8 years ago

@stanchiang could you please suggest what I should pass in for the parameters <#const point &p1#>, <#const point &p2#>, and <#const pixel_type &val#>? Sorry, I am not that good with C programming.

stanchiang commented 8 years ago

@ArtSebus I haven't touched C in a few years myself, haha. But it looks like you'd need to pass in a couple of dlib::point values that you want to connect, and then specify the pixel value to draw the line with for the last one. I'd try inputting 3 for the value. No reason for that number; it's just the same value that was used when drawing the dots in the existing code.

stanchiang commented 8 years ago

@teresakozera trying something a little different right now: I'm storing shape.parts[60-67], which make up the mouth, in a separate array and trying to pass it into UIKit/SceneKit to draw it separately.

    [m addObject:[NSValue valueWithCGPoint:CGPointMake([DlibWrapper pixelToPoints:p.x()], [DlibWrapper pixelToPoints:p.y()])]];

I'm converting from pixels to points using this function: https://gist.github.com/jordiboehmelopez/3168819

The problem is that it still seems stuck in the old bounds, sort of like the old screenshot I posted. I wasn't expecting this problem because we call convertCGRectValueArray before we loop through the shape.part array.

teresakozera commented 8 years ago

@ArtSebus for the line method, have a look at this: https://github.com/chili-epfl/attention-tracker/blob/master/README.md :)

@stanchiang - hmmm... a little bit odd. So with the code from here and all the changes it works, but when you try the above (a conversion from pixels to points) it displays the landmarks in the other orientation? Does it happen after it is passed to UIKit, or do you check it before?

stanchiang commented 8 years ago

@teresakozera - solved the transformation issue; it was my own fault. But now I noticed that my CGPoint coordinates have a weird offset for some reason.

For example, in my GameScene.swift file I had to add center = CGPointMake(center.x+50, center.y-100)

Here's my code to show you what I mean: https://github.com/stanchiang/face-landmarking-ios

teresakozera commented 8 years ago

@stanchiang - I will have a look at it on Monday, as today I'm heading off for a slightly longer weekend. Anyway, I hope you manage to solve this problem before then. :) Have a nice weekend!

morizotter commented 8 years ago

@hoangdado @stanchiang Thanks! I used your solution and almost all problems were solved. Later, I found a better (I think) way.

I made a pull request: https://github.com/zweigraf/face-landmarking-ios/pull/9 . In this PR, I convert the faceObject in SessionHandler to fit the given orientation.

Even if the connection's orientation is portrait, it works well.

What do you think?

realmosn commented 8 years ago

@teresakozera Could you suggest how to integrate attention-tracker into the project? I've tried for some time but am still stuck and haven't gotten anywhere. I'm almost at the point of pulling my hair out.

Miths19 commented 7 years ago

I want to crop the landmarked portion of the face... I want only the face. Can anyone help me with this?

trungnguyen1791 commented 7 years ago

@stanchiang You could easily enable "VideoMirrored" mode with this, instead of doing manual transforms:

    if (connection.isVideoMirroringSupported) {
        connection.isVideoMirrored = true;
    }

wangwenzhen commented 6 years ago

@stanchiang I want to support detection in both horizontal and vertical screen orientations. Can you provide a sample demo?

liamwalsh commented 6 years ago

Bump. I'm still having this issue on the latest master: setting my AVCaptureConnection videoOrientation to portrait causes all of the feature points to be wrong.

Hardy143 commented 6 years ago

@liamwalsh were you able to find a solution? I'm having the same problem as you.

Hardy143 commented 6 years ago

@liamwalsh I found I was putting connection.videoOrientation = AVCaptureVideoOrientation.portrait in the wrong captureOutput function. It now works for me:

screen shot 2018-07-05 at 17 45 26

jpatel956 commented 5 years ago

@stanchiang

First of all, thanks for your link: https://github.com/stanchiang/face-landmarking-ios

I have successfully run your code, but I now have the issue you faced earlier, where my CGPoint coordinates have a weird offset for some reason.

Here I am getting this error:

    validateTextureDimensions, line 759: error 'MTLTextureDescriptor has width (114046) greater than the maximum allowed size of 8192.'
    validateTextureDimensions:759: failed assertion `MTLTextureDescriptor has width (114046) greater than the maximum allowed size of 8192.'

Can you please help me out?

Thanks

ScientistMe commented 4 years ago

Hi, I'm dealing with this issue and I'm not able to get it working in portrait mode. I've read the whole thread here. My guess is that in portrait mode the layer the wrapper draws the points on has the wrong proportions, because the points look distorted. IMG-4956 IMG-4957

Can you please help me?

wonmor commented 1 month ago

I managed to fix the issue of this solution not working on the latest version of the code base, just 4 years later. What a long time it took me to solve this. Jokes aside, it actually took a decent amount of time to figure this out:

If you go through the instructions provided in https://github.com/zweigraf/face-landmarking-ios/issues/5, you'll quickly figure out that there's no convertScaleCGRect function in the recent version of this code base.

That's because the author pushed a "simplified version" of the DlibWrapper class, so I had to go through the previous commit history, and I found it (the one from May 2016).

First and foremost, replace the entirety of your DlibWrapper.mm file with the following:

//
//  DlibWrapper.m
//  DisplayLiveSamples
//
//  Created by Luis Reisewitz on 16.05.16.
//  Copyright © 2016 ZweiGraf. All rights reserved.
//

#import "DlibWrapper.h"
#import <UIKit/UIKit.h>

#include <dlib/image_processing.h>
#include <dlib/image_io.h>

@interface DlibWrapper ()

@property (assign) BOOL prepared;

+ (dlib::rectangle)convertScaleCGRect:(CGRect)rect toDlibRectacleWithImageSize:(CGSize)size;
+ (std::vector<dlib::rectangle>)convertCGRectValueArray:(NSArray<NSValue *> *)rects toVectorWithImageSize:(CGSize)size;

@end
@implementation DlibWrapper {
    dlib::shape_predictor sp;
}

-(instancetype)init {
    self = [super init];
    if (self) {
        _prepared = NO;
    }
    return self;
}

- (void)prepare {
    NSString *modelFileName = [[NSBundle mainBundle] pathForResource:@"shape_predictor_68_face_landmarks" ofType:@"dat"];
    std::string modelFileNameCString = [modelFileName UTF8String];

    dlib::deserialize(modelFileNameCString) >> sp;

    // FIXME: test this stuff for memory leaks (cpp object destruction)
    self.prepared = YES;
}

-(void)doWorkOnSampleBuffer:(CMSampleBufferRef)sampleBuffer inRects:(NSArray<NSValue *> *)rects {

    if (!self.prepared) {
        [self prepare];
    }

    dlib::array2d<dlib::bgr_pixel> img;

    // MARK: magic
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);

    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);
    char *baseBuffer = (char *)CVPixelBufferGetBaseAddress(imageBuffer);

    // set_size expects rows, cols format
    img.set_size(height, width);

    // copy samplebuffer image data into dlib image format
    img.reset();
    long position = 0;
    while (img.move_next()) {
        dlib::bgr_pixel& pixel = img.element();

        // assuming bgra format here
        long bufferLocation = position * 4; //(row * width + column) * 4;
        char b = baseBuffer[bufferLocation];
        char g = baseBuffer[bufferLocation + 1];
        char r = baseBuffer[bufferLocation + 2];
        //        we do not need this
        //        char a = baseBuffer[bufferLocation + 3];

        dlib::bgr_pixel newpixel(b, g, r);
        pixel = newpixel;

        position++;
    }

    // unlock buffer again until we need it again
    CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);

    CGSize imageSize = CGSizeMake(width, height);

    // convert the face bounds list to dlib format
    std::vector<dlib::rectangle> convertedRectangles = [DlibWrapper convertCGRectValueArray:rects toVectorWithImageSize:imageSize];

    // for every detected face
    for (unsigned long j = 0; j < convertedRectangles.size(); ++j)
    {
        dlib::rectangle oneFaceRect = convertedRectangles[j];

        // detect all landmarks
        dlib::full_object_detection shape = sp(img, oneFaceRect);

        // and draw them into the image (samplebuffer)
        for (unsigned long k = 0; k < shape.num_parts(); k++) {
            dlib::point p = shape.part(k);
            draw_solid_circle(img, p, 3, dlib::rgb_pixel(0, 255, 255));
        }
    }

    // lets put everything back where it belongs
    CVPixelBufferLockBaseAddress(imageBuffer, 0);

    // copy dlib image data back into samplebuffer
    img.reset();
    position = 0;
    while (img.move_next()) {
        dlib::bgr_pixel& pixel = img.element();

        // assuming bgra format here
        long bufferLocation = position * 4; //(row * width + column) * 4;
        baseBuffer[bufferLocation] = pixel.blue;
        baseBuffer[bufferLocation + 1] = pixel.green;
        baseBuffer[bufferLocation + 2] = pixel.red;
        //        we do not need this
        //        char a = baseBuffer[bufferLocation + 3];

        position++;
    }
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
}

+ (dlib::rectangle)convertScaleCGRect:(CGRect)rect toDlibRectacleWithImageSize:(CGSize)size {
    long right = (1.0 - rect.origin.y ) * size.width;
    long left = right - rect.size.height * size.width;
    long top = rect.origin.x * size.height;
    long bottom = top + rect.size.width * size.height;

    dlib::rectangle dlibRect(left, top, right, bottom);
    return dlibRect;
}

+ (std::vector<dlib::rectangle>)convertCGRectValueArray:(NSArray<NSValue *> *)rects toVectorWithImageSize:(CGSize)size {
    std::vector<dlib::rectangle> myConvertedRects;
    for (NSValue *rectValue in rects) {
        CGRect singleRect = [rectValue CGRectValue];
        dlib::rectangle dlibRect = [DlibWrapper convertScaleCGRect:singleRect toDlibRectacleWithImageSize:size];
        myConvertedRects.push_back(dlibRect);
    }
    return myConvertedRects;
}

@end

I applied the changes made in the comments in issue #5 so that now it supports portrait mode.

The next change you need to make: go to SessionHandler and locate the following:

    func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        print("DidDropSampleBuffer")
    }

BE AWARE! There are TWO captureOutput functions; you must choose the one that has NO code inside the function block except for a simple print line. Then you want to add the following inside the function: connection.videoOrientation = AVCaptureVideoOrientation.portrait

So the final version of SessionHandler's captureOutput function will look like the following:

    func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        print("DidDropSampleBuffer")
        connection.videoOrientation = AVCaptureVideoOrientation.portrait
    }

BOOM. All issues have been resolved. OH BY THE WAY, add session.sessionPreset = AVCaptureSession.Preset.vga640x480 RIGHT BEFORE the session.startRunning() line in ViewController.swift to enable legacy-style video streaming (640x480 instead of 1024-something dimensions) so that there's LESS noise and instability in the landmark data. The lower resolution helps because it lowers the load on the machine.

Hope that helps. I know I may be a little too late now that the Vision framework and ARKit are out, but in case you're writing C++ code in tandem with Swift and want to import this stuff from the C++ side using Objective-C++, this is a tutorial for you!

John Seong

wonmor commented 1 month ago

Never mind: you're supposed to add connection.videoOrientation = AVCaptureVideoOrientation.portrait to the OTHER captureOutput, NOT the one I indicated above. Sorry.

wonmor commented 1 month ago

ANOTHER UPDATE: just replace the whole captureOutput with the following:

    // MARK: AVCaptureVideoDataOutputSampleBufferDelegate
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        connection.videoOrientation = AVCaptureVideoOrientation.portrait

        if !currentMetadata.isEmpty {
            let boundsArray = currentMetadata
                .compactMap { $0 as? AVMetadataFaceObject }
                .map { NSValue(cgRect: $0.bounds) }

            wrapper?.doWork(on: sampleBuffer, inRects: boundsArray)
        }

        layer.enqueue(sampleBuffer)
    }

If your .map closure looks different, you also have to change it to .map { NSValue(cgRect: $0.bounds) }, as shown above.