zhuhao-nju / facescape

FaceScape (PAMI2023 & CVPR2020)
833 stars · 94 forks

Inference time on CPU #75

Closed birdflies closed 2 years ago

birdflies commented 2 years ago

Hi, how is FaceScape's runtime performance? When I test with a single input image, CPU inference takes about 20 s. Is there no way to make it as fast as 3DDFA_V2?

zhuhao-nju commented 2 years ago

Hi,

Yes, demo_bilinear_fit is based on an optimization method, which is generally slower than a neural-network-based regressor. Using a more efficient optimizer would certainly speed demo_bilinear_fit up a lot, but I haven't implemented it yet.
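To illustrate why optimization-based fitting is slower than a regressor: it solves an iterative least-squares problem for every input image, rather than running a single forward pass. A minimal sketch (all dimensions and the linear landmark basis are made up for illustration, not taken from the repo's code):

```python
# Illustrative sketch: optimization-based landmark fitting is an iterative
# least-squares solve per image, which dominates the runtime.
import time
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n_lm, n_id = 68, 50                                # hypothetical landmark / identity dims
basis = rng.standard_normal((n_lm * 2, n_id))      # stand-in linear landmark basis
mean = rng.standard_normal(n_lm * 2)
target = mean + basis @ rng.standard_normal(n_id)  # synthetic 2D landmark observation

def residual(params):
    # landmark reprojection residual for the current identity coefficients
    return (mean + basis @ params) - target

t0 = time.time()
fit = least_squares(residual, x0=np.zeros(n_id))   # iterative solver
t_opt = time.time() - t0

print(f"optimization took {t_opt * 1000:.1f} ms, final cost {fit.cost:.2e}")
```

A real 3DMM fit adds camera pose and expression parameters and a nonlinear projection, so each image costs many such iterations, while a trained regressor amortizes that work into one network evaluation.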

Bornblack commented 2 years ago

@zhuhao-nju Hi, can you share the implementation of the pixel-level consistency part of the base model fitting? Also, I am confused about the get_texture() function in facescape_fitter.py; is there any reference for it?

zhuhao-nju commented 2 years ago

> @zhuhao-nju Hi, can you share the implementation of the pixel-level consistency part of the base model fitting? Also, I am confused about the get_texture() function in facescape_fitter.py; is there any reference for it?

Hi @Bornblack ,

We have removed the pixel-level consistency and albedo part from the base model fitting, as we found that it does not have much positive impact and even reduces accuracy in some cases. We think the main reason is that the texture of our bilinear model only covers the skin color of Asian people and is at a relatively low resolution.

The function get_texture() transfers the pixels of the input image to UV space, generating a UV texture map for the fitted mesh. This function was added later and does not contribute to shape fitting.
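The idea behind such a texture transfer can be sketched as follows. This is a hedged illustration, not the actual facescape_fitter.py implementation: the function name, per-vertex nearest sampling, and the argument layout are all assumptions. Each mesh vertex has a UV coordinate and a 2D projection into the input image; copying the image pixel at the projected position into the UV position builds a texture map.

```python
# Illustrative sketch (NOT the repo's get_texture): scatter image pixels
# into UV space using per-vertex nearest sampling.
import numpy as np

def make_uv_texture(image, uv_coords, img_coords, tex_size=256):
    """Copy, for each vertex, the image pixel at its 2D projection
    into the texel at its UV coordinate."""
    tex = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    h, w = image.shape[:2]
    for (u, v), (x, y) in zip(uv_coords, img_coords):
        tu = min(int(u * (tex_size - 1)), tex_size - 1)   # texel column
        tv = min(int(v * (tex_size - 1)), tex_size - 1)   # texel row
        xi = min(max(int(x), 0), w - 1)                   # clamp to image
        yi = min(max(int(y), 0), h - 1)
        tex[tv, tu] = image[yi, xi]
    return tex

# toy usage: one red pixel at image (x=1, y=2) lands at UV (0.5, 0.5)
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[2, 1] = (255, 0, 0)
tex = make_uv_texture(img, uv_coords=[(0.5, 0.5)], img_coords=[(1, 2)])
print(tex[127, 127])  # -> [255   0   0]
```

A production version would rasterize whole triangles in UV space and interpolate, rather than sampling one pixel per vertex, but the mapping direction is the same.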

Bornblack commented 2 years ago

@zhuhao-nju Thanks for your reply!

Without the pixel-level consistency and albedo part, the model fitting relies exclusively on the result of landmark detection and the expressive capability of the bilinear model. I think that is not enough to reconstruct the whole face.


Will the lack of diversity in the texture model be addressed in the future? Maybe the texture model of the Basel Face Model could be used as a replacement. Reference: https://github.com/TimoBolkart/BFM_to_FLAME

zhuhao-nju commented 2 years ago

@Bornblack I agree~ Indeed, fitting by landmarks only leads to inaccurate middle-scale geometry, and adding texture constraints would improve it to some extent. I'll try your suggestions, but I'm afraid I won't have time for it in the coming weeks.

Recently I've been focusing on another direction: using neural networks for non-parametric face reconstruction. I think it should be a better solution, as it can fully leverage photometric features and also break the limitations of 3DMMs to recover accurate middle-scale geometry.

Glad to share and discuss different ideas!

Neo1024 commented 2 years ago

Hi,

Thanks a lot for this wonderful work and the handy tools you shared!

I am new to the area of 3D faces and ran into some points of confusion during my exploration. It would be great if you could offer some suggestions:

How did you create the non-PCA expression blendshapes and combine them into the model? Usually, expression coefficients are PCA-based, which means they have no semantic meaning. Also, as I understand it, your collected data contains only 20 expressions, including the neutral one.

Looking forward to your suggestions : )

zhuhao-nju commented 2 years ago

Hi @Neo1024 ,

We collected the 3D faces of 847 identities, each with 20 expressions. In our bilinear model, PCA is applied only to the identity dimension, not to expressions, so the model combines non-PCA expressions with a PCA identity space.

The reason is that the identities are numerous and carry no semantic meaning, while the 20 expressions are manually defined and can be divided into 51 blendshapes.
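A minimal sketch of how such a bilinear model is evaluated (all dimensions here are toy values, not FaceScape's actual ones, and the core tensor is random): a core tensor is contracted with an identity coefficient vector in PCA space and an expression weight vector over the semantic blendshapes, yielding mesh vertices.

```python
# Illustrative bilinear face model: vertices = core ×_id w_id ×_exp w_exp.
import numpy as np

n_verts, n_id, n_exp = 100, 50, 52   # toy sizes (illustrative only)
core = np.random.default_rng(0).standard_normal((n_verts * 3, n_id, n_exp))

id_coeff = np.zeros(n_id)
id_coeff[0] = 1.0                    # identity weights in PCA space
exp_coeff = np.zeros(n_exp)
exp_coeff[0] = 1.0                   # one-hot semantic expression weight

# contract both modes of the tensor, then unflatten to (x, y, z) per vertex
verts = np.einsum('vie,i,e->v', core, id_coeff, exp_coeff).reshape(-1, 3)
print(verts.shape)  # -> (100, 3)
```

Because the expression mode is left un-compressed, each expression coefficient keeps its semantic meaning (e.g. "jaw open"), which is what makes the model riggable.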

Neo1024 commented 2 years ago

@zhuhao-nju Thanks for the reply : )

Based on some of the 20 expressions shown in your paper and the visualization of all 51 blendshapes, I guess the 51 blendshapes originate from the 20 expressions (averaged across all subjects). Did you generate the 50 blendshapes (excluding the neutral one) by first selecting a particular expression from the 19 non-neutral ones and masking part of the face? For example, the first two blendshapes correspond to closed eyes. Suppose you collected an eyes-closed expression as one of the 19 expressions performed by subjects; you could then generate the first two blendshapes by masking the eye area and combining the masked region with the neutral blendshape to obtain the final closed-eye blendshapes.

I'm not sure whether I understand you correctly. I originally thought you had somehow generated the 51 blendshapes according to the 52 Apple ARKit blendshapes.

zhuhao-nju commented 2 years ago

@Neo1024 Yes, what we did is basically the same as what you said in the first paragraph. I am going to add the rigging code to this repo next week, which rigs the 20 expressions to 50 expressions. I will let you know when it is done.
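The masking idea discussed above can be sketched in a few lines. This is purely illustrative (the real rigging code lives in toolkit/demo_rig.ipynb; the function name and the hard mask are assumptions): a localized blendshape is built by keeping a captured expression's per-vertex displacement only inside a region mask, e.g. the eye area for an eyes-closed blendshape.

```python
# Illustrative sketch of deriving a localized blendshape from a full
# captured expression by masking its displacement field.
import numpy as np

def masked_blendshape(neutral, expression, vertex_mask):
    """Blend the expression's displacement into the neutral face
    only where vertex_mask is nonzero."""
    delta = expression - neutral                   # per-vertex displacement
    return neutral + vertex_mask[:, None] * delta  # apply mask per vertex

# toy mesh: 5 vertices, the expression moves all of them,
# but the mask keeps only vertex 0 (the "eye region")
neutral = np.zeros((5, 3))
expression = np.ones((5, 3))
mask = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
bs = masked_blendshape(neutral, expression, mask)
print(bs[0], bs[1])  # -> [1. 1. 1.] [0. 0. 0.]
```

In practice the mask would be soft (feathered at the boundary) so adjacent blendshapes combine without seams.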

Neo1024 commented 2 years ago

@zhuhao-nju Thank you so much for the reply and for continuously maintaining and updating this great repo : )

zhuhao-nju commented 2 years ago

Hi @Neo1024 , the code and descriptions about rigging have been added here: https://github.com/zhuhao-nju/facescape/blob/master/toolkit/demo_rig.ipynb

birdflies commented 2 years ago

> Hi @Neo1024 , the code and descriptions about rigging have been added here: https://github.com/zhuhao-nju/facescape/blob/master/toolkit/demo_rig.ipynb

Good job~

Neo1024 commented 2 years ago

> Hi @Neo1024 , the code and descriptions about rigging have been added here: https://github.com/zhuhao-nju/facescape/blob/master/toolkit/demo_rig.ipynb

Great! Thanks a lot : )