thepowerfuldeez / facemesh.pytorch

This is a PyTorch implementation of the paper "Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs" (https://arxiv.org/pdf/1907.06724.pdf).
Apache License 2.0

About .pth results differing from the original TF models #3

Open lexuszhi1990 opened 3 years ago

lexuszhi1990 commented 3 years ago

Firstly, thanks for your excellent work. However, I found that the PyTorch results are slightly different from those of the original TFLite model:

TensorFlow and PyTorch versions:

(Pdb) tf.__version__
'2.2.1'
(Pdb) torch.__version__
'1.5.1'

Test code:

import numpy as np

import tensorflow as tf
import torch

from facemesh import FaceMesh

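import cv2

# Load the test image, resize it to the 192x192 model input and scale pixels from [0, 255] to [-1, 1].
# cv2.imread returns BGR; both models receive this same array, so the comparison itself is unaffected.
sample_img = cv2.imread("test.jpg")
assert sample_img is not None, "could not read test.jpg"
sample_img_192 = cv2.resize(sample_img, (192, 192))
input_data = np.expand_dims(sample_img_192, axis=0).astype(np.float32) / 127.5 - 1.0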

interpreter = tf.lite.Interpreter(model_path="facemesh-lite.f16.tflite")
interpreter.allocate_tensors()

net = FaceMesh()
net.load_weights("facemesh.pth")
net.eval()  # make sure the model is in inference mode

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
# input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

# tf inference
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
tf_coord_res = interpreter.get_tensor(output_details[0]['index'])

# torch inference: NHWC -> NCHW, and no autograd graph is needed
with torch.no_grad():
    torch_output_data = net(torch.from_numpy(input_data.transpose(0, 3, 1, 2)))
torch_coord_res = torch_output_data[0].numpy()

print(["torch", torch_coord_res[0]])
print(["tflite", tf_coord_res[0, 0, 0]])
print("diff %f" % (np.abs(torch_coord_res[0] - tf_coord_res[0, 0, 0]).mean()))

Results:

#==>
['torch', array([ 94.1816  , 140.77983 , -14.322037, ..., 136.51678 ,  88.71278 ,
         6.525924], dtype=float32)]
['tflite', array([ 92.17496 , 139.39285 , -14.361812, ..., 134.26816 ,  87.34091 ,
         5.629876], dtype=float32)]
diff 1.167073
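A single mean can hide where the mismatch is concentrated. The helper below is a sketch that also reports the maximum error, the worst landmark and the mean error per axis. It assumes the 1404-element output is 468 (x, y, z) triples; the function name and the atol default are illustrative choices, not part of facemesh.pytorch.

```python
import numpy as np


def compare_outputs(a: np.ndarray, b: np.ndarray, atol: float = 1e-2) -> dict:
    """Summarise how far two flat landmark vectors (assumed x, y, z triples) are apart."""
    a = np.asarray(a, dtype=np.float64).reshape(-1, 3)
    b = np.asarray(b, dtype=np.float64).reshape(-1, 3)
    err = np.abs(a - b)
    return {
        "mean_abs": float(err.mean()),                    # same quantity as the "diff" printed above
        "max_abs": float(err.max()),                      # single worst coordinate
        "worst_landmark": int(err.max(axis=1).argmax()),  # index of the landmark with the largest error
        "per_axis_mean": err.mean(axis=0).tolist(),       # mean error for x, y and z separately
        "allclose": bool(np.allclose(a, b, atol=atol)),   # pass/fail under an arbitrary tolerance
    }


# With the arrays from the test code:
# compare_outputs(torch_coord_res[0], tf_coord_res[0, 0, 0])
```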

I think this may be caused by the conv2d padding="same" option, which behaves differently in TensorFlow. Have you fixed this problem, or do you have any advice?

Thanks!
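For reference, TensorFlow's "SAME" padding keeps ceil(size / stride) outputs and, when the total padding required is odd, puts the extra row/column at the bottom/right, whereas PyTorch's integer padding is always symmetric. The sketch below computes the TensorFlow amounts along one axis. The 192/3/2 example is an assumed layer shape, not read from facemesh.py.

```python
import math
from typing import Tuple


def tf_same_padding(size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """Return (pad_before, pad_after) that TensorFlow's padding="SAME" uses along one axis."""
    out = math.ceil(size / stride)                      # SAME keeps ceil(size / stride) outputs
    total = max((out - 1) * stride + kernel - size, 0)  # total padding needed to reach that size
    before = total // 2                                 # smaller half goes first (top/left)...
    return before, total - before                       # ...any odd leftover goes last (bottom/right)


# Assumed example: 192x192 input, 3x3 kernel, stride 2
print(tf_same_padding(192, 3, 2))  # (0, 1): one extra row/column at the bottom/right only
# A stride-1 3x3 convolution is symmetric, matching PyTorch's padding=1
print(tf_same_padding(96, 3, 1))   # (1, 1)
```

For a 3x3, stride-2 convolution on an even-sized input, PyTorch's padding=1 gives (1, 1) while TensorFlow uses (0, 1), so the two frameworks produce same-sized outputs computed from slightly different input windows.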

qhanson commented 2 years ago

How did you solve this problem? I ran into it as well. It can be solved by changing this line https://github.com/thepowerfuldeez/facemesh.pytorch/blob/348400fe32c60111a29e9e6891e230c0005ddd8a/facemesh.py#L114 to:

x = nn.ConstantPad2d((0, 1, 0, 1), 0)(x)
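To see what this change does, the sketch below compares a 3x3, stride-2 convolution on a 192x192 input using PyTorch's symmetric padding=1 against nn.ConstantPad2d((0, 1, 0, 1), 0) followed by an unpadded convolution. The tensor shapes and random weights are illustrative assumptions; only the padding behaviour matters here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative single-channel input and 3x3 kernel (assumed shapes).
x = torch.randn(1, 1, 192, 192)
weight = torch.randn(1, 1, 3, 3)

# PyTorch-style symmetric padding: one zero row/column on every side.
sym = F.conv2d(x, weight, stride=2, padding=1)

# TF "SAME"-style asymmetric padding: the extra row/column goes to the bottom/right only.
asym = F.conv2d(nn.ConstantPad2d((0, 1, 0, 1), 0)(x), weight, stride=2)

print(sym.shape, asym.shape)     # both torch.Size([1, 1, 96, 96])
print(torch.allclose(sym, asym))  # False: same shape, different values

# Shifting the input up/left by one pixel lines the symmetric result up with the
# asymmetric one everywhere except the first output row/column.
x_shift = F.pad(x[:, :, 1:, 1:], (0, 1, 0, 1))
sym_shift = F.conv2d(x_shift, weight, stride=2, padding=1)
print(torch.allclose(sym_shift[..., 1:, 1:], asym[..., 1:, 1:], atol=1e-5))  # True
```

Both variants produce a 96x96 map, so nothing fails with a shape error; the sampling grids are simply offset by one input pixel, which plausibly accounts for the small coordinate differences reported above.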

You are right that the difference comes from how 'same' padding is handled in PyTorch versus TensorFlow.
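If more than one layer pads this way, a more general option is a convolution that computes TensorFlow-style "SAME" padding from the actual input size at call time. The Conv2dSame class below is a hypothetical sketch (not part of facemesh.pytorch) that could stand in for an nn.Conv2d whose padding is otherwise handled by hand.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class Conv2dSame(nn.Conv2d):
    """Hypothetical nn.Conv2d variant that pads like TensorFlow's padding="SAME"."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True):
        # padding=0 here; all padding is applied explicitly in forward().
        super().__init__(in_channels, out_channels, kernel_size, stride=stride,
                         padding=0, dilation=dilation, groups=groups, bias=bias)

    @staticmethod
    def _same_pad(size, kernel, stride, dilation):
        effective_k = (kernel - 1) * dilation + 1
        out = math.ceil(size / stride)
        total = max((out - 1) * stride + effective_k - size, 0)
        return total // 2, total - total // 2  # odd leftover goes to the bottom/right

    def forward(self, x):
        ih, iw = x.shape[-2:]
        top, bottom = self._same_pad(ih, self.kernel_size[0], self.stride[0], self.dilation[0])
        left, right = self._same_pad(iw, self.kernel_size[1], self.stride[1], self.dilation[1])
        # F.pad takes (left, right, top, bottom) for the last two dimensions.
        x = F.pad(x, (left, right, top, bottom))
        return F.conv2d(x, self.weight, self.bias, self.stride, 0, self.dilation, self.groups)


torch.manual_seed(0)
conv = Conv2dSame(3, 16, 3, stride=2)
x = torch.randn(1, 3, 192, 192)
y = conv(x)
print(y.shape)  # torch.Size([1, 16, 96, 96])
```

Computing the padding in forward() rather than fixing it in the constructor keeps the behaviour correct for odd input sizes too, where TensorFlow's padding becomes symmetric again.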

However, the result still differs slightly from the original MediaPipe output. We need to look further into the MediaPipe source code.