Closed utility-aagrawal closed 9 months ago
Hello!
Have you tried timing the different parts of your code to identify where the bottleneck is? From your previous issues, I understand you are using several components, such as detectors, embedding models for reid, motion estimators, etc. If you can do that, we may be able to provide better recommendations.
Also, if it is no problem for you, could you provide a (perhaps brief) version of your code, so that we know which parts of norfair you are actually using?
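One way to time each stage is to wrap it in a small context manager that accumulates wall-clock time per name. This is only a minimal sketch: `timed`, `report`, and `stage_totals` are hypothetical helper names, and the `time.sleep` calls are stand-ins for the detector, the embedding model, and `tracker.update()`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
stage_totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the wrapped block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


def report(totals):
    """Return {stage: share of total time}, largest share first."""
    total = sum(totals.values())
    if total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {stage: seconds / total for stage, seconds in ranked}


# Stand-in workloads; in a real loop these blocks would contain the
# detector call, the embedding call, and the tracker update.
for _ in range(3):
    with timed("detection"):
        time.sleep(0.002)
    with timed("embedding"):
        time.sleep(0.01)
    with timed("tracking"):
        sum(range(1000))

shares = report(stage_totals)
for stage, share in shares.items():
    print(f"{stage}: {share:.0%}")
```

Accumulating over many frames rather than timing a single frame smooths out one-off costs such as model loading on the first call.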
Thanks for the quick turnaround, @aguscas! Let me time each of my modules and I'll get back to you shortly. And yes, I can share the code with you.
Hi @aguscas, I timed the different parts of my code and found that detections and embeddings take the most time. I am using Retinaface for detections and Facenet512 for embeddings. Currently, embeddings consume ~80% of the total time and detections ~15%.
Here's my code:
from deepface import DeepFace
import numpy as np
from norfair import Detection, Tracker, Video, get_cutout, draw_boxes, draw_points
from norfair.filter import OptimizedKalmanFilterFactory
from norfair.camera_motion import MotionEstimator
from scipy.spatial.distance import cosine
def minimum_embedding_distance(matched_not_init_trackers, unmatched_trackers):
    list_of_snd_embedding = []
    list_of_fst_embedding = []

    # get the embeddings of the unmatched_trackers
    if unmatched_trackers.last_detection.embedding is not None:
        list_of_snd_embedding.append(unmatched_trackers.last_detection.embedding)
    for detection in unmatched_trackers.past_detections:
        if detection.embedding is not None:
            list_of_snd_embedding.append(detection.embedding)

    if len(list_of_snd_embedding) == 0:
        return 1

    # get the embeddings of the matched_not_init_trackers
    if matched_not_init_trackers.last_detection.embedding is not None:
        list_of_fst_embedding.append(matched_not_init_trackers.last_detection.embedding)
    for detection in matched_not_init_trackers.past_detections:
        if detection.embedding is not None:
            list_of_fst_embedding.append(detection.embedding)

    if len(list_of_fst_embedding) == 0:
        return 1

    # compare all the embeddings
    distances = []
    for embedding1 in list_of_fst_embedding:
        for embedding2 in list_of_snd_embedding:
            distances.append(1 - cosine(embedding1, embedding2))

    # take the minimum (you could take the average with np.mean instead)
    return np.min(np.array(distances))
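
def detect_faces(frame):
    return ""


# track_points is accepted here because main() passes it as a keyword argument
def retinaface_detections_to_norfair_detections(retinaface_detections, track_points="bbox"):
    return ""
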
def main(
    embed_model: str = "Facenet512",
    skip_period: int = 1,
    border_size: int = 10,
    track_points="bbox",
):
    video = Video(input_path="", output_path="")

    DISTANCE_THRESHOLD_BBOX: float = 0.5
    DISTANCE_THRESHOLD_CENTROID: int = 30
    MAX_DISTANCE: int = 10000

    distance_function = "iou" if track_points == "bbox" else "euclidean"
    distance_threshold = (
        DISTANCE_THRESHOLD_BBOX
        if track_points == "bbox"
        else DISTANCE_THRESHOLD_CENTROID
    )

    tracker = Tracker(
        initialization_delay=3,
        distance_function=distance_function,
        hit_counter_max=10,
        filter_factory=OptimizedKalmanFilterFactory(),
        distance_threshold=distance_threshold,
        past_detections_length=5,
        reid_distance_function=minimum_embedding_distance,
        reid_distance_threshold=0.5,
        reid_hit_counter_max=np.inf,
    )
    motion_estimator = MotionEstimator()

    for i, cv2_frame in enumerate(video):
        if i % skip_period == 0:
            retinaface_detections = detect_faces(cv2_frame)
            detections = retinaface_detections_to_norfair_detections(
                retinaface_detections, track_points=track_points
            )
            frame = cv2_frame.copy()

            # here I am generating the mask from the detections (you can also use the tracked_object if you want)
            mask = np.ones(frame.shape[:2], frame.dtype)
            for d in detections:
                bbox = d.points.astype(int)
                mask[bbox[0, 1] : bbox[1, 1], bbox[0, 0] : bbox[1, 0]] = 0

            # here I am passing that mask to the motion estimator
            coord_transformation = motion_estimator.update(frame)  # , mask)

            for detection in detections:
                cut = get_cutout(detection.points, frame)
                if cut.shape[0] > 0 and cut.shape[1] > 0:
                    detection.embedding = DeepFace.represent(
                        img_path=cut,
                        model_name=embed_model,
                        enforce_detection=False,
                        detector_backend="retinaface",
                    )[0]["embedding"]
                else:
                    detection.embedding = None

            tracked_objects = tracker.update(
                detections=detections,
                period=skip_period,
                coord_transformations=coord_transformation,
            )
        else:
            tracked_objects = tracker.update()

        if track_points == "bbox":
            draw_boxes(cv2_frame, tracked_objects, draw_ids=True)
        else:
            draw_points(cv2_frame, tracked_objects)

        frame_with_border = np.ones(
            shape=(
                cv2_frame.shape[0] + 2 * border_size,
                cv2_frame.shape[1] + 2 * border_size,
                cv2_frame.shape[2],
            ),
            dtype=cv2_frame.dtype,
        )
        frame_with_border *= 254
        frame_with_border[
            border_size:-border_size, border_size:-border_size
        ] = cv2_frame
        video.write(frame_with_border)


if __name__ == "__main__":
    main()

Since this doesn't look like a norfair issue, I am not sure if you could help. If you still have any suggestions, let me know. Thanks!
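A side note on `minimum_embedding_distance` in the code above: `scipy.spatial.distance.cosine` already returns a distance (1 minus the cosine similarity), so `1 - cosine(...)` is a similarity, and taking its minimum selects the least similar pair. Whether that is intended is not stated in the thread. The sketch below shows one possible alternative that returns the smallest cosine distance over all pairs using a single matrix product. The name `min_cosine_distance` is hypothetical, the embeddings are assumed to be non-zero vectors of equal length, and the fallback value of 1.0 simply mirrors the `return 1` in the posted code. Given the timings reported above, this function is not the bottleneck, so the change concerns correctness more than speed.

```python
import numpy as np


def min_cosine_distance(embeddings_a, embeddings_b):
    """Smallest cosine distance between any vector in A and any vector in B.

    Returns 1.0 when either collection is empty, mirroring the fallback
    used in the posted code (an assumption, not a norfair convention).
    """
    if len(embeddings_a) == 0 or len(embeddings_b) == 0:
        return 1.0

    a = np.asarray(embeddings_a, dtype=float)
    b = np.asarray(embeddings_b, dtype=float)

    # Normalize rows to unit length (assumes no all-zero embedding).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)

    # Entry (i, j) is the cosine similarity between a[i] and b[j].
    similarity = a @ b.T

    # Cosine distance is 1 - similarity; take the closest pair.
    return float(np.min(1.0 - similarity))


closest = min_cosine_distance([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]])
print(closest)
```

Inside a reid distance function, the two arguments would be the lists of non-`None` embeddings collected from each tracked object, as the posted code already does.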
Sorry for my late response. Given that 95% of your time is consumed by your models (80% embeddings + 15% detection) rather than by norfair, there is not much I can suggest other than changing your models (especially the one for embeddings), though that can come at a cost in overall performance (accuracy, etc.). Also, are you using a GPU, or are you running your models on a CPU?
No worries, and thanks for your response, @aguscas! Yes, I am using an NVIDIA GPU with 16 GB VRAM. I am now looking for more efficient embedding models :)
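Having a GPU in the machine does not by itself mean the deep learning framework is using it. The sketch below is one way to check, assuming the DeepFace models run on TensorFlow (the thread does not say which backend is in use). `visible_gpus` is a hypothetical helper name; `tf.config.list_physical_devices` is the TensorFlow call that lists the devices it can see.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("TensorFlow sees no GPU; models would run on the CPU.")
```

An empty list would suggest that the models are running on the CPU despite the GPU being present.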
Hi,
I am using this library to track faces. So far, I have been processing every frame of my videos, and although the results look great, the process is extremely slow. I have tried skipping frames, but that degrades the tracking performance a lot. I have also tried the suggestions in issue #301, but the results don't look good so far. I was wondering if you have any other recommendations for me. Thanks!