Open JimBratsos opened 4 years ago
What model did you use? And can you share your script?
I don't know what pred is.
yolo.inference() has no return value, and yolo.predict() returns pred_bboxes with shape Dim(-1, (x, y, w, h, class_id, probability)).
If you want an output of shape (1, 19, 19, x), use yolo.model.predict().
Ref: https://github.com/hhk7734/tensorflow-yolov4/issues/23#issuecomment-687859586
As for the speed, I'll test it out ASAP.
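For anyone consuming that return value, here is a minimal NumPy sketch of unpacking it. It assumes, as in the conversion used in the script further down this thread, that x and y are the box centre and that all four coordinates are normalized to [0, 1]. The helper name to_pixel_tlwh is made up for illustration and is not part of the library.

```python
import numpy as np


def to_pixel_tlwh(pred_bboxes, frame_w, frame_h):
    """Split (x, y, w, h, class_id, probability) rows into pixel boxes, ids and scores.

    Assumption: x, y are the box centre and x, y, w, h are normalized to [0, 1].
    """
    boxes = np.asarray(pred_bboxes, dtype=np.float32).reshape(-1, 6)
    x, y, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # centre/size -> top-left/size, scaled to the frame in pixels
    tlwh = np.stack(
        [(x - w / 2) * frame_w, (y - h / 2) * frame_h, w * frame_w, h * frame_h],
        axis=1,
    )
    class_ids = boxes[:, 4].astype(int)
    scores = boxes[:, 5]
    return tlwh, class_ids, scores
```

The (xmin, ymin, width, height) layout is the one the deep sort Detection objects in this thread are built from.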
I use yolov4-tiny with relu activation, converted to tflite. From what I remember from netron, it has 2 outputs. The script I am using expects 3 outputs, which is probably the cause of the 2nd issue I am facing. Here is the script:
import os
# comment out below line to enable tensorflow logging outputs
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import time
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
from absl import app, flags, logging
from absl.flags import FLAGS
import core.utils as utils
from core.yolov4 import decode,filter_boxes
from tensorflow.python.saved_model import tag_constants
from core.config import cfg
from PIL import Image
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
# deep sort imports
from deep_sort import preprocessing, nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker
from tools import generate_detections as gdet
flags.DEFINE_string('framework', 'tf', '(tf, tflite, trt)')
flags.DEFINE_string('weights', './checkpoints/yolov4-416',
                    'path to weights file')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_boolean('tiny', False, 'yolo or yolo-tiny')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')
flags.DEFINE_string('video', './data/video/test.mp4', 'path to input video or set to 0 for webcam')
flags.DEFINE_string('output', None, 'path to output video')
flags.DEFINE_string('output_format', 'XVID', 'codec used in VideoWriter when saving video to file')
flags.DEFINE_float('iou', 0.45, 'iou threshold')
flags.DEFINE_float('score', 0.50, 'score threshold')
flags.DEFINE_boolean('dont_show', False, 'dont show video output')
flags.DEFINE_boolean('info', False, 'show detailed info of tracked objects')
flags.DEFINE_boolean('count', False, 'count objects being tracked on screen')
def main(_argv):
    # Definition of the parameters
    max_cosine_distance = 0.4
    nn_budget = None
    nms_max_overlap = 1.0
    # initialize deep sort
    model_filename = 'model_data/mars-small128.pb'
    encoder = gdet.create_box_encoder(model_filename, batch_size=1)
    # calculate cosine distance metric
    metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)
    # initialize tracker
    tracker = Tracker(metric)
    # load configuration for object detector
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)
    input_size = FLAGS.size
    video_path = FLAGS.video
    # load tflite model if flag is set
    if FLAGS.framework == 'tflite':
        interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        print(input_details)
        print(output_details)
    # otherwise load standard tensorflow saved model
    else:
        saved_model_loaded = tf.saved_model.load(FLAGS.weights, tags=[tag_constants.SERVING])
        infer = saved_model_loaded.signatures['serving_default']
    # begin video capture
    try:
        vid = cv2.VideoCapture(int(video_path))
    except:
        vid = cv2.VideoCapture(video_path)
    out = None
    # get video ready to save locally if flag is set
    if FLAGS.output:
        # by default VideoCapture returns float instead of int
        width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(vid.get(cv2.CAP_PROP_FPS))
        codec = cv2.VideoWriter_fourcc(*FLAGS.output_format)
        out = cv2.VideoWriter(FLAGS.output, codec, fps, (width, height))
    # while video is running
    while True:
        return_value, frame = vid.read()
        if return_value:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = Image.fromarray(frame)
        else:
            print('Video has ended or failed, try a different video format!')
            break
        frame_size = frame.shape[:2]
        image_data = cv2.resize(frame, (input_size, input_size))
        image_data = image_data / 255.
        image_data = image_data[np.newaxis, ...].astype(np.float32)
        start_time = time.time()
        # run detections on tflite if flag is set
        if FLAGS.framework == 'tflite':
            interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
            interpreter.allocate_tensors()
            input_details = interpreter.get_input_details()
            output_details = interpreter.get_output_details()
            print(input_details)
            print(output_details)
            interpreter.set_tensor(input_details[0]['index'], image_data)
            interpreter.invoke()
            pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
            # add post process code here
            bbox_tensors = []
            prob_tensors = []
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                elif i == 1:
                    output_tensors = decode(pred[0], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[1], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])
            pred_bbox = tf.concat(bbox_tensors, axis=1)
            pred_prob = tf.concat(prob_tensors, axis=1)
            pred = (pred_bbox, pred_prob)
            if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
                boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
            else:
                boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
        else:
            batch_data = tf.constant(image_data)
            pred_bbox = infer(batch_data)
            for key, value in pred_bbox.items():
                boxes = value[:, :, 0:4]
                pred_conf = value[:, :, 4:]
        boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
            boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
            scores=tf.reshape(
                pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
            max_output_size_per_class=50,
            max_total_size=50,
            iou_threshold=FLAGS.iou,
            score_threshold=FLAGS.score
        )
        # convert data to numpy arrays and slice out unused elements
        num_objects = valid_detections.numpy()[0]
        bboxes = boxes.numpy()[0]
        bboxes = bboxes[0:int(num_objects)]
        scores = scores.numpy()[0]
        scores = scores[0:int(num_objects)]
        classes = classes.numpy()[0]
        classes = classes[0:int(num_objects)]
        # format bounding boxes from normalized ymin, xmin, ymax, xmax ---> xmin, ymin, width, height
        original_h, original_w, _ = frame.shape
        bboxes = utils.format_boxes(bboxes, original_h, original_w)
        # store all predictions in one parameter for simplicity when calling functions
        pred_bbox = [bboxes, scores, classes, num_objects]
        # read in all class names from config
        class_names = utils.read_class_names(cfg.YOLO.CLASSES)
        # by default allow all classes in .names file
        allowed_classes = list(class_names.values())
        # custom allowed classes (uncomment line below to customize tracker for only people)
        #allowed_classes = ['person']
        # loop through objects and use class index to get class name, allow only classes in allowed_classes list
        names = []
        deleted_indx = []
        for i in range(num_objects):
            class_indx = int(classes[i])
            class_name = class_names[class_indx]
            if class_name not in allowed_classes:
                deleted_indx.append(i)
            else:
                names.append(class_name)
        names = np.array(names)
        count = len(names)
        if FLAGS.count:
            cv2.putText(frame, "Objects being tracked: {}".format(count), (5, 35), cv2.FONT_HERSHEY_COMPLEX_SMALL, 2, (0, 255, 0), 2)
            print("Objects being tracked: {}".format(count))
        # delete detections that are not in allowed_classes
        bboxes = np.delete(bboxes, deleted_indx, axis=0)
        scores = np.delete(scores, deleted_indx, axis=0)
        # encode yolo detections and feed to tracker
        features = encoder(frame, bboxes)
        detections = [Detection(bbox, score, class_name, feature) for bbox, score, class_name, feature in zip(bboxes, scores, names, features)]
        # initialize color map
        cmap = plt.get_cmap('tab20b')
        colors = [cmap(i)[:3] for i in np.linspace(0, 1, 20)]
        # run non-maxima suppression
        boxs = np.array([d.tlwh for d in detections])
        scores = np.array([d.confidence for d in detections])
        classes = np.array([d.class_name for d in detections])
        indices = preprocessing.non_max_suppression(boxs, classes, nms_max_overlap, scores)
        detections = [detections[i] for i in indices]
        # Call the tracker
        tracker.predict()
        tracker.update(detections)
        # update tracks
        for track in tracker.tracks:
            if not track.is_confirmed() or track.time_since_update > 1:
                continue
            bbox = track.to_tlbr()
            class_name = track.get_class()
            # draw bbox on screen
            color = colors[int(track.track_id) % len(colors)]
            color = [i * 255 for i in color]
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), color, 2)
            cv2.rectangle(frame, (int(bbox[0]), int(bbox[1]-30)), (int(bbox[0])+(len(class_name)+len(str(track.track_id)))*17, int(bbox[1])), color, -1)
            cv2.putText(frame, class_name + "-" + str(track.track_id),(int(bbox[0]), int(bbox[1]-10)),0, 0.75, (255,255,255),2)
            # if enable info flag then print details about each track
            if FLAGS.info:
                print("Tracker ID: {}, Class: {}, BBox Coords (xmin, ymin, xmax, ymax): {}".format(str(track.track_id), class_name, (int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3]))))
        # calculate frames per second of running detections
        fps = 1.0 / (time.time() - start_time)
        print("FPS: %.2f" % fps)
        result = np.asarray(frame)
        result = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        if not FLAGS.dont_show:
            cv2.imshow("Output Video", result)
        # if output flag is set, save video file
        if FLAGS.output:
            out.write(result)
        if cv2.waitKey(1) & 0xFF == ord('q'): break
    cv2.destroyAllWindows()
if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass
The error occurs in the tflite branch; I posted the whole script since it might prove useful to others too. Thanks a lot for your help.
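One thing that stands out in the tflite branch as pasted: tf.lite.Interpreter(...) and allocate_tensors() appear again inside the while loop, so the model would be reloaded on every frame. Whether that accounts for the error or for the speed is not established here. The sketch below only shows the usual pattern of building the interpreter once and reusing it per frame. build_interpreter and run_frame are hypothetical helper names; the tf.lite calls are the same ones the script already uses.

```python
def build_interpreter(model_path):
    """Create the TFLite interpreter once, before the frame loop."""
    # Imported lazily so run_frame below can be used (or tested) without TensorFlow.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return (
        interpreter,
        interpreter.get_input_details(),
        interpreter.get_output_details(),
    )


def run_frame(interpreter, input_details, output_details, image_data):
    """Run one inference on an already-allocated interpreter and return all outputs."""
    interpreter.set_tensor(input_details[0]["index"], image_data)
    interpreter.invoke()
    return [interpreter.get_tensor(d["index"]) for d in output_details]
```

In the loop this would be called as pred = run_frame(interpreter, input_details, output_details, image_data), with build_interpreter(FLAGS.weights) called a single time before while True.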
Update: I have looked at this code more over the last few days, and I've noticed that it is written specifically for tflite models with 3 outputs/branches, while my yolov4-tiny model has 2 outputs. I will see how I can modify the above script to run my model, but the speed is still extremely low (0.15 FPS with inference). Any idea how to fix that part?
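As a rough illustration of handling 2 branches instead of 3, the grid size and stride can be read off each output tensor rather than hard-coded per index. This is only a sketch: it assumes each raw output has the layout (batch, grid, grid, channels), and pair_outputs_with_strides is a made-up helper, not something from the script or the library. The 608 input size in the usage note is likewise an assumption, chosen because it matches the 19 and 38 grids seen in the error in this thread.

```python
import numpy as np


def pair_outputs_with_strides(outputs, input_size):
    """Return (stride, grid_size, tensor) triples, finest grid (smallest stride) first.

    Assumption: every tensor is laid out as (batch, grid, grid, channels).
    """
    triples = []
    for tensor in outputs:
        grid = tensor.shape[1]
        if tensor.shape[2] != grid:
            raise ValueError("expected a square grid, got shape %s" % (tensor.shape,))
        if input_size % grid != 0:
            raise ValueError("input size %d is not a multiple of grid %d" % (input_size, grid))
        triples.append((input_size // grid, grid, tensor))
    # Sort by stride only, so the order no longer depends on the order the
    # interpreter happens to report its outputs in.
    triples.sort(key=lambda item: item[0])
    return triples
```

With a 608 input and outputs of grid 19 and 38, this yields strides 16 and 32 in that order, whichever order the two tensors arrive in. Each triple could then be passed to the decode step with its own grid size, instead of indexing pred[0], pred[1], pred[2] by position.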
It's only 0.15? On Coral?
Sorry, the FPS on Coral is 0.45. Still relatively low, not that big of an improvement.
HW: AMD Ryzen 7 2700X, using only CPU
Video: https://github.com/theAIGuysCode/yolov4-deepsort/blob/master/data/video/test.mp4
I think the computation time excluding inference is too long.
How do you install scipy on Coral?
FPS: 3.09, inference: 0.13 s, compute: 0.32
FPS: 3.06, inference: 0.14 s, compute: 0.33
FPS: 3.08, inference: 0.13 s, compute: 0.32
FPS: 2.93, inference: 0.13 s, compute: 0.34
FPS: 3.06, inference: 0.13 s, compute: 0.33
FPS: 3.18, inference: 0.13 s, compute: 0.31
FPS: 2.87, inference: 0.13 s, compute: 0.35
import time
import cv2
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from yolov4.tf import YOLOv4
from deep_sort import preprocessing, nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker
from tools import generate_detections as gdet
yolo = YOLOv4(tiny=True)
yolo.classes = "dataset/coco.names"
yolo.make_model(activation1="relu")
yolo.load_weights(
    r"C:\Users\windows\google_drive\Hard_Soft\NN\yolov4\yolov4-tiny-relu.weights",
    weights_type="yolo",
)
# Definition of the parameters
max_cosine_distance = 0.4
nn_budget = None
nms_max_overlap = 1.0
# initialize deep sort
model_filename = "model_data/mars-small128.pb"
encoder = gdet.create_box_encoder(model_filename, batch_size=1)
# calculate cosine distance metric
metric = nn_matching.NearestNeighborDistanceMetric(
    "cosine", max_cosine_distance, nn_budget
)
# initialize tracker
tracker = Tracker(metric)
# load configuration for object detector
input_size = yolo.input_size
video_path = r"C:/Users/windows/Desktop/test.mp4"
# begin video capture
vid = cv2.VideoCapture(video_path)
out = None
# initialize color map
cmap = plt.get_cmap("tab20b")
colors = [cmap(i)[:3] for i in np.linspace(0, 1, 20)]
# while video is running
while True:
    start_time = time.time()
    return_value, frame = vid.read()
    if return_value:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image = Image.fromarray(frame)
    else:
        print("Video has ended or failed, try a different video format!")
        break
    original_h, original_w, _ = frame.shape
    # (x, y, w, h, class_id, probability)
    _bboxes = yolo.predict(frame)
    mid_time = time.time()
    # convert normalized (center x, center y, w, h) boxes to pixel (xmin, ymin, width, height)
    num_objects = len(_bboxes)
    bboxes = [
        [
            (box[0] - box[2] / 2) * original_w,
            (box[1] - box[3] / 2) * original_h,
            box[2] * original_w,
            box[3] * original_h,
        ]
        for box in _bboxes
    ]
    bboxes = np.array(bboxes)
    scores = np.array([box[5] for box in _bboxes])
    classes = np.array([int(box[4]) for box in _bboxes])
    # store all predictions in one parameter for simplicity when calling functions
    pred_bbox = [bboxes, scores, classes, num_objects]
    # read in all class names from config
    class_names = yolo.classes
    # by default allow all classes in .names file
    # allowed_classes = list(class_names.values())
    # custom allowed classes (here: track only people and bicycles)
    allowed_classes = ["person", "bicycle"]
    # loop through objects and use class index to get class name, allow only classes in allowed_classes list
    names = []
    deleted_indx = []
    for i in range(num_objects):
        class_indx = classes[i]
        class_name = class_names[class_indx]
        if class_name not in allowed_classes:
            deleted_indx.append(i)
        else:
            names.append(class_name)
    names = np.array(names)
    count = len(names)
    # delete detections that are not in allowed_classes
    bboxes = np.delete(bboxes, deleted_indx, axis=0)
    # keep scores aligned with the filtered boxes
    scores = np.delete(scores, deleted_indx, axis=0)
    # encode yolo detections and feed to tracker
    features = encoder(frame, bboxes)
    detections = [
        Detection(bbox, score, class_name, feature)
        for bbox, score, class_name, feature in zip(
            bboxes, scores, names, features
        )
    ]
    # run non-maxima suppression
    boxs = np.array([d.tlwh for d in detections])
    scores = np.array([d.confidence for d in detections])
    classes = np.array([d.class_name for d in detections])
    indices = preprocessing.non_max_suppression(
        boxs, classes, nms_max_overlap, scores
    )
    detections = [detections[i] for i in indices]
    # Call the tracker
    tracker.predict()
    tracker.update(detections)
    # update tracks
    for track in tracker.tracks:
        if not track.is_confirmed() or track.time_since_update > 1:
            continue
        bbox = track.to_tlbr()
        class_name = track.get_class()
        # draw bbox on screen
        color = colors[int(track.track_id) % len(colors)]
        color = [i * 255 for i in color]
        cv2.rectangle(
            frame,
            (int(bbox[0]), int(bbox[1])),
            (int(bbox[2]), int(bbox[3])),
            color,
            2,
        )
        cv2.rectangle(
            frame,
            (int(bbox[0]), int(bbox[1] - 30)),
            (
                int(bbox[0])
                + (len(class_name) + len(str(track.track_id))) * 17,
                int(bbox[1]),
            ),
            color,
            -1,
        )
        cv2.putText(
            frame,
            class_name + "-" + str(track.track_id),
            (int(bbox[0]), int(bbox[1] - 10)),
            0,
            0.75,
            (255, 255, 255),
            2,
        )
    # calculate frames per second of running detections
    fps = 1.0 / (time.time() - start_time)
    # note: "compute" is printed as 1 / fps, i.e. the total time per frame, inference included
    print(
        "FPS: {:.2f}, inference: {:.2f} s, compute: {:.2f}".format(
            fps, mid_time - start_time, 1 / fps
        )
    )
    result = np.asarray(frame)
    result = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    cv2.imshow("Output Video", result)
    # press q to quit
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cv2.destroyAllWindows()
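To see where the per-frame time actually goes (inference versus box encoding versus tracking), something like the following could be dropped around each stage. It is a standard-library sketch only; StageTimer is a made-up name and nothing in it comes from the yolov4 or deep sort packages.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage across frames."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return the mean seconds per call for every stage seen so far."""
        return {name: self.totals[name] / self.counts[name] for name in self.totals}
```

Usage would look like with timer.stage("inference"): _bboxes = yolo.predict(frame), then with timer.stage("encoder"): features = encoder(frame, bboxes), and so on, printing timer.report() every few frames.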
Hey, I might have made things a bit unclear. I first tried to use the above code on my PC, to test how the model would run with deepsort. I could not run it, because the model has 2 outputs instead of the 3 expected by the script I sent you.
I then tested the model with the inference you provided: on my PC it gave 0.15 FPS, and on Coral 0.45 FPS. That is 3x better but still extremely low. For the PC test I used my GPU (GTX 1660 Super).
Sorry for the misunderstanding. As for Coral, I do not think there is a way to install scipy on it at the moment, so I might just go with Kalman trackers or basic centroid tracking. What still bothers me, though, is the low FPS issue mentioned above.
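For reference, basic centroid tracking can be done with NumPy alone. The sketch below is a minimal illustration and not a replacement for deep sort: it matches detections to existing tracks greedily by centroid distance (a simplification swapped in for an optimal assignment step), has no appearance features and no motion model, and the class name and both thresholds are arbitrary assumptions.

```python
import numpy as np


class CentroidTracker:
    """Greedy nearest-centroid tracker over (xmin, ymin, width, height) boxes."""

    def __init__(self, max_distance=50.0, max_missed=5):
        self.max_distance = max_distance  # pixels; farther detections start new tracks
        self.max_missed = max_missed      # frames a track may go unmatched before removal
        self.next_id = 0
        self.tracks = {}                  # track id -> last centroid
        self.missed = {}                  # track id -> consecutive unmatched frames

    def update(self, boxes_tlwh):
        """Return one track id per input box, in input order."""
        boxes = np.asarray(boxes_tlwh, dtype=float).reshape(-1, 4)
        centroids = boxes[:, :2] + boxes[:, 2:] / 2.0
        assigned = {}                     # detection index -> track id
        matched_tracks = set()
        track_ids = list(self.tracks)
        if track_ids and len(centroids) > 0:
            existing = np.array([self.tracks[t] for t in track_ids])
            dist = np.linalg.norm(existing[:, None, :] - centroids[None, :, :], axis=2)
            # Visit (track, detection) pairs from closest to farthest.
            for flat in np.argsort(dist, axis=None):
                row, col = divmod(int(flat), dist.shape[1])
                if dist[row, col] > self.max_distance:
                    break
                tid = track_ids[row]
                if tid in matched_tracks or col in assigned:
                    continue
                self.tracks[tid] = centroids[col]
                self.missed[tid] = 0
                assigned[col] = tid
                matched_tracks.add(tid)
        # Age tracks that found no detection this frame; drop stale ones.
        for tid in track_ids:
            if tid not in matched_tracks:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid]
                    del self.missed[tid]
        # Unmatched detections become new tracks.
        for col in range(len(centroids)):
            if col not in assigned:
                self.tracks[self.next_id] = centroids[col]
                self.missed[self.next_id] = 0
                assigned[col] = self.next_id
                self.next_id += 1
        return [assigned[col] for col in range(len(centroids))]
```

Calling tracker.update(bboxes) once per frame with the pixel boxes returns ids that stay stable as long as an object moves less than max_distance between frames.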
I modified the script above, and I can say that it works with tflite models now, which is a positive result. The drawback is that the FPS is still extremely low, to the point that the window stops responding:
FPS: 0.03, inference: 35.98 s, compute: 36.15
FPS: 0.03, inference: 36.02 s, compute: 36.12
FPS: 0.03, inference: 36.01 s, compute: 36.13
Thanks for the script ( Tested on GPU )
Could you explain how to solve this problem? I got the same error: ValueError: Shapes (1, 19, 19) and (1, 38, 38) are incompatible
Good evening. I have been trying to use the converted model for object detection with deepsort today, without success. Before that, I tested it as you outlined, using the inference command. However, with videos it takes a huge amount of time to advance to the next frame and track the changes. As for deepsort, I referred to https://github.com/theAIGuysCode/yolov4-deepsort and its tracker script, only to get the following error:
ValueError: Shapes (1, 19, 19) and (1, 38, 38) are incompatible
After that, I tried running the above script (basically the same as hunglc007's script) with the correction for int8 models specified at https://github.com/hunglc007/tensorflow-yolov4-tflite/issues/214 (I recall you referenced someone in one of the issues there). I tried running it, and it got me the following error:
If I swap the number 2 with 1 or 0, it eventually brings up an image, but with an extremely inaccurate detection. I haven't tried this with video, for safety purposes :P ...
These errors and the low FPS are the crucial issues for me. Thank you for your great work though; the conversion is successful and the model is working.