hunglc007 / tensorflow-yolov4-tflite

YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny Implemented in Tensorflow 2.0, Android. Convert YOLO v4 .weights to tensorflow, tensorrt and tflite
https://github.com/hunglc007/tensorflow-yolov4-tflite
MIT License

Is there any way to improve the YOLOv4 training speed? #361

Closed · hsji0 closed 3 years ago

hsji0 commented 3 years ago

I've implemented a custom YOLOv4 model, and it trained well. The problem is that training is much slower than with darknet YOLOv4. I figured out that this comes from feeding data to the model through numpy (new augmented data is generated on every pass), so I modified the data pipeline to use tf.data.Dataset as below, replacing the numpy input. (With this change, training speed is similar to darknet YOLO.)
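For context, my original numpy feeding worked roughly like this (a simplified sketch, not my actual code; the helper names `load_image` and `augment` are illustrative only):

```python
import numpy as np

# Illustrative stand-ins for the real loading/augmentation helpers.
def load_image(path):
    return np.zeros((416, 416, 3), np.float32)

def augment(img):
    return img + np.random.uniform(-0.1, 0.1, img.shape).astype(np.float32)

def numpy_batch_generator(image_paths, batch_size):
    # Everything runs in the Python interpreter: each batch is loaded and
    # re-augmented on the fly, so the GPU sits idle while numpy works.
    while True:
        order = np.random.permutation(len(image_paths))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield np.stack([augment(load_image(image_paths[i])) for i in idx])
```

The tf.data version that replaces it: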

```python
def get_dataset(self):
    img_dataset = tf.data.Dataset.from_tensor_slices(self._train_images)
    img_dataset = img_dataset.map(map_func=self._load_images, num_parallel_calls=AUTOTUNE)
    img_dataset = img_dataset.map(map_func=self._preprocess_images, num_parallel_calls=AUTOTUNE)
    lbl_dataset = tf.data.Dataset.from_tensor_slices(self._train_labels)
    lbl_dataset = lbl_dataset.map(map_func=self._load_labels, num_parallel_calls=AUTOTUNE)

    dataset = tf.data.Dataset.zip((img_dataset, lbl_dataset))

    # dataset = dataset.map(
    #     lambda img_dataset, lbl_dataset: tf.py_function(self.augmentation, [img_dataset, lbl_dataset], [tf.float32, tf.float32])
    # )

    # if self.data_aug:
    #     dataset = dataset.map(
    #         lambda img_dataset, lbl_dataset: self.augmentation(img_dataset, lbl_dataset), num_parallel_calls=AUTOTUNE)

    # Build the YOLO training targets. tf.py_function runs the Python code
    # eagerly, so the returned tensors lose their static shapes.
    dataset = dataset.map(
        lambda img_dataset, lbl_dataset: tf.py_function(
            self.preprocess_true_boxes, [img_dataset, lbl_dataset],
            [tf.float32, tf.float32, tf.float32]),
        num_parallel_calls=AUTOTUNE)
    # dataset = dataset.map(
    #     lambda img_dataset, lbl_dataset: tf.numpy_function(self.preprocess_true_boxes, [img_dataset, lbl_dataset],
    #                                                        [tf.float32, tf.float32, tf.float32]),
    #     num_parallel_calls=AUTOTUNE)

    # Restore the static shapes dropped by tf.py_function.
    dataset = dataset.map(map_func=self._adjust_shape, num_parallel_calls=AUTOTUNE)

    # cache("") caches in memory: everything above runs only on the first pass.
    dataset = dataset.cache("")
    dataset = dataset.shuffle(5000, reshuffle_each_iteration=True)
    dataset = dataset.repeat()
    dataset = dataset.batch(self.batch_size).prefetch(AUTOTUNE)
    return dataset

def __iter__(self):
    return self
```
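`_adjust_shape` isn't shown above; it exists because `tf.py_function` returns tensors with unknown static shapes, which breaks downstream steps that need them. A minimal sketch of what such a helper does, with assumed sizes (416×416 input, two COCO-style 80-class heads), not the actual implementation:

```python
import tensorflow as tf

def adjust_shape(image, target_a, target_b):
    # tf.py_function erases static shape information, so restore it
    # explicitly. All sizes here are assumptions for illustration.
    image.set_shape([416, 416, 3])
    target_a.set_shape([52, 52, 3, 85])   # e.g. small-scale YOLO head
    target_b.set_shape([26, 26, 3, 85])   # e.g. medium-scale YOLO head
    return image, target_a, target_b
```

Note also that because the dataset ends with `repeat()` it is infinite, so when it is consumed through `model.fit`, `steps_per_epoch` has to be passed explicitly.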

But the result is weird. (With the original code, the result was good.)

What I checked:

Could you tell me what the possible problem might be?
Or is there any way to improve the training speed using the original code? (I compared it with darknet, and darknet's training time was much faster.)