zzh8829 / yolov3-tf2

YoloV3 Implemented in Tensorflow 2.0
MIT License
2.51k stars, 909 forks

Cannot convert custom darknet model #43

Closed: raulberari closed this issue 4 years ago

raulberari commented 5 years ago

I'm trying to convert my Darknet weights to TensorFlow weights using the command:

python convert.py --weights /path/to/weights --output ./checkpoints/yolo-obj.tf

This is the error message I get:

  File "convert.py", line 33, in <module>
    app.run(main)
  File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "convert.py", line 20, in main
    load_darknet_weights(yolo, FLAGS.weights, FLAGS.tiny)
  File "/home/raulberari/yolov3-tf2/yolov3_tf2/utils.py", line 66, in load_darknet_weights
    conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 42732 into shape (256,128,3,3)

This happens right after this log line:

I0801 10:46:50.183817 139702532433664 utils.py:45] yolo_output_2/conv2d_73 bn

Does anyone have an explanation for this? I'm running it in the provided yolov3-tf2 conda environment on an Ubuntu machine.

chrisrapson commented 5 years ago

I am getting a very similar error when converting one of my custom models. I think it is related to the number of classes. This repository sets the default number of classes to 80, and doesn't read the cfg file or the names file, so it doesn't seem to have any way of adjusting the model to a different number of classes.

I noticed that when I modified convert.py a bit to pass classes as an argument to yolo = YoloV3Tiny(), the error changes:

ValueError: cannot reshape array of size X into shape (256,384,3,3)

where X is:

1 class: 615014
2 classes: 613475
80 classes: 493433
280 classes: 185633

I was always using the same weights file, which was created as a tiny yolo_v3 model with 1 class.
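
The sizes above are consistent with each extra class adding 3 x 512 + 3 = 1539 values to the first output head of the tiny model, which the loader consumes before it reaches the failing layer in the second head. A minimal sketch of that arithmetic, assuming the first tiny-YOLO head ends in a 1x1 convolution over 512 input channels with 3 anchors, anchors * (classes + 5) filters and one bias per filter (as in the cfg posted further down); head_params and predicted_size are hypothetical helpers:

```python
"""Per-class weight arithmetic for a YOLOv3-tiny output head (illustrative sketch)."""


def head_params(in_channels: int, anchors: int, classes: int) -> int:
    """Float32 values stored for a 1x1 output conv: kernel weights plus biases.

    Assumes anchors * (classes + 5) filters, each with an in_channels x 1 x 1
    kernel and a single bias (no batch norm on the output conv).
    """
    filters = anchors * (classes + 5)
    return filters * in_channels + filters


# Each extra class adds 3 filters of 512 weights, plus 3 biases.
PER_CLASS = head_params(512, 3, 2) - head_params(512, 3, 1)

# Array sizes X reported in this thread, all from the same 1-class weights file.
REPORTED = {1: 615014, 2: 613475, 80: 493433, 280: 185633}


def predicted_size(classes: int) -> int:
    """Values left for the failing layer, extrapolated from the 1-class case."""
    return REPORTED[1] - (classes - 1) * PER_CLASS


if __name__ == "__main__":
    print("values per extra class:", PER_CLASS)  # 1539
    for n, x in REPORTED.items():
        print(n, x, predicted_size(n), x == predicted_size(n))
```

Every reported X matches the prediction, which supports the idea that the class count baked into the Keras model has to match the one the weights were trained with.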

MarcBrau commented 5 years ago

Yes, it's indeed related to the default number of classes. I'm currently experiencing the same problem and I am wondering if it's possible to convert the weights in such a way that I am able to use my custom number of classes. Unfortunately, I haven't found a way yet...

zzh8829 commented 5 years ago

I never tried with a custom number of classes. If you can upload a weights file, I can test whether there is a way.

ytolochko commented 5 years ago

Hello, I get the following error when executing convert.py on the recommended pre-trained Darknet weights. I am on Anaconda on Windows 10, if that matters.

assert len(wf.read()) == 0, 'failed to read all data'
AssertionError: failed to read all data

Can you give me a hint what's wrong?

MarcBrau commented 5 years ago

> Hello I get the following error when executing convert.py on the recommended pre-trained Darknet weights. I am on anaconda windows 10 if that matters.
> assert len(wf.read()) == 0, 'failed to read all data'
> AssertionError: failed to read all data
> Can you give me a hint what's wrong?

Hi, I think you have changed the number of classes in models.py, at least in the YoloV3 function signature. If you change it back to 80, the conversion should work fine.

> I never tried with custom number of classes. if you can upload a weight file, I can test if there is a way

Well, I think the problem here is that producing a converted (custom) weights file is not possible unless you keep the default of 80 classes in models.py. Converting the weights with a custom number of classes results in the above-mentioned AssertionError, which, as far as I can tell, shows that the conversion did not work properly.
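
A simplified way to see why the ValueError and the AssertionError in this thread are two sides of the same mismatch: the converter reads the weights file layer by layer and, at the end, checks that nothing is left over. The sketch below is not the repository's load_darknet_weights; simulate_load is a hypothetical stand-in that ignores the file header and per-layer shapes and only compares how many float32 values the file holds with how many the model asks for.

```python
"""Toy model of the two ways a darknet weight conversion can fail (sketch)."""
import os
import tempfile

import numpy as np


def simulate_load(file_floats: int, model_floats: int) -> str:
    """Store file_floats float32 values, then read model_floats of them back.

    Hypothetical stand-in for a real loader: no header, no per-layer shapes,
    just one flat read followed by an end-of-file check.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.weights")
        np.zeros(file_floats, dtype=np.float32).tofile(path)
        with open(path, "rb") as wf:
            # For binary reads, np.fromfile returns fewer items when the file is short.
            data = np.fromfile(wf, dtype=np.float32, count=model_floats)
            if data.size < model_floats:
                return "too short: the reshape would raise ValueError"
            if len(wf.read()) != 0:
                return "leftover data: 'failed to read all data'"
    return "ok"


if __name__ == "__main__":
    print(simulate_load(100, 100))  # model matches the file
    print(simulate_load(100, 120))  # model has more classes than the file
    print(simulate_load(100, 80))   # model has fewer classes than the file
```

In this toy picture, a model built with more classes than the weights file runs off the end of the file (the reshape error), while a model built with fewer classes, such as a 2-class model fed the 80-class COCO weights, leaves data unread and trips the assertion.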

chrisrapson commented 5 years ago

> I never tried with custom number of classes. if you can upload a weight file, I can test if there is a way

@zzh8829 weight files are huge, so it is probably quicker to generate it yourself from a custom cfg file. The instructions are here: https://github.com/AlexeyAB/darknet#how-to-train-tiny-yolo-to-detect-your-custom-objects

Or just download my cfg file (below) and run the following:

./darknet partial cfg/yolov3-tiny-custom.cfg yolov3-tiny.weights yolov3-tiny-custom.weights 15

cfg/yolov3-tiny-custom.cfg:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=10
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=21
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
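
A back-of-the-envelope check on this cfg, assuming the 1-class weights file reported above was trained with it. The shape in the error, (256,384,3,3), means the model expects 384 = 128 + 256 input channels at that layer, i.e. 128 filters in the 1x1 convolution after [route] layers = -4 (the value in the stock yolov3-tiny.cfg). This cfg sets filters=21 there instead, which gives 21 + 256 = 277 channels after [route] layers = -1, 8. Counting float32 values, with batch-normalized layers storing 4 values per filter (beta, gamma, mean, variance) and the plain output conv one bias per filter, the values left for the failing reshape come out to exactly the 615014 reported for 1 class. This is an inference from the arithmetic, not something confirmed in the thread; conv_floats is a hypothetical helper.

```python
"""Float32 budget of the tail of a YOLOv3-tiny weights file (illustrative sketch)."""


def conv_floats(filters: int, in_ch: int, k: int, batch_norm: bool) -> int:
    """Values stored for one conv layer: kernel plus 4 BN params or 1 bias per filter."""
    extra = 4 * filters if batch_norm else filters
    return filters * in_ch * k * k + extra


# Layers after "[route] layers = -4" as written in the cfg above (filters=21).
file_tail = (
    conv_floats(21, 256, 1, True)           # 1x1 conv, filters=21
    + conv_floats(256, 21 + 256, 3, True)   # 3x3 conv after concat with layer 8
    + conv_floats(18, 256, 1, False)        # 1x1 output conv, 3 * (1 + 5) filters
)

# What the Keras model consumes before the failing reshape:
# its 128-filter 1x1 conv, then the 4 * 256 batch-norm values of the 3x3 conv.
consumed = conv_floats(128, 256, 1, True) + 4 * 256

leftover = file_tail - consumed
print(leftover)  # 615014
```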

hugovaz commented 5 years ago

I was able to convert a custom weights file (2 classes). As mentioned by others, models.py has the number of classes set to 80 throughout, and being lazy, I decided to search and replace "classes=80" with "classes=2" so I didn't have to trace the whole code just to test a theory... and it worked for both models (standard V3 and tiny). The three affected functions in models.py are:

YoloV3, YoloV3Tiny, YoloLoss

Knowing that, it should just be a matter of changing the convert, detect and train scripts to pass the number of classes, and tracing models.py to check what else needs to change to propagate the value (if anything). I think I got it all done: convert.py and detect.py work as intended; I just didn't test train.py.

Basically, what I did was add a classes_num flag to those 3 scripts to pass the number of classes. If no value is passed, it defaults to 80 (redundant, because the functions already default to 80, but anyway).

convert.py

from absl import app, flags, logging
from absl.flags import FLAGS
import numpy as np
from yolov3_tf2.models import YoloV3, YoloV3Tiny
from yolov3_tf2.utils import load_darknet_weights

flags.DEFINE_string('weights', './data/yolov3.weights', 'path to weights file')
flags.DEFINE_string('output', './checkpoints/yolov3.tf', 'path to output')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

def main(_argv):
    if FLAGS.tiny:
        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
    else:
        yolo = YoloV3(classes=FLAGS.classes_num)
    yolo.summary()
    logging.info('model created')

    load_darknet_weights(yolo, FLAGS.weights, FLAGS.tiny)
    logging.info('weights loaded')

    img = np.random.random((1, 320, 320, 3)).astype(np.float32)
    output = yolo(img)
    logging.info('sanity check passed')

    yolo.save_weights(FLAGS.output)
    logging.info('weights saved')

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

detect.py

import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (
    YoloV3, YoloV3Tiny
)
from yolov3_tf2.dataset import transform_images
from yolov3_tf2.utils import draw_outputs

flags.DEFINE_string('classes', './data/coco.names', 'path to classes file')
flags.DEFINE_string('weights', './checkpoints/yolov3.tf',
                    'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_string('image', './data/girl.png', 'path to input image')
flags.DEFINE_string('output', './output.jpg', 'path to output image')
flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

def main(_argv):
    if FLAGS.tiny:
        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
    else:
        yolo = YoloV3(classes=FLAGS.classes_num)

    yolo.load_weights(FLAGS.weights)
    logging.info('weights loaded')

    class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
    logging.info('classes loaded')

    img = tf.image.decode_image(open(FLAGS.image, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, FLAGS.size)

    t1 = time.time()
    boxes, scores, classes, nums = yolo(img)
    t2 = time.time()
    logging.info('time: {}'.format(t2 - t1))

    logging.info('detections:')
    for i in range(nums[0]):
        logging.info('\t{}, {}, {}'.format(class_names[int(classes[0][i])],
                                           np.array(scores[0][i]),
                                           np.array(boxes[0][i])))

    img = cv2.imread(FLAGS.image)
    img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
    cv2.imwrite(FLAGS.output, img)
    logging.info('output saved to: {}'.format(FLAGS.output))

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

train.py

from absl import app, flags, logging
from absl.flags import FLAGS
import tensorflow as tf
import numpy as np
import cv2
from tensorflow.keras.callbacks import (
    ReduceLROnPlateau,
    EarlyStopping,
    ModelCheckpoint,
    TensorBoard
)
from yolov3_tf2.models import (
    YoloV3, YoloV3Tiny, YoloLoss,
    yolo_anchors, yolo_anchor_masks,
    yolo_tiny_anchors, yolo_tiny_anchor_masks
)
from yolov3_tf2.utils import freeze_all
import yolov3_tf2.dataset as dataset

flags.DEFINE_string('dataset', '', 'path to dataset')
flags.DEFINE_string('val_dataset', '', 'path to validation dataset')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_string('weights', './checkpoints/yolov3.tf',
                    'path to weights file')
flags.DEFINE_string('classes', './data/coco.names', 'path to classes file')
flags.DEFINE_enum('mode', 'fit', ['fit', 'eager_fit', 'eager_tf'],
                  'fit: model.fit, '
                  'eager_fit: model.fit(run_eagerly=True), '
                  'eager_tf: custom GradientTape')
flags.DEFINE_enum('transfer', 'none',
                  ['none', 'darknet', 'no_output', 'frozen', 'fine_tune'],
                  'none: Training from scratch, '
                  'darknet: Transfer darknet, '
                  'no_output: Transfer all but output, '
                  'frozen: Transfer and freeze all, '
                  'fine_tune: Transfer all and freeze darknet only')
flags.DEFINE_integer('size', 416, 'image size')
flags.DEFINE_integer('epochs', 2, 'number of epochs')
flags.DEFINE_integer('batch_size', 8, 'batch size')
flags.DEFINE_float('learning_rate', 1e-3, 'learning rate')
flags.DEFINE_integer('classes_num', 80, 'Number of classes to train')

def main(_argv):
    if FLAGS.tiny:
        model = YoloV3Tiny(FLAGS.size, training=True, classes=FLAGS.classes_num)
        anchors = yolo_tiny_anchors
        anchor_masks = yolo_tiny_anchor_masks
    else:
        model = YoloV3(FLAGS.size, training=True, classes=FLAGS.classes_num)
        anchors = yolo_anchors
        anchor_masks = yolo_anchor_masks

    train_dataset = dataset.load_fake_dataset()
    if FLAGS.dataset:
        train_dataset = dataset.load_tfrecord_dataset(
            FLAGS.dataset, FLAGS.classes)
    train_dataset = train_dataset.shuffle(buffer_size=1024)  # TODO: not 1024
    train_dataset = train_dataset.batch(FLAGS.batch_size)
    train_dataset = train_dataset.map(lambda x, y: (
        dataset.transform_images(x, FLAGS.size),
        dataset.transform_targets(y, anchors, anchor_masks, FLAGS.classes_num)))
    train_dataset = train_dataset.prefetch(
        buffer_size=tf.data.experimental.AUTOTUNE)

    val_dataset = dataset.load_fake_dataset()
    if FLAGS.val_dataset:
        val_dataset = dataset.load_tfrecord_dataset(
            FLAGS.val_dataset, FLAGS.classes)
    val_dataset = val_dataset.batch(FLAGS.batch_size)
    val_dataset = val_dataset.map(lambda x, y: (
        dataset.transform_images(x, FLAGS.size),
        dataset.transform_targets(y, anchors, anchor_masks, FLAGS.classes_num)))

    if FLAGS.transfer != 'none':
        model.load_weights(FLAGS.weights)
        if FLAGS.transfer == 'fine_tune':
            # freeze darknet
            darknet = model.get_layer('yolo_darknet')
            freeze_all(darknet)
        elif FLAGS.transfer == 'frozen':
            # freeze everything
            freeze_all(model)
        else:
            # reset top layers
            if FLAGS.tiny:  # get initial weights
                init_model = YoloV3Tiny(FLAGS.size, training=True, classes=FLAGS.classes_num)
            else:
                init_model = YoloV3(FLAGS.size, training=True, classes=FLAGS.classes_num)

            if FLAGS.transfer == 'darknet':
                for l in model.layers:
                    if l.name != 'yolo_darknet' and l.name.startswith('yolo_'):
                        l.set_weights(init_model.get_layer(
                            l.name).get_weights())
                    else:
                        freeze_all(l)
            elif FLAGS.transfer == 'no_output':
                for l in model.layers:
                    if l.name.startswith('yolo_output'):
                        l.set_weights(init_model.get_layer(
                            l.name).get_weights())
                    else:
                        freeze_all(l)

    optimizer = tf.keras.optimizers.Adam(learning_rate=FLAGS.learning_rate)
    loss = [YoloLoss(anchors[mask], classes=FLAGS.classes_num) for mask in anchor_masks]

    if FLAGS.mode == 'eager_tf':
        # Eager mode is great for debugging
        # Non eager graph mode is recommended for real training
        avg_loss = tf.keras.metrics.Mean('loss', dtype=tf.float32)
        avg_val_loss = tf.keras.metrics.Mean('val_loss', dtype=tf.float32)

        for epoch in range(1, FLAGS.epochs + 1):
            for batch, (images, labels) in enumerate(train_dataset):
                with tf.GradientTape() as tape:
                    outputs = model(images, training=True)
                    regularization_loss = tf.reduce_sum(model.losses)
                    pred_loss = []
                    for output, label, loss_fn in zip(outputs, labels, loss):
                        pred_loss.append(loss_fn(label, output))
                    total_loss = tf.reduce_sum(pred_loss) + regularization_loss

                grads = tape.gradient(total_loss, model.trainable_variables)
                optimizer.apply_gradients(
                    zip(grads, model.trainable_variables))

                logging.info("{}_train_{}, {}, {}".format(
                    epoch, batch, total_loss.numpy(),
                    list(map(lambda x: np.sum(x.numpy()), pred_loss))))
                avg_loss.update_state(total_loss)

            for batch, (images, labels) in enumerate(val_dataset):
                outputs = model(images)
                regularization_loss = tf.reduce_sum(model.losses)
                pred_loss = []
                for output, label, loss_fn in zip(outputs, labels, loss):
                    pred_loss.append(loss_fn(label, output))
                total_loss = tf.reduce_sum(pred_loss) + regularization_loss

                logging.info("{}_val_{}, {}, {}".format(
                    epoch, batch, total_loss.numpy(),
                    list(map(lambda x: np.sum(x.numpy()), pred_loss))))
                avg_val_loss.update_state(total_loss)

            logging.info("{}, train: {}, val: {}".format(
                epoch,
                avg_loss.result().numpy(),
                avg_val_loss.result().numpy()))

            avg_loss.reset_states()
            avg_val_loss.reset_states()
            model.save_weights(
                'checkpoints/yolov3_train_{}.tf'.format(epoch))
    else:
        model.compile(optimizer=optimizer, loss=loss,
                      run_eagerly=(FLAGS.mode == 'eager_fit'))

        callbacks = [
            ReduceLROnPlateau(verbose=1),
            EarlyStopping(patience=3, verbose=1),
            ModelCheckpoint('checkpoints/yolov3_train_{epoch}.tf',
                            verbose=1, save_weights_only=True),
            TensorBoard(log_dir='logs')
        ]

        history = model.fit(train_dataset,
                            epochs=FLAGS.epochs,
                            callbacks=callbacks,
                            validation_data=val_dataset)

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

EDIT: To avoid confusion about how to use it: when you call any of the scripts, add the flag "--classes_num" with the number of classes your model has. For example, if you have 2 classes, append "--classes_num 2" to the convert command, and likewise append "--classes_num 2" when running detection. Note that detection also reads the class names from the names file passed via "--classes" (coco.names by default), so point it at your own names file.
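
Because detect.py already reads a names file, one way to keep the class count and the names in sync is to derive the count from that file instead of passing it separately. A minimal sketch, assuming one class name per line as in coco.names; count_classes is a hypothetical helper, not part of yolov3_tf2:

```python
"""Derive the number of classes from a darknet-style .names file (sketch)."""
import os
import tempfile


def count_classes(names_path: str) -> int:
    """Count non-empty lines, one class name per line; blank lines are ignored."""
    with open(names_path) as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "obj.names")
        with open(path, "w") as f:
            f.write("cat\ndog\n")
        print(count_classes(path))  # 2
```

In detect.py the result could be passed as YoloV3(classes=count_classes(FLAGS.classes)); convert.py as posted has no --classes flag, so it would need one added for the same approach.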

EDIT 2: Forgot export_tfserving.py, so here it is:

export_tfserving.py:

import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (
    YoloV3, YoloV3Tiny
)
from yolov3_tf2.dataset import transform_images

from tensorflow.python.eager import def_function
from tensorflow.python.framework import tensor_spec
from tensorflow.python.util import nest

flags.DEFINE_string('weights', './checkpoints/yolov3.tf',
                    'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_string('output', './serving/yolov3/1', 'path to saved_model')
flags.DEFINE_string('classes', './data/coco.names', 'path to classes file')
flags.DEFINE_string('image', './data/girl.png', 'path to input image')
flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

# TODO: remove this after upstream fix
# modified from: tensorflow.python.keras.saving.saving_utils.trace_model_call
def trace_model_call(model):
    inputs = model.inputs
    input_names = model.input_names

    input_signature = []
    for input_tensor, input_name in zip(inputs, input_names):
        input_signature.append(tensor_spec.TensorSpec(
            shape=input_tensor.shape, dtype=input_tensor.dtype,
            name=input_name))

    @def_function.function(input_signature=input_signature, autograph=False)
    def _wrapped_model(*args):
        inputs = args[0] if len(input_signature) == 1 else list(args)
        outputs_list = nest.flatten(model(inputs=inputs))
        output_names = model.output_names
        return {"{}_{}".format(kv[0], i): kv[1] for i, kv in enumerate(
            zip(output_names, outputs_list))}

    return _wrapped_model

def main(_argv):
    if FLAGS.tiny:
        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
    else:
        yolo = YoloV3(classes=FLAGS.classes_num)

    yolo.load_weights(FLAGS.weights)
    logging.info('weights loaded')

    tf.saved_model.save(yolo, FLAGS.output, signatures=trace_model_call(yolo))
    logging.info("model saved to: {}".format(FLAGS.output))

    model = tf.saved_model.load(FLAGS.output)
    infer = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    logging.info(infer.structured_outputs)

    class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
    logging.info('classes loaded')

    img = tf.image.decode_image(open(FLAGS.image, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, 416)

    t1 = time.time()
    outputs = infer(img)
    boxes, scores, classes, nums = outputs["yolo_nms_0"], outputs[
        "yolo_nms_1"], outputs["yolo_nms_2"], outputs["yolo_nms_3"]
    t2 = time.time()
    logging.info('time: {}'.format(t2 - t1))

    logging.info('detections:')
    for i in range(nums[0]):
        logging.info('\t{}, {}, {}'.format(class_names[int(classes[0][i])],
                                           scores[0][i].numpy(),
                                           boxes[0][i].numpy()))

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

EDIT: August 12th 2019 - corrected the pasted convert.py, which had FLAGS.classes instead of FLAGS.classes_num.

MarcBrau commented 5 years ago

> I was able to convert a custom weights file (2 classes). As mentioned by others, the models.py has the classes set to 80 throughout, and me being lazy I decided to search and replace "classes=80" to "classes=2" so I didn't have to trace the whole code just to test a theory... and it worked, for both models (standard V3 and tiny). The three affected methods in models.py are:
>
> YoloV3 YoloV3Tiny YoloLoss

Knowing that, it should be just a matter of changing the convert, detect and train scripts to pass the number of classes, and trace models.py to check what else it'll need to be changed to propagate the value (if anything at all). I think I got it all done, convert.py and detect.py are working as intended, just didn't test train.py. Basically what I did was add a flag for classes_num, to pass the number of classes, on those 3 scripts. If no value is passed, it defaults to 80 (redundant because the method already does it, but anyway). convert.py from absl import app, flags, logging from absl.flags import FLAGS import numpy as np from yolov3_tf2.models import YoloV3, YoloV3Tiny from yolov3_tf2.utils import load_darknet_weights

flags.DEFINE_string('weights', './data/yolov3.weights', 'path to weights file') flags.DEFINE_string('output', './checkpoints/yolov3.tf', 'path to output') flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny') flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

def main(_argv): if FLAGS.tiny: yolo = YoloV3Tiny(classes=FLAGS.classes) else: yolo = YoloV3(classes=FLAGS.classes) yolo.summary() logging.info('model created')

load_darknet_weights(yolo, FLAGS.weights, FLAGS.tiny)
logging.info('weights loaded')

img = np.random.random((1, 320, 320, 3)).astype(np.float32)
output = yolo(img)
logging.info('sanity check passed')

yolo.save_weights(FLAGS.output)
logging.info('weights saved')

if name == 'main': try: app.run(main) except SystemExit: pass

detect.py import time from absl import app, flags, logging from absl.flags import FLAGS import cv2 import numpy as np import tensorflow as tf from yolov3_tf2.models import ( YoloV3, YoloV3Tiny ) from yolov3_tf2.dataset import transform_images from yolov3_tf2.utils import draw_outputs

flags.DEFINE_string('classes', './data/coco.names', 'path to classes file') flags.DEFINE_string('weights', './checkpoints/yolov3.tf', 'path to weights file') flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny') flags.DEFINE_integer('size', 416, 'resize images to') flags.DEFINE_string('image', './data/girl.png', 'path to input image') flags.DEFINE_string('output', './output.jpg', 'path to output image') flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

def main(_argv): if FLAGS.tiny: yolo = YoloV3Tiny(classes=FLAGS.classes_num) else: yolo = YoloV3(classes=FLAGS.classes_num)

yolo.load_weights(FLAGS.weights)
logging.info('weights loaded')

class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
logging.info('classes loaded')

img = tf.image.decode_image(open(FLAGS.image, 'rb').read(), channels=3)
img = tf.expand_dims(img, 0)
img = transform_images(img, FLAGS.size)

t1 = time.time()
boxes, scores, classes, nums = yolo(img)
t2 = time.time()
logging.info('time: {}'.format(t2 - t1))

logging.info('detections:')
for i in range(nums[0]):
    logging.info('\t{}, {}, {}'.format(class_names[int(classes[0][i])],
                                       np.array(scores[0][i]),
                                       np.array(boxes[0][i])))

img = cv2.imread(FLAGS.image)
img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
cv2.imwrite(FLAGS.output, img)
logging.info('output saved to: {}'.format(FLAGS.output))

if name == 'main': try: app.run(main) except SystemExit: pass

train.py from absl import app, flags, logging from absl.flags import FLAGS import tensorflow as tf import numpy as np import cv2 from tensorflow.keras.callbacks import ( ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, TensorBoard ) from yolov3_tf2.models import ( YoloV3, YoloV3Tiny, YoloLoss, yolo_anchors, yolo_anchor_masks, yolo_tiny_anchors, yolo_tiny_anchor_masks ) from yolov3_tf2.utils import freeze_all import yolov3_tf2.dataset as dataset

flags.DEFINE_string('dataset', '', 'path to dataset') flags.DEFINE_string('val_dataset', '', 'path to validation dataset') flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny') flags.DEFINE_string('weights', './checkpoints/yolov3.tf', 'path to weights file') flags.DEFINE_string('classes', './data/coco.names', 'path to classes file') flags.DEFINE_enum('mode', 'fit', ['fit', 'eager_fit', 'eager_tf'], 'fit: model.fit, ' 'eager_fit: model.fit(run_eagerly=True), ' 'eager_tf: custom GradientTape') flags.DEFINE_enum('transfer', 'none', ['none', 'darknet', 'no_output', 'frozen', 'fine_tune'], 'none: Training from scratch, ' 'darknet: Transfer darknet, ' 'no_output: Transfer all but output, ' 'frozen: Transfer and freeze all, ' 'fine_tune: Transfer all and freeze darknet only') flags.DEFINE_integer('size', 416, 'image size') flags.DEFINE_integer('epochs', 2, 'number of epochs') flags.DEFINE_integer('batch_size', 8, 'batch size') flags.DEFINE_float('learning_rate', 1e-3, 'learning rate') flags.DEFINE_integer('classes_num', 80, 'Number of classes to train')

def main(_argv): if FLAGS.tiny: model = YoloV3Tiny(FLAGS.size, training=True, classes=FLAGS.classes_num) anchors = yolo_tiny_anchors anchor_masks = yolo_tiny_anchor_masks else: model = YoloV3(FLAGS.size, training=True, classes=FLAGS.classes_num) anchors = yolo_anchors anchor_masks = yolo_anchor_masks

train_dataset = dataset.load_fake_dataset()
if FLAGS.dataset:
    train_dataset = dataset.load_tfrecord_dataset(
        FLAGS.dataset, FLAGS.classes)
train_dataset = train_dataset.shuffle(buffer_size=1024)  # TODO: not 1024
train_dataset = train_dataset.batch(FLAGS.batch_size)
train_dataset = train_dataset.map(lambda x, y: (
    dataset.transform_images(x, FLAGS.size),
    dataset.transform_targets(y, anchors, anchor_masks, 80)))
train_dataset = train_dataset.prefetch(
    buffer_size=tf.data.experimental.AUTOTUNE)

val_dataset = dataset.load_fake_dataset()
if FLAGS.val_dataset:
    val_dataset = dataset.load_tfrecord_dataset(
        FLAGS.val_dataset, FLAGS.classes)
val_dataset = val_dataset.batch(FLAGS.batch_size)
val_dataset = val_dataset.map(lambda x, y: (
    dataset.transform_images(x, FLAGS.size),
    dataset.transform_targets(y, anchors, anchor_masks, 80)))

if FLAGS.transfer != 'none':
    model.load_weights(FLAGS.weights)
    if FLAGS.transfer == 'fine_tune':
        # freeze darknet
        darknet = model.get_layer('yolo_darknet')
        freeze_all(darknet)
    elif FLAGS.mode == 'frozen':
        # freeze everything
        freeze_all(model)
    else:
        # reset top layers
        if FLAGS.tiny:  # get initial weights
            init_model = YoloV3Tiny(FLAGS.size, training=True, classes=FLAGS.classes_num)
        else:
            init_model = YoloV3(FLAGS.size, training=True, classes=FLAGS.classes_num)

        if FLAGS.transfer == 'darknet':
            for l in model.layers:
                if l.name != 'yolo_darknet' and l.name.startswith('yolo_'):
                    l.set_weights(init_model.get_layer(
                        l.name).get_weights())
                else:
                    freeze_all(l)
        elif FLAGS.transfer == 'no_output':
            for l in model.layers:
                if l.name.startswith('yolo_output'):
                    l.set_weights(init_model.get_layer(
                        l.name).get_weights())
                else:
                    freeze_all(l)

optimizer = tf.keras.optimizers.Adam(lr=FLAGS.learning_rate)
loss = [YoloLoss(anchors[mask], classes=FLAGS.classes_num) for mask in anchor_masks]

if FLAGS.mode == 'eager_tf':
    # Eager mode is great for debugging
    # Non eager graph mode is recommended for real training
    avg_loss = tf.keras.metrics.Mean('loss', dtype=tf.float32)
    avg_val_loss = tf.keras.metrics.Mean('val_loss', dtype=tf.float32)

    for epoch in range(1, FLAGS.epochs + 1):
        for batch, (images, labels) in enumerate(train_dataset):
            with tf.GradientTape() as tape:
                outputs = model(images, training=True)
                regularization_loss = tf.reduce_sum(model.losses)
                pred_loss = []
                for output, label, loss_fn in zip(outputs, labels, loss):
                    pred_loss.append(loss_fn(label, output))
                total_loss = tf.reduce_sum(pred_loss) + regularization_loss

            grads = tape.gradient(total_loss, model.trainable_variables)
            optimizer.apply_gradients(
                zip(grads, model.trainable_variables))

            logging.info("{}_train_{}, {}, {}".format(
                epoch, batch, total_loss.numpy(),
                list(map(lambda x: np.sum(x.numpy()), pred_loss))))
            avg_loss.update_state(total_loss)

        for batch, (images, labels) in enumerate(val_dataset):
            outputs = model(images)
            regularization_loss = tf.reduce_sum(model.losses)
            pred_loss = []
            for output, label, loss_fn in zip(outputs, labels, loss):
                pred_loss.append(loss_fn(label, output))
            total_loss = tf.reduce_sum(pred_loss) + regularization_loss

            logging.info("{}_val_{}, {}, {}".format(
                epoch, batch, total_loss.numpy(),
                list(map(lambda x: np.sum(x.numpy()), pred_loss))))
            avg_val_loss.update_state(total_loss)

        logging.info("{}, train: {}, val: {}".format(
            epoch,
            avg_loss.result().numpy(),
            avg_val_loss.result().numpy()))

        avg_loss.reset_states()
        avg_val_loss.reset_states()
        model.save_weights(
            'checkpoints/yolov3_train_{}.tf'.format(epoch))
else:
    model.compile(optimizer=optimizer, loss=loss,
                  run_eagerly=(FLAGS.mode == 'eager_fit'))

    callbacks = [
        ReduceLROnPlateau(verbose=1),
        EarlyStopping(patience=3, verbose=1),
        ModelCheckpoint('checkpoints/yolov3_train_{epoch}.tf',
                        verbose=1, save_weights_only=True),
        TensorBoard(log_dir='logs')
    ]

    history = model.fit(train_dataset,
                        epochs=FLAGS.epochs,
                        callbacks=callbacks,
                        validation_data=val_dataset)

if name == 'main': try: app.run(main) except SystemExit: pass

EDIT: Just so there's no confusion on how to use it, when you call any of the scripts, just use the flag "--classes_num" with the number of classes your model has. So if you have 2 classes and want to convert, just append "--classes_num 2" to the command, and likewise when you're running detection append "--classes_num 2". One thing to mention is that on detection it does read the names file, so keep that in mind. EDIT 2: Forgot export_tfserving.py, so here it goes: export_tfserving.py: import time from absl import app, flags, logging from absl.flags import FLAGS import cv2 import numpy as np import tensorflow as tf from yolov3_tf2.models import ( YoloV3, YoloV3Tiny ) from yolov3_tf2.dataset import transform_images

from tensorflow.python.eager import def_function
from tensorflow.python.framework import tensor_spec
from tensorflow.python.util import nest

flags.DEFINE_string('weights', './checkpoints/yolov3.tf',
                    'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_string('output', './serving/yolov3/1', 'path to saved_model')
flags.DEFINE_string('classes', './data/coco.names', 'path to classes file')
flags.DEFINE_string('image', './data/girl.png', 'path to input image')
flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

# TODO: remove this after upstream fix
# modified from: tensorflow.python.keras.saving.saving_utils.trace_model_call
def trace_model_call(model):
    inputs = model.inputs
    input_names = model.input_names

    input_signature = []
    for input_tensor, input_name in zip(inputs, input_names):
        input_signature.append(tensor_spec.TensorSpec(
            shape=input_tensor.shape, dtype=input_tensor.dtype,
            name=input_name))

    @def_function.function(input_signature=input_signature, autograph=False)
    def _wrapped_model(*args):
        inputs = args[0] if len(input_signature) == 1 else list(args)
        outputs_list = nest.flatten(model(inputs=inputs))
        output_names = model.output_names
        return {"{}_{}".format(kv[0], i): kv[1] for i, kv in enumerate(
            zip(output_names, outputs_list))}

    return _wrapped_model

def main(_argv):
    if FLAGS.tiny:
        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
    else:
        yolo = YoloV3(classes=FLAGS.classes_num)

    yolo.load_weights(FLAGS.weights)
    logging.info('weights loaded')

    tf.saved_model.save(yolo, FLAGS.output, signatures=trace_model_call(yolo))
    logging.info("model saved to: {}".format(FLAGS.output))

    model = tf.saved_model.load(FLAGS.output)
    infer = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    logging.info(infer.structured_outputs)

    class_names = [c.strip() for c in open(FLAGS.classes).readlines()]
    logging.info('classes loaded')

    img = tf.image.decode_image(open(FLAGS.image, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, 416)

    t1 = time.time()
    outputs = infer(img)
    boxes, scores, classes, nums = outputs["yolo_nms_0"], outputs[
        "yolo_nms_1"], outputs["yolo_nms_2"], outputs["yolo_nms_3"]
    t2 = time.time()
    logging.info('time: {}'.format(t2 - t1))

    logging.info('detections:')
    for i in range(nums[0]):
        logging.info('\t{}, {}, {}'.format(class_names[int(classes[0][i])],
                                           scores[0][i].numpy(),
                                           boxes[0][i].numpy()))

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

Hi there, it would be very nice if you could also upload your utils.py, as the load_darknet_weights() function defined there is, in my opinion, the critical point of this issue. For my part, I also added a flag for the number of classes, but when executing convert.py, the assertion in load_darknet_weights() at line 74 failed, and the converted weights seem to be suboptimal. Thanks in advance

hugovaz commented 5 years ago

@MarBra110 I didn't change utils.py, it's the same as the one in the repository.

MarcBrau commented 5 years ago

That's pretty weird: when I try to execute your convert.py, I still get the failed assertion saying that it failed to read all data. Are you sure you didn't change it, or maybe just commented out the assertion? Furthermore, I think you got mixed up with the naming of the flag in convert.py: it should be classes_num in lines 18 and 20.

hugovaz commented 5 years ago

@MarBra110 You're right regarding convert.py and the flag name. I started by calling it just "classes", and then noticed there was already a flag with that name in detect.py and train.py (the one that specifies the names file), so for coherence's sake I renamed it in convert.py as well. Since I had already pasted the code, I made the change directly in here... big mistake. Thanks for catching it.

Regarding your error... are you using SPP (spatial pyramid pooling) or the standard YOLOv3 model? If you're using SPP, that could be it (if I'm not mistaken, it adds an extra layer that isn't defined in the model here). Apart from that, I have no idea why it doesn't work for you. I tried with both tiny and standard V3 on a custom dataset with two classes; I tried convert (obviously), detect on an image (didn't try detect on video) and even TF Serving (exporting and then detecting), and it all went OK. I also did some negative tests (like using 80 classes for the 2-class model to confirm it returned the error I was getting, etc.). Even when I was getting an error trying to convert, the error was about not being able to reshape (the same one @chrisrapson was getting), not the assertion one.

MarcBrau commented 5 years ago

@hugovaz I'm just using the plain YOLOv3 model given in the repository. I just double-checked whether I had made some stupid mistake in my own code by trying your convert.py implementation on a clean clone of the repo, but I'm still getting the above-mentioned assertion. Could it be an OS-related error? May I ask what you're running on?

hugovaz commented 5 years ago

@MarBra110 Running Ubuntu 18.04.03 (linux), and my python environment is as follows:

absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
chardet==3.0.4
gast==0.2.2
get==2019.4.13
google-pasta==0.1.7
grpcio==1.22.0
h5py==2.9.0
idna==2.8
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
Markdown==3.1.1
numpy==1.16.4
opencv-python==4.0.0.21
Pillow==6.1.0
post==2019.4.13
protobuf==3.9.0
public==2019.4.13
query-string==2019.4.13
requests==2.22.0
six==1.12.0
tb-nightly==1.14.0a20190301
tensorflow-gpu==2.0.0a0
termcolor==1.1.0
tf-estimator-nightly==1.14.0.dev2019030115
urllib3==1.25.3
Werkzeug==0.15.5
SlavaKeshkov commented 5 years ago

@zzh8829 Would you be able to update the repo to include the classes_num flag for custom conversion?

MarcBrau commented 5 years ago

@hugovaz Thanks for sharing. The only major difference is the OS, as I am running on Windows 10. However, I also tried to run it on the Linux subsystem and still got the same failed assertion, so I do not believe that it is an OS-related error. Unfortunately, I am running out of possible error sources... are you really sure you did not change something else that may affect convert.py in some way? Am I right that you did not change anything in models.py?

hugovaz commented 5 years ago

WSL isn't exactly the same as running Linux; it still uses the Windows kernel (WSL 2 is another story). That may or may not be a big enough difference to matter (it has, on some occasions, been impactful on some of my projects when using WSL).

Yeah, didn't change models.py. Only files I changed were the ones I pasted here.

I wonder if your problem has anything to do with the weights file itself. You were able to convert the default one from darknet, right? Can you debug convert.py and check what you get from "wf" when it reaches line 74 of utils.py, and what you get at lines 27 and 52 (all the places wf is read)? I just took a quick look, and that assertion checks whether the weights file is empty or whether read returns empty, so either the weights file is corrupt/unreadable, or it is trying to read a file that isn't there (according to Python)... or so it seems.
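For readers following the line numbers above, here is a minimal sketch of the read pattern the traceback points at (load_darknet_weights in yolov3_tf2/utils.py). Each conv layer pulls either a bias or four batch-norm vectors, then a kernel, from the file in order; the kernel is reshaped from darknet's (out, in, h, w) layout and transposed with [2, 3, 1, 0] to TF's (h, w, in, out); after the last layer the loader asserts nothing is left. This is not the repository's code: the helper names (read_floats, read_conv_layer, assert_fully_consumed), the 20-byte header and the in-memory fake file are assumptions for illustration, and it uses np.frombuffer(wf.read(...)) instead of np.fromfile so it also works on io.BytesIO.

```python
import io
import struct

import numpy as np


def read_floats(wf, count):
    """Read up to `count` little-endian float32 values; a short file yields fewer."""
    return np.frombuffer(wf.read(count * 4), dtype='<f4')


def read_conv_layer(wf, in_dim, filters, size, batch_norm):
    """Read one darknet conv layer and return (kernel, bias_or_bn) in TF layout."""
    if batch_norm:
        # darknet order: [beta, gamma, mean, variance], each `filters` long
        bn = read_floats(wf, 4 * filters).reshape((4, filters))
        extra = bn[[1, 0, 2, 3]]  # reorder to [gamma, beta, mean, variance]
    else:
        extra = read_floats(wf, filters)  # plain conv bias
    # darknet kernel layout is (out, in, h, w); TF wants (h, w, in, out)
    conv_shape = (filters, in_dim, size, size)
    kernel = read_floats(wf, int(np.prod(conv_shape)))
    # raises ValueError "cannot reshape array ..." if the file ran short
    kernel = kernel.reshape(conv_shape).transpose([2, 3, 1, 0])
    return kernel, extra


def assert_fully_consumed(wf):
    # leftover bytes mean the model needed less data than the file holds
    assert len(wf.read()) == 0, 'failed to read all data'


# Hypothetical file: 20-byte header, then one BN conv layer (in_dim=2, filters=3, 1x1).
header = struct.pack('<5i', 0, 2, 0, 0, 0)
payload = np.arange(4 * 3 + 3 * 2, dtype='<f4').tobytes()

wf = io.BytesIO(header + payload)
wf.read(20)  # skip the header
kernel, bn = read_conv_layer(wf, in_dim=2, filters=3, size=1, batch_norm=True)
assert_fully_consumed(wf)
print(kernel.shape, bn.shape)  # (1, 1, 2, 3) (4, 3)

# The same file with its last float missing fails the way the reports in this thread do.
short = io.BytesIO(header + payload[:-4])
short.read(20)
short_error = None
try:
    read_conv_layer(short, in_dim=2, filters=3, size=1, batch_norm=True)
except ValueError as err:
    short_error = str(err)
print(short_error)
```

Read this way, the final assertion fires when the model asks for less data than the file holds, not only when the file is empty.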

MarcBrau commented 5 years ago

Yeah, I know, but still, I don't think this is OS-related.

Yes, I was able to convert the default darknet weights: the moment I set the classes to 80, everything works perfectly fine.

Well, I downloaded the weights file quite a few times, as this was also one of my thoughts. It seems as if the weights file is read correctly, as len(wf.read) = 248007028 at line 27. Then in line it is 22536180, so it seems as if the weights file gets read, but then on line 74 it is still ... wait a second... what?! OK, that is pretty weird! If I put a breakpoint at line 74 and check the assertion, the first time I check it, it returns False; however, when I then check only len(wf.read), it is actually 0. When I then check the assertion again, it returns True. What the hell is happening there?! I am confused... anyway, it looks like some weird issue triggers the assertion even if the weights file was properly read. @hugovaz: Thank you for your help and patience!
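One plausible explanation for the debugger behaviour above (an interpretation, not something confirmed in this thread): calling read() with no size argument returns everything from the current position to the end of the file and advances the position. Evaluating len(wf.read()) at the breakpoint therefore consumes the leftover bytes, so the first check sees them and every later check sees an empty result. A minimal sketch with an in-memory stand-in for the weights file (the byte counts are made up):

```python
import io

# Stand-in for the weights file: 16 bytes, of which the loader used only 8.
wf = io.BytesIO(b'\x00' * 16)
wf.read(8)

# Evaluating the assertion's condition in a debugger calls read() again...
first_check = len(wf.read()) == 0   # consumes the 8 leftover bytes -> False
# ...so every later evaluation sees nothing left.
second_check = len(wf.read()) == 0  # -> True
print(first_check, second_check)  # False True
```

Under this reading, the first False is the real result: there was data left over after all layers were loaded.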

chrisrapson commented 5 years ago

@hugovaz, can you please confirm that you only changed 4 files: convert.py, train.py, detect.py, export_tfserving.py? Would you be able to show us the output of git diff?

I made a fresh clone of this repository and then copy-pasted your version of those files from above. Actually I had been following the same approach in my earlier post where I showed the results for passing different values for the classes argument. I still get the same error as before when I run convert.py --weights my_weights_Tiny.weights --output ./checkpoints/my_weights_Tiny.tf --classes_num 1 --tiny

Perhaps there is a difference between the AssertionError that you and @ytolochko are seeing, and the ValueError that @raulberari and I are seeing?

hugovaz commented 5 years ago

@chrisrapson

diff --git a/convert.py b/convert.py
index 772fca9..2e80c17 100644
--- a/convert.py
+++ b/convert.py
@@ -7,13 +7,14 @@ from yolov3_tf2.utils import load_darknet_weights
 flags.DEFINE_string('weights', './data/yolov3.weights', 'path to weights file')
 flags.DEFINE_string('output', './checkpoints/yolov3.tf', 'path to output')
 flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
+flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

 def main(_argv):
     if FLAGS.tiny:
-        yolo = YoloV3Tiny()
+        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
     else:
-        yolo = YoloV3()
+        yolo = YoloV3(classes=FLAGS.classes_num)
     yolo.summary()
     logging.info('model created')

diff --git a/detect.py b/detect.py
index 5acd7d5..cb9862e 100644
--- a/detect.py
+++ b/detect.py
@@ -17,13 +17,13 @@ flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
 flags.DEFINE_integer('size', 416, 'resize images to')
 flags.DEFINE_string('image', './data/girl.png', 'path to input image')
 flags.DEFINE_string('output', './output.jpg', 'path to output image')
-
+flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

 def main(_argv):
     if FLAGS.tiny:
-        yolo = YoloV3Tiny()
+        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
     else:
-        yolo = YoloV3()
+        yolo = YoloV3(classes=FLAGS.classes_num)

     yolo.load_weights(FLAGS.weights)
     logging.info('weights loaded')
diff --git a/export_tfserving.py b/export_tfserving.py
index c70e5a2..78c97ac 100644
--- a/export_tfserving.py
+++ b/export_tfserving.py
@@ -19,7 +19,7 @@ flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
 flags.DEFINE_string('output', './serving/yolov3/1', 'path to saved_model')
 flags.DEFINE_string('classes', './data/coco.names', 'path to classes file')
 flags.DEFINE_string('image', './data/girl.png', 'path to input image')
-
+flags.DEFINE_integer('classes_num', 80, 'Number of classes in the model')

 # TODO: remove this after upstream fix
 # modified from: tensorflow.python.keras.saving.saving_utils.trace_model_call
@@ -46,9 +46,9 @@ def trace_model_call(model):

 def main(_argv):
     if FLAGS.tiny:
-        yolo = YoloV3Tiny()
+        yolo = YoloV3Tiny(classes=FLAGS.classes_num)
     else:
-        yolo = YoloV3()
+        yolo = YoloV3(classes=FLAGS.classes_num)

     yolo.load_weights(FLAGS.weights)
     logging.info('weights loaded')
diff --git a/train.py b/train.py
index 3c722fd..19a9e9d 100644
--- a/train.py
+++ b/train.py
@@ -38,15 +38,16 @@ flags.DEFINE_integer('size', 416, 'image size')
 flags.DEFINE_integer('epochs', 2, 'number of epochs')
 flags.DEFINE_integer('batch_size', 8, 'batch size')
 flags.DEFINE_float('learning_rate', 1e-3, 'learning rate')
+flags.DEFINE_integer('classes_num', 80, 'Number of classes to train')

 def main(_argv):
     if FLAGS.tiny:
-        model = YoloV3Tiny(FLAGS.size, training=True)
+        model = YoloV3Tiny(FLAGS.size, training=True, classes=FLAGS.classes_num)
         anchors = yolo_tiny_anchors
         anchor_masks = yolo_tiny_anchor_masks
     else:
-        model = YoloV3(FLAGS.size, training=True)
+        model = YoloV3(FLAGS.size, training=True, classes=FLAGS.classes_num)
         anchors = yolo_anchors
         anchor_masks = yolo_anchor_masks

@@ -83,9 +84,9 @@ def main(_argv):
         else:
             # reset top layers
             if FLAGS.tiny:  # get initial weights
-                init_model = YoloV3Tiny(FLAGS.size, training=True)
+                init_model = YoloV3Tiny(FLAGS.size, training=True, classes=FLAGS.classes_num)
             else:
-                init_model = YoloV3(FLAGS.size, training=True)
+                init_model = YoloV3(FLAGS.size, training=True, classes=FLAGS.classes_num)

             if FLAGS.transfer == 'darknet':
                 for l in model.layers:
@@ -103,7 +104,7 @@ def main(_argv):
                         freeze_all(l)

     optimizer = tf.keras.optimizers.Adam(lr=FLAGS.learning_rate)
-    loss = [YoloLoss(anchors[mask]) for mask in anchor_masks]
+    loss = [YoloLoss(anchors[mask], classes=FLAGS.classes_num) for mask in anchor_masks]

     if FLAGS.mode == 'eager_tf':
         # Eager mode is great for debugging

There's train.py as well, but it's unrelated and I haven't tested it yet. And I had the same error as you and the OP: ValueError, not AssertionError.

chrisrapson commented 5 years ago

That's strange, I think that's exactly the same code I'm running, but it's working for you and not for me.

Two new bits of information I discovered today:

  1. When I run convert.py on the default yolov3 weights and change classes_num, I get AssertionError: failed to read all data when I set classes_num < 80, and ValueError: cannot reshape array when I set classes_num > 80. I think that makes sense. If the weights file has more data than needed, then len(wf.read()) will be non-zero. If the weights file has less data than needed, then the resulting tensor won't have the right number of elements to reshape.

  2. In the command I wrote before, the final argument is the maximum number of layers to include in the output file. There are actually 23 layers in a Tiny YOLOv3 network, so 15 cuts off the last 8 layers. If you want the full network, change that to 23. (Or, as it turns out, any number larger than 23 is fine, so I've put 230 in the example below.) For the purposes of debugging this error, it also doesn't matter what file you provide for the input weights. The resulting weights file is still valid even if the input weights file is empty.

    $ touch empty_file
    $ ./darknet partial cfg/yolov3-tiny-custom.cfg empty_file yolov3-tiny-custom.weights 230
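Point 1 can be checked with a little arithmetic. A sketch, assuming the standard YOLOv3 heads: every scale ends in a 1x1 conv with a bias (no batch norm) and 3 * (num_classes + 5) filters, fed by 1024, 512 and 256 channels (512 and 256 for the tiny model). All other layers are independent of the class count, so only these layers change how many floats the loader consumes. The names head_float_count and mismatch are hypothetical.

```python
# Assumed input channels of the final 1x1 detection conv at each scale.
YOLOV3_HEAD_IN = (1024, 512, 256)
YOLOV3_TINY_HEAD_IN = (512, 256)
ANCHORS_PER_SCALE = 3


def head_float_count(num_classes, head_in=YOLOV3_HEAD_IN):
    """Floats the class-dependent detection convs read from a weights file."""
    filters = ANCHORS_PER_SCALE * (num_classes + 5)  # 4 box + 1 obj + classes
    # each conv stores `filters` biases plus an in_ch x filters 1x1 kernel
    return sum((in_ch + 1) * filters for in_ch in head_in)


def mismatch(file_classes, model_classes, head_in=YOLOV3_HEAD_IN):
    """Predict which error the loader hits when the class counts disagree."""
    surplus = (head_float_count(file_classes, head_in)
               - head_float_count(model_classes, head_in))
    if surplus > 0:
        return 'AssertionError: failed to read all data'  # data left over
    if surplus < 0:
        return 'ValueError: cannot reshape array'  # file runs out early
    return 'ok'


print(head_float_count(80))                          # 457725
print(head_float_count(80) - head_float_count(1))    # 425415
print(mismatch(file_classes=80, model_classes=1))    # AssertionError: failed to read all data
print(mismatch(file_classes=80, model_classes=100))  # ValueError: cannot reshape array
```

With the default 80-class file, a smaller classes_num leaves hundreds of thousands of floats unread, which matches the AssertionError; a larger one runs past the end of the file, which matches the ValueError.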
zzh8829 commented 5 years ago

I added hugovaz's change to master with the flag "num_classes". I only tested custom training and then loading, which seems to work for me. Since I don't have darknet set up right now, I don't know whether loading darknet-trained weights works or not.

Hopefully I can get darknet ready in a few days to test that.

mmortazavi commented 5 years ago

I was facing the same issue as everyone here: with num_classes set to anything except 80, I would get an AssertionError. Very frustrating indeed.

After searching, I came across this post on Openvinotoolkit, where they propose a way to convert the COCO yolov3.weights to a TF checkpoint compatible with a custom dataset. Their solution is based on the convert_weights.py script from tensorflow-yolo-v3. The instructions at Openvinotoolkit worked, and I am able to convert yolov3.weights to yolov3.tf based on my custom class.names, from which the number of classes is derived at line 37 (len(classes)) of the convert_weights.py script. I have successfully created the following checkpoints:

    checkpoint
    yolov3.tf.data-00000-of-00001
    yolov3.tf.index
    yolov3.tf.meta
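The convert_weights.py route derives the class count from the names file instead of a command-line flag. A small sketch of that idea, assuming one class name per line; load_class_names and output_filters are hypothetical helpers, and skipping blank lines is a design choice so that a trailing empty line is not counted as a class. For one class, each detection layer needs 3 * (1 + 5) = 18 filters, which matches the conv2d_74 kernel shape (1, 1, 256, 18) printed further down in this comment.

```python
import os
import tempfile


def load_class_names(path):
    """Return the non-empty, stripped lines of a darknet-style .names file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def output_filters(num_classes, anchors_per_scale=3):
    # each anchor predicts 4 box coords + 1 objectness + num_classes scores
    return anchors_per_scale * (num_classes + 5)


# Example: a one-class names file (hypothetical label) with a stray trailing blank line.
with tempfile.NamedTemporaryFile('w', suffix='.names', delete=False) as tmp:
    tmp.write('widget\n\n')
names = load_class_names(tmp.name)
os.remove(tmp.name)
print(len(names), output_filters(len(names)))  # 1 18
```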

I place these files back in this repo (./data/yolov3.tf) and run the train.py as:

$ python train.py --batch_size 8 --dataset ./data/train.tfrecord --val_dataset ./data/test.tfrecord --classes ./data/class.names --weights ./data/yolov3.tf --num_classes 1 --epochs 10 --mode fit --transfer darknet

Still, it does not work, though. The funny part is that there is no error; in fact, training starts:

Train for 47 steps, validate for 1 steps
Epoch 1/10
 1/47 [..............................] - ETA: 18:02 - loss: 7133.5176 - yolo_output_0_loss: 446.9039 - yolo_output_1_loss: 1257.3273 - yolo_output_2_loss: 5417.2056
Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
 1/47 [..............................] - ETA: 21:04 - loss: 7133.5176 - yolo_output_0_loss: 446.9039 - yolo_output_1_loss: 1257.3273 - yolo_output_2_loss: 5417.2056

And it quickly ends, printing numbers and arrays from the network layers like these (only the last few shown):

.
.
.
 ['conv2d_66/bias']
    <tf.Variable 'conv2d_73/kernel:0' shape=(3, 3, 128, 256) dtype=float32, numpy=
array([[[[ 2.56833769e-02,  3.53106596e-02,  2.92579457e-03, ...,
          -1.10644400e-02,  2.80909650e-02, -1.12031400e-02],
         [ 3.25710066e-02,  2.53072120e-02,  7.41675496e-03, ...,
          -4.74590063e-03,  2.23956630e-03, -2.91122012e-02],
         [-3.94449756e-03,  4.04525921e-03, -2.45400667e-02, ...,
          -1.53229050e-02,  2.59871408e-03, -1.50345862e-02],
         ...,
         [-9.20701027e-03,  3.75410058e-02, -2.53906157e-02, ...,
          -2.10710280e-02, -1.50451269e-02,  2.27461196e-02],
         [-2.15810146e-02, -6.78275898e-03, -2.71737091e-02, ...,
          -3.07301190e-02, -2.78491378e-02, -4.02636826e-02],
         [ 1.68761723e-02,  3.75657119e-02, -6.85926154e-03, ...,
           8.84603709e-04, -6.60085678e-03, -2.27394309e-02]],

        [[ 3.16191800e-02,  1.56331174e-02,  7.44090602e-03, ...,
           2.76162215e-02,  3.97675373e-02, -1.78670287e-02],
         [ 6.47032261e-03, -1.99065730e-03, -9.33222473e-05, ...,
          -2.47831158e-02, -9.80031490e-03, -1.08290818e-02],
         [ 7.34843686e-03,  8.85882974e-03,  2.25224607e-02, ...,
          -3.25706005e-02,  1.73903629e-03, -8.60580802e-03],
         ...,
         [-2.17638221e-02, -3.93685400e-02, -5.48325852e-03, ...,
           3.13389301e-03, -1.90678537e-02, -2.20004022e-02],
         [ 3.34427245e-02, -8.77577811e-04, -1.16094854e-02, ...,
          -2.50200443e-02,  1.34349652e-02,  4.19984385e-03],
         [-2.95360014e-03, -1.21537969e-03,  3.14849950e-02, ...,
           3.96463163e-02, -3.20908353e-02,  3.59526463e-02]],

        [[-8.17315653e-03,  3.55972461e-02,  8.94585252e-03, ...,
           2.87264697e-02,  9.28323716e-04,  2.93496251e-03],
         [-2.32036319e-02, -3.66658866e-02, -3.82214785e-03, ...,
          -4.31816652e-03,  1.18423887e-02,  2.89178379e-02],
         [ 3.49936895e-02, -2.47862637e-02,  2.17809863e-02, ...,
           2.38164254e-02, -2.69281864e-02,  4.10674326e-02],
         ...,
         [-3.71617489e-02,  1.09954961e-02,  2.09607817e-02, ...,
          -2.93673947e-03,  1.28097534e-02, -1.68664157e-02],
         [-3.52712385e-02,  3.05474289e-02, -2.90135555e-02, ...,
          -1.61321871e-02, -3.29488516e-02, -2.74983458e-02],
         [ 1.65389292e-02,  2.72813477e-02, -1.80646293e-02, ...,
          -8.67815688e-03,  3.73781845e-03, -1.44661572e-02]]],

       [[[ 1.96931548e-02, -3.36839631e-03,  1.27739422e-02, ...,
          -3.30148861e-02,  2.44725384e-02, -1.60073154e-02],
         [ 2.07907744e-02,  2.22274996e-02, -2.24555340e-02, ...,
          -1.23657584e-02,  3.33482660e-02,  3.88374962e-02],
         [ 8.75295326e-03,  1.37487054e-03,  2.99120285e-02, ...,
           3.44751365e-02,  1.22276433e-02,  2.58241184e-02],
         ...,
         [ 2.17899568e-02,  1.37341507e-02,  1.29186511e-02, ...,
          -8.20476562e-04, -2.01763213e-02,  1.39517188e-02],
         [ 3.94333154e-04, -3.09851468e-02, -1.06522255e-02, ...,
          -1.68433189e-02,  3.63192298e-02,  1.97090507e-02],
         [ 2.48928852e-02, -2.13638246e-02,  2.72096135e-02, ...,
          -2.41022818e-02, -6.73611835e-03,  3.43840159e-02]],

        [[ 8.62279534e-03,  3.06403451e-02,  2.89860554e-02, ...,
          -3.22639570e-02,  1.74001977e-03, -2.05041282e-02],
         [ 3.89000811e-02,  2.91399769e-02, -6.34120777e-03, ...,
           2.62246318e-02,  3.44042853e-03,  1.01970136e-02],
         [ 3.83997448e-02, -1.24698486e-02,  2.84894817e-02, ...,
           2.08893605e-02,  4.21829149e-03, -3.77052240e-02],
         ...,
         [ 1.82667673e-02, -1.09125376e-02, -4.08678949e-02, ...,
          -2.39115562e-02, -1.83074977e-02,  1.75374858e-02],
         [-3.46915238e-02,  2.51976885e-02, -3.55364978e-02, ...,
           3.08285095e-02,  3.77385728e-02,  1.55052431e-02],
         [ 2.84906812e-02, -3.99287567e-02,  3.47629078e-02, ...,
           2.09869482e-02, -1.79757476e-02,  6.48398325e-03]],

        [[-3.93467844e-02,  9.47195292e-03,  1.50587447e-02, ...,
           9.26125050e-03, -2.07577646e-02,  3.32169123e-02],
         [-9.46620107e-03, -3.61755416e-02,  1.22757256e-02, ...,
           3.33716683e-02,  3.47571559e-02,  3.73974629e-02],
         [-1.86664984e-03, -3.09974160e-02,  4.08360548e-02, ...,
          -3.13572511e-02,  2.16294564e-02,  1.23923123e-02],
         ...,
         [-2.06459463e-02, -1.65886693e-02, -3.89652960e-02, ...,
           2.30985768e-02, -1.24552660e-02, -4.02135774e-02],
         [-4.15030718e-02,  4.01438661e-02,  1.01896115e-02, ...,
           2.55445763e-03,  2.37345137e-02, -8.41187313e-03],
         [ 3.19388546e-02, -3.43604498e-02,  2.74556093e-02, ...,
           3.36689465e-02, -2.26087477e-02, -2.15530396e-04]]],

       [[[-2.49405205e-02,  1.49820037e-02,  2.27186866e-02, ...,
          -1.56036727e-02, -1.60476975e-02,  2.14134939e-02],
         [-1.82793848e-02,  1.43152475e-03, -2.64450517e-02, ...,
           1.84600651e-02, -6.57703355e-03,  2.22041942e-02],
         [ 3.33471037e-02,  1.78856961e-02,  1.23374350e-02, ...,
           4.70654294e-03,  1.14994235e-02,  1.95171647e-02],
         ...,
         [ 1.03595853e-02, -2.72310581e-02,  9.39974189e-03, ...,
          -3.52265537e-02,  3.00867967e-02, -1.20449867e-02],
         [-2.05356777e-02, -3.82648781e-03, -6.40147924e-03, ...,
          -1.43413655e-02,  4.64547426e-04, -3.91161442e-03],
         [ 8.97754356e-03,  4.39928100e-03, -2.34277751e-02, ...,
           3.13213579e-02, -1.11669414e-02,  1.83453262e-02]],

        [[-3.52268033e-02, -2.54315138e-03,  2.10853107e-02, ...,
           2.11926512e-02, -2.36858428e-02,  7.87854195e-03],
         [ 1.48043148e-02,  5.25813550e-04, -3.71724889e-02, ...,
          -3.74444425e-02,  1.94166489e-02,  1.05408728e-02],
         [-2.35673301e-02, -1.08658075e-02,  1.24150030e-02, ...,
           2.24410184e-02, -1.46143939e-02, -1.39984488e-03],
         ...,
         [-7.21018761e-04, -2.88872719e-02, -1.30725205e-02, ...,
           2.76407711e-02,  3.83999459e-02, -1.09355953e-02],
         [ 2.89773531e-02, -2.43233051e-02,  2.55542509e-02, ...,
           3.19750942e-02,  2.25668438e-02,  2.88993753e-02],
         [-1.19396951e-02, -3.33687365e-02,  5.48367575e-03, ...,
           2.74873190e-02, -1.88654568e-02,  2.32724883e-02]],

        [[-1.22158229e-02, -9.37895104e-03,  2.62992866e-02, ...,
           1.75433457e-02, -1.19793415e-02,  3.29534225e-02],
         [ 1.36801489e-02, -3.61974947e-02,  2.06384249e-02, ...,
          -3.51618230e-02,  1.07067339e-02,  3.87417413e-02],
         [ 2.91848071e-02,  2.98367068e-03,  1.16222613e-02, ...,
           3.13360244e-04,  1.48602836e-02, -1.08053200e-02],
         ...,
         [ 3.68827097e-02,  4.04213704e-02, -1.77661590e-02, ...,
          -2.45228615e-02, -5.05558774e-03, -3.47705707e-02],
         [-3.09361815e-02,  2.66242735e-02,  2.62162723e-02, ...,
           3.55829783e-02,  3.12818028e-02,  2.91649811e-02],
         [ 3.14220302e-02,  1.65082105e-02, -1.50009394e-02, ...,
           2.57538371e-02,  1.35435164e-02, -2.30898075e-02]]]],
      dtype=float32)>: ['conv2d_73/kernel']
    <tf.Variable 'batch_normalization_71/gamma:0' shape=(256,) dtype=float32, numpy=
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1.], dtype=float32)>: ['batch_normalization_71/gamma']
    <tf.Variable 'batch_normalization_71/beta:0' shape=(256,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0.], dtype=float32)>: ['batch_normalization_71/beta']
    <tf.Variable 'batch_normalization_71/moving_mean:0' shape=(256,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0.], dtype=float32)>: ['batch_normalization_71/moving_mean']
    <tf.Variable 'batch_normalization_71/moving_variance:0' shape=(256,) dtype=float32, numpy=
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1.], dtype=float32)>: ['batch_normalization_71/moving_variance']
    <tf.Variable 'conv2d_74/kernel:0' shape=(1, 1, 256, 18) dtype=float32, numpy=
array([[[[-0.0242654 , -0.13148566,  0.09931047, ...,  0.00801411,
          -0.09128695, -0.12943399],
         [ 0.04433893, -0.03389428,  0.02649385, ...,  0.10999328,
          -0.04562915,  0.01654927],
         [ 0.14623573,  0.08937761,  0.04920276, ...,  0.14376149,
           0.01536031, -0.1459782 ],
         ...,
         [ 0.12291595,  0.0041706 ,  0.10054782, ..., -0.03272729,
           0.03011586,  0.10305405],
         [-0.07295077,  0.12444386,  0.01255541, ..., -0.10320055,
          -0.04048014, -0.00323914],
         [-0.10248192, -0.08746366, -0.01309262, ..., -0.04655813,
           0.03565833, -0.09768917]]]], dtype=float32)>: ['conv2d_74/kernel']
    <tf.Variable 'conv2d_74/bias:0' shape=(18,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0.], dtype=float32)>: ['conv2d_74/bias']
.
.
.

I am quite puzzled right now! I don't think it is OS-related, but for the sake of completeness: I am working on Windows 10.
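The `conv2d_74/kernel` shape `(1, 1, 256, 18)` in the dump above is consistent with a one-class model. In YOLOv3, each output convolution predicts, for every anchor, 4 box coordinates, 1 objectness score and one score per class. The sketch below assumes the standard 3 anchors per output scale, and the function name is hypothetical:

```python
def yolo_output_filters(num_classes, anchors_per_scale=3):
    """Number of filters in a YOLOv3 output conv.

    Each anchor predicts 4 box coordinates + 1 objectness score
    + one score per class (3 anchors per scale is an assumption here).
    """
    return anchors_per_scale * (5 + num_classes)

print(yolo_output_filters(1))   # 18, matching conv2d_74 in the dump above
print(yolo_output_filters(80))  # 255, the 80-class (COCO) default
```

A converter that always builds the 80-class model therefore expects 255 filters in these layers, not 18.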

zzh8829 commented 4 years ago

Hi all, I just finished the documentation on training with a custom number of classes: https://github.com/zzh8829/yolov3-tf2/blob/master/docs/training_voc.md
Please open a new issue if you still have problems following the new instructions.

robisen1 commented 4 years ago

Yes, it's indeed related to the default number of classes. I'm currently experiencing the sam

I am pretty sure you cannot. You have two options: use transfer learning, in which case you need to use the same labels, or train a model from scratch, in which case you can use any labels you want.
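The sketch below shows why weights trained with one class count cannot be loaded into a model built for another. It assumes the Darknet layout that the `load_darknet_weights` traceback earlier in this thread suggests: for each convolution, either 4 batch-norm vectors or a bias vector, followed by a kernel stored as `(filters, in_channels, k, k)`. The helper name is hypothetical.

```python
import numpy as np

def darknet_conv_param_count(in_channels, filters, ksize, batch_norm):
    """Count the float32 values Darknet stores for one convolution.

    Layout assumed here: 4 BN vectors (beta, gamma, mean, variance) when the
    layer is batch-normalized, otherwise one bias vector, followed by the
    kernel in (filters, in_channels, ksize, ksize) order.
    """
    prefix = 4 * filters if batch_norm else filters
    return prefix + filters * in_channels * ksize * ksize

# Output conv with 256 input channels (like conv2d_74 above), no batch norm
stored = darknet_conv_param_count(256, 18, 1, batch_norm=False)     # 1-class file
expected = darknet_conv_param_count(256, 255, 1, batch_norm=False)  # 80-class model
print(stored, expected)  # 4626 65535

# Reshaping the 1-class kernel into the 80-class layout fails
kernel = np.zeros(stored - 18, dtype=np.float32)  # drop the 18 bias values
try:
    kernel.reshape(255, 256, 1, 1)
except ValueError as err:
    print(err)
```

Because the file is read sequentially, a single layer with the wrong filter count shifts the read position for every later layer. That is consistent with the `cannot reshape array of size ...` errors reported in this thread.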

aimanyounises1 commented 3 years ago

How exactly was this problem closed? I get an "assertion failed" error when I convert my model to TensorFlow.