Trained weights cannot be used on another PC / machine

chrisTopp84 commented 3 years ago

Hi,

I trained new YOLOv3-tiny weights on my own PC and everything worked.

Now I copied these files

yolov3_custom_Tiny.data-00000-of-00001
yolov3_custom_Tiny.index

and pasted them onto another PC.

After running

python detection_custom.py

the following error occurs. Is there any way to fix this problem?

Any help is welcome :)

Best regards chris

2021-02-23 09:55:25.178585: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-02-23 09:55:25.178793: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-23 09:55:27.650603: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2021-02-23 09:55:27.650783: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-02-23 09:55:27.660051: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: CCZConvertable
2021-02-23 09:55:27.660373: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: CCZConvertable
Loading custom weights from: ./checkpoints/yolov3_custom_Tiny
2021-02-23 09:55:27.683717: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-23 09:55:27.803252: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20272f627e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-23 09:55:27.803381: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
  File "detection_custom.py", line 22, in <module>
    yolo = Load_Yolo_model()
  File "C:\Users\czerny\Documents\06_Python\TensorFlow-2.x-YOLOv3\yolov3\utils.py", line 99, in Load_Yolo_model
    yolo.load_weights(checkpoint) # use custom weights
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2182, in load_weights
    status = self._trackable_saver.restore(filepath, options)
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1320, in restore
    checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\tracking\base.py", line 209, in restore
    restore_ops = trackable._restore_from_checkpoint_position(self) # pylint: disable=protected-access
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\tracking\base.py", line 914, in _restore_from_checkpoint_position
    tensor_saveables, python_saveables))
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\tracking\util.py", line 297, in restore_saveables
    validated_saveables).restore(self.save_path_tensor, self.options)
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 340, in restore
    restore_ops = restore_fn()
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 316, in restore_fn
    restore_ops.update(saver.restore(file_prefix, options))
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\saving\functional_saver.py", line 111, in restore
    restored_tensors, restored_shapes=None)
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\training\saving\saveable_object_util.py", line 127, in restore
    self.handle_op, self._var_shape, restored_tensor)
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 311, in shape_safe_assign_variable_handle
    shape.assert_is_compatible_with(value_tensor.shape)
  File "C:\Users\czerny\anaconda3\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 1134, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (45,) and (18,) are incompatible

pangshengwei commented 3 years ago

When you say everything worked, do you mean you were able to load the weights on the PC you trained on and do some testing?

My trained weights are saved to yolov4-trt-INT8-416-vehicles-plates.data-00000-of-00001

However, when loading the weights using Load_Yolo_model() I'm getting this error, which complains that the SavedModel file does not exist...

When I tried to use yolo.load_weights(path), I got this error instead...

So I'm puzzled why training does not generate a new weights directory containing the SavedModel weights, but only a .data file.
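On the .data file question: judging by the file names in this thread, the weights are stored in TensorFlow checkpoint format, not as a SavedModel. In that format a checkpoint is a `<prefix>.index` file plus one or more `<prefix>.data-XXXXX-of-YYYYY` shards, and `load_weights` is given the bare prefix, not either file name. A minimal sketch, assuming that layout, of a check to run after copying a checkpoint to another machine; `checkpoint_files` and `is_complete` are hypothetical helpers, not part of the repository:

```python
from pathlib import Path


def checkpoint_files(prefix):
    """Return the .index file and the .data shards that belong to a checkpoint prefix.

    `prefix` is the path without any extension,
    e.g. "checkpoints/yolov3_custom_Tiny".
    """
    prefix = Path(prefix)
    index_file = prefix.parent / (prefix.name + ".index")
    shards = sorted(prefix.parent.glob(prefix.name + ".data-*-of-*"))
    return index_file, shards


def is_complete(prefix):
    """True if the index file and at least one data shard are both present."""
    index_file, shards = checkpoint_files(prefix)
    return index_file.is_file() and len(shards) > 0
```

For example, `is_complete("./checkpoints/yolov3_custom_Tiny")` should be true on the target machine before `detection_custom.py` is run.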

chrisTopp84 commented 3 years ago

Yes, I could run tests and I could detect persons.

I have not looked into this problem any further, so unfortunately I can't give you any good advice.

BR chris

alanwein commented 3 years ago

I ran into the same problem:

ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (45,) and (18,) are incompatible

I trained a single-class object detector by running python train.py, and I got four files in ./checkpoint/. I revised configs.py as follows:

#================================================================
#
#   File name   : configs.py
#   Author      : PyLessons
#   Created date: 2020-08-18
#   Website     : https://pylessons.com/
#   GitHub      : https://github.com/pythonlessons/TensorFlow-2.x-YOLOv3
#   Description : yolov3 configuration file
#
#================================================================

# YOLO options
YOLO_TYPE                   = "yolov3" # yolov4 or yolov3
YOLO_FRAMEWORK              = "tf" # "tf" or "trt"
YOLO_V3_WEIGHTS             = "model_data/yolov3.weights"
YOLO_V4_WEIGHTS             = "model_data/yolov4.weights"
YOLO_V3_TINY_WEIGHTS        = "model_data/yolov3-tiny.weights"
YOLO_V4_TINY_WEIGHTS        = "model_data/yolov4-tiny.weights"
YOLO_TRT_QUANTIZE_MODE      = "INT8" # INT8, FP16, FP32
YOLO_CUSTOM_WEIGHTS         = True # "checkpoints/yolov3_custom" # used in evaluate_mAP.py and custom model detection, if not using leave False
                            # YOLO_CUSTOM_WEIGHTS also used with TensorRT and custom model detection
# YOLO_COCO_CLASSES           = "model_data/coco/coco.names"
YOLO_COCO_CLASSES           = "model_data/Dataset.names"
YOLO_STRIDES                = [8, 16, 32]
YOLO_IOU_LOSS_THRESH        = 0.5
YOLO_ANCHOR_PER_SCALE       = 3
YOLO_MAX_BBOX_PER_SCALE     = 100
YOLO_INPUT_SIZE             = 320 #416
if YOLO_TYPE                == "yolov4":
    YOLO_ANCHORS            = [[[12,  16], [19,   36], [40,   28]],
                               [[36,  75], [76,   55], [72,  146]],
                               [[142,110], [192, 243], [459, 401]]]
if YOLO_TYPE                == "yolov3":
    YOLO_ANCHORS            = [[[10,  13], [16,   30], [33,   23]],
                               [[30,  61], [62,   45], [59,  119]],
                               [[116, 90], [156, 198], [373, 326]]]
# Train options
TRAIN_YOLO_TINY             = True#False
TRAIN_SAVE_BEST_ONLY        = True # saves only best model according validation loss (True recommended)
TRAIN_SAVE_CHECKPOINT       = False # saves all best validated checkpoints in training process (may require a lot disk space) (False recommended)
TRAIN_CLASSES               = "mnist/mnist.names"
TRAIN_ANNOT_PATH            = "mnist/mnist_train.txt"
TRAIN_LOGDIR                = "log"
TRAIN_CHECKPOINTS_FOLDER    = "checkpoints"
TRAIN_MODEL_NAME            = f"{YOLO_TYPE}_custom"
TRAIN_LOAD_IMAGES_TO_RAM    = True # With True faster training, but need more RAM
TRAIN_BATCH_SIZE            = 4
TRAIN_INPUT_SIZE            = 320#416
TRAIN_DATA_AUG              = True
TRAIN_TRANSFER              = True
TRAIN_FROM_CHECKPOINT       = False # "checkpoints/yolov3_custom"
TRAIN_LR_INIT               = 1e-4
TRAIN_LR_END                = 1e-6
TRAIN_WARMUP_EPOCHS         = 2
TRAIN_EPOCHS                = 100

# TEST options
TEST_ANNOT_PATH             = "mnist/mnist_test.txt"
TEST_BATCH_SIZE             = 4
TEST_INPUT_SIZE             = 416
TEST_DATA_AUG               = False
TEST_DECTECTED_IMAGE_PATH   = ""
TEST_SCORE_THRESHOLD        = 0.3
TEST_IOU_THRESHOLD          = 0.45

#YOLOv3-TINY and YOLOv4-TINY WORKAROUND
if TRAIN_YOLO_TINY:
    YOLO_STRIDES            = [16, 32, 64]    
    YOLO_ANCHORS            = [[[10,  14], [23,   27], [37,   58]],
                               [[81,  82], [135, 169], [344, 319]],
                               [[0,    0], [0,     0], [0,     0]]]

detection_demo.py was also changed:

#================================================================
#
#   File name   : detection_demo.py
#   Author      : PyLessons
#   Created date: 2020-09-27
#   Website     : https://pylessons.com/
#   GitHub      : https://github.com/pythonlessons/TensorFlow-2.x-YOLOv3
#   Description : object detection image and video example
#
#================================================================
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import cv2
import numpy as np
import tensorflow as tf
from yolov3.utils import detect_image, detect_realtime, detect_video, Load_Yolo_model, detect_video_realtime_mp
from yolov3.configs import *

image_path   = "./IMAGES/kite.jpg"
video_path   = "./IMAGES/test.mp4"

yolo = Load_Yolo_model()
#detect_image(yolo, image_path, "./IMAGES/kite_pred.jpg", input_size=YOLO_INPUT_SIZE, show=True, rectangle_colors=(255,0,0))
#detect_video(yolo, video_path, "", input_size=YOLO_INPUT_SIZE, show=False, rectangle_colors=(255,0,0))
detect_realtime(yolo, '', input_size=YOLO_INPUT_SIZE, show=True, rectangle_colors=(255, 0, 0))

#detect_video_realtime_mp(video_path, "Output.mp4", input_size=YOLO_INPUT_SIZE, show=False, rectangle_colors=(255,0,0), realtime=False)

When I run python detection_demo.py, the following error occurs:

/usr/lib/python3/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning) 
Loading custom weights from: ./checkpoints/yolov3_custom_Tiny 
Traceback (most recent call last):
   File "detection_demo.py", line 22, in <module>
     yolo = Load_Yolo_model()
   File "/home/pi/Desktop/TensorFlow-2.x-YOLOv3/yolov3/utils.py", line 99, in Load_Yolo_model
     yolo.load_weights(checkpoint)  # use custom weights
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py", line 250, in load_weights
     return super(Model, self).load_weights(filepath, by_name, skip_mismatch)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/network.py", line 1237, in load_weights
     status = self._trackable_saver.restore(filepath)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/util.py", line 1304, in restore
     checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
     restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 907, in _restore_from_checkpoint_position
     tensor_saveables, python_saveables))
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/util.py", line 289, in restore_saveables
     validated_saveables).restore(self.save_path_tensor)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/saving/functional_saver.py", line 281, in restore
     restore_ops.update(saver.restore(file_prefix))
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/saving/functional_saver.py", line 103, in restore
     restored_tensors, restored_shapes=None)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/saving/saveable_object_util.py", line 116, in restore
     self.handle_op, self._var_shape, restored_tensor)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 308, in shape_safe_assign_variable_handle 
    shape.assert_is_compatible_with(value_tensor.shape)
   File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 1117, in assert_is_compatible_with
     raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (45,) and (18,) are incompatible
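
The two numbers in this error can be decoded. Assuming the standard YOLOv3 head, each output convolution has anchors_per_scale × (5 + num_classes) channels: 4 box coordinates, 1 objectness score, then the class scores. With YOLO_ANCHOR_PER_SCALE = 3, a size of 45 corresponds to 10 classes and 18 to 1 class. The traceback passes `self._var_shape` and `restored_tensor` into the check, so the first shape should belong to the model that was just built and the second to the checkpoint. That would fit a model built from a 10-class names file (assuming mnist/mnist.names lists the ten digits) being asked to load single-class weights. A short sketch of the arithmetic; `head_channels` and `classes_from_head` are hypothetical helpers:

```python
def head_channels(num_classes, anchors_per_scale=3):
    """Output channels of one YOLO head: box (4) + objectness (1) + class scores."""
    return anchors_per_scale * (5 + num_classes)


def classes_from_head(channels, anchors_per_scale=3):
    """Invert head_channels; raise if the size cannot come from a YOLO head."""
    per_anchor, remainder = divmod(channels, anchors_per_scale)
    if remainder != 0 or per_anchor <= 5:
        raise ValueError(
            f"{channels} is not a valid head size for {anchors_per_scale} anchors"
        )
    return per_anchor - 5


print(classes_from_head(45))  # model side of the error -> 10
print(classes_from_head(18))  # checkpoint side of the error -> 1
```

If this reading is right, the names file used to build the model at detection time must list the same classes as the one used for training.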

navidasj96 commented 2 years ago

I just faced this problem, and the solution is:

Copy both generated files (yolov3_custom.data-00000-of-00001 and yolov3_custom.index) into the checkpoints folder (if they are already there, that is fine).
Then set up configs.py like this:

First of all, set YOLO_V3_WEIGHTS to something like "/content/TensorFlow-2.x-YOLOv3/checkpoints/yolov3_custom". Then, most importantly, set YOLO_CUSTOM_WEIGHTS = True. Run detection_custom.py and it should work.
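
Before running detection on a second machine, it can also help to confirm that the class list in that machine's configs.py matches the checkpoint. A minimal sketch, assuming the names file has one class name per line and that the model is built from the names file set in configs.py (which setting is used is not confirmed in this thread, so check yolov3/utils.py). `count_classes`, `checkpoint_matches` and `list_checkpoint_variables` are hypothetical helpers; `tf.train.list_variables` is TensorFlow's function for listing the (name, shape) pairs stored in a checkpoint, imported lazily so the rest runs without TensorFlow:

```python
def count_classes(names_path):
    """Count non-empty lines in a class-names file (one class name per line)."""
    with open(names_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def checkpoint_matches(variables, num_classes, anchors_per_scale=3):
    """Heuristic: True if some 1-D variable has the expected YOLO head size.

    `variables` is a list of (name, shape) pairs, as returned by
    tf.train.list_variables. A head bias has
    anchors_per_scale * (5 + num_classes) entries.
    """
    expected = anchors_per_scale * (5 + num_classes)
    return any(list(shape) == [expected] for _name, shape in variables)


def list_checkpoint_variables(prefix):
    """Thin wrapper around tf.train.list_variables (requires TensorFlow)."""
    import tensorflow as tf
    return tf.train.list_variables(prefix)


# Example with paths from this thread (adjust to your setup):
# variables = list_checkpoint_variables("./checkpoints/yolov3_custom_Tiny")
# print(checkpoint_matches(variables, count_classes("mnist/mnist.names")))
```

A result of False would point to a class-count mismatch like the (45,) versus (18,) error reported above.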

pythonlessons / TensorFlow-2.x-YOLOv3

Trained weights cannot be used on another PC / machine #137