szbela87 / welding

GNU General Public License v3.0

Yolov5 training issue #1

Open openedev opened 7 months ago

openedev commented 7 months ago

Hi,

I'm trying to follow your instructions to train the YOLOv5 model, but I ran into the error below. Please take a look.

$ ls
images  welding  welding_images.zip  yolov5
$ rm -rf yolov5/data/images/
$ cp images/ yolov5/data/ -rf
$ mkdir yolov5/data/labels
$ cp yolov5/data/images/*.txt yolov5/data/labels/
$ cp welding/yolov5_files/autosplit_* yolov5/data/
$ cp welding/yolov5_files/welding_data.yaml yolov5/
$ cp welding/yolov5_files/hyp.scratch-* yolov5/data/hyps/
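A note on the layout these steps produce: as far as I can tell from `img2label_paths` in YOLOv5's `utils/dataloaders.py`, YOLOv5 finds the label for each image by replacing the last `images` directory in the image path with `labels` and the file extension with `.txt`. That is why the `.txt` files are copied from `yolov5/data/images/` into `yolov5/data/labels/`. Before a long run it may be worth checking that every image listed in the `autosplit_*.txt` files has a label and that each label line is well formed. The sketch below is a hypothetical helper, not part of YOLOv5 or of this repository. `label_path_for` re-implements the path convention as I understand it, entries starting with `./` are assumed to be relative to the list file's folder, `yolov5/data` is assumed to be the data directory from the copy steps, and `bad_label_lines` assumes plain box labels (`class x_center y_center width height`, normalized to [0, 1]) with `nc=7` as reported in the training log (`Overriding model.yaml nc=80 with nc=7`).

```python
from __future__ import annotations

import os


def label_path_for(img_path: str) -> str:
    """Map .../images/x.jpg to .../labels/x.txt (only the last 'images' dir).

    Hypothetical re-implementation of the YOLOv5 convention; compare it with
    utils/dataloaders.py in your own checkout.
    """
    sa, sb = f"{os.sep}images{os.sep}", f"{os.sep}labels{os.sep}"
    return sb.join(img_path.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt"


def missing_labels(split_file: str) -> list[str]:
    """Return image paths from a split list whose label file does not exist."""
    split_dir = os.path.dirname(os.path.abspath(split_file))
    missing = []
    with open(split_file, encoding="utf-8") as fh:
        for line in fh:
            entry = line.strip()
            if not entry:
                continue
            # Assumption: './images/...' entries are relative to the list file
            if entry.startswith("./"):
                entry = os.path.join(split_dir, entry[2:])
            entry = os.path.abspath(entry)  # normalize without following symlinks
            if not os.path.isfile(label_path_for(entry)):
                missing.append(entry)
    return missing


def bad_label_lines(label_file: str, nc: int = 7) -> list[int]:
    """Return 1-based numbers of lines that are not 'cls xc yc w h' with an
    integer class in [0, nc) and four coordinates in [0, 1]."""
    bad = []
    with open(label_file, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            parts = line.split()
            if not parts:
                continue  # blank lines are ignored
            try:
                values = [float(p) for p in parts]
            except ValueError:
                bad.append(lineno)
                continue
            cls, coords = values[0], values[1:]
            if (len(coords) != 4 or not cls.is_integer() or not 0 <= cls < nc
                    or not all(0.0 <= c <= 1.0 for c in coords)):
                bad.append(lineno)
    return bad


if __name__ == "__main__":
    data_dir = "yolov5/data"  # assumed location, matching the copy steps
    for name in ("autosplit_train.txt", "autosplit_val.txt"):
        split_file = os.path.join(data_dir, name)
        if os.path.isfile(split_file):
            print(name, "images without labels:", len(missing_labels(split_file)))
    labels_dir = os.path.join(data_dir, "labels")
    if os.path.isdir(labels_dir):
        for fname in sorted(os.listdir(labels_dir)):
            if fname.endswith(".txt"):
                bad = bad_label_lines(os.path.join(labels_dir, fname))
                if bad:
                    print(fname, "malformed lines:", bad)
```

Segmentation-style labels (more than four coordinates per line) would be flagged by `bad_label_lines`, so relax that check if the dataset uses polygons.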

Here is how I'm training:

$ cd yolov5
$ python train.py --cos-lr --img 640 --batch 32 --epochs 200 --data welding_data.yaml --weights yolov5n.pt --project defects --name model_5n_dec4 --cache --freeze 10 
2024-02-13 13:46:19.973283: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-13 13:46:19.973332: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-13 13:46:19.973363: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
train: weights=yolov5n.pt, cfg=, data=welding_data.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=200, batch_size=32, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=data/hyps, resume_evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=defects, name=model_5n_dec4, exist_ok=False, quad=False, cos_lr=True, label_smoothing=0.0, patience=100, freeze=[10], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-284-g95ebf68f Python-3.11.5 torch-2.2.0+cu121 CPU

hyperparameters: lr0=0.01, lrf=0.1, momentum=0.9, weight_decay=0.0005, warmup_epochs=0.3, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir defects', view at http://localhost:6006/
Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5n.pt to yolov5n.pt...
100%|██████████| 3.87M/3.87M [00:00<00:00, 7.28MB/s]

Overriding model.yaml nc=80 with nc=7

                 from  n    params  module                                  arguments                     
  0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]              
  1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]                
  2                -1  1      4800  models.common.C3                        [32, 32, 1]                   
  3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  4                -1  2     29184  models.common.C3                        [64, 64, 2]                   
  5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  6                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  8                -1  1    296448  models.common.C3                        [256, 256, 1]                 
  9                -1  1    164608  models.common.SPPF                      [256, 256, 5]                 
 10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     22912  models.common.C3                        [128, 64, 1, False]           
 18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]                
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1     74496  models.common.C3                        [128, 128, 1, False]          
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 24      [17, 20, 23]  1     16236  models.yolo.Detect                      [7, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
Model summary: 214 layers, 1773388 parameters, 1773388 gradients, 4.3 GFLOPs

Transferred 343/349 items from yolov5n.pt
freezing model.0.conv.weight
freezing model.0.bn.weight
freezing model.0.bn.bias
freezing model.1.conv.weight
freezing model.1.bn.weight
freezing model.1.bn.bias
freezing model.2.cv1.conv.weight
freezing model.2.cv1.bn.weight
freezing model.2.cv1.bn.bias
freezing model.2.cv2.conv.weight
freezing model.2.cv2.bn.weight
freezing model.2.cv2.bn.bias
freezing model.2.cv3.conv.weight
freezing model.2.cv3.bn.weight
freezing model.2.cv3.bn.bias
freezing model.2.m.0.cv1.conv.weight
freezing model.2.m.0.cv1.bn.weight
freezing model.2.m.0.cv1.bn.bias
freezing model.2.m.0.cv2.conv.weight
freezing model.2.m.0.cv2.bn.weight
freezing model.2.m.0.cv2.bn.bias
freezing model.3.conv.weight
freezing model.3.bn.weight
freezing model.3.bn.bias
freezing model.4.cv1.conv.weight
freezing model.4.cv1.bn.weight
freezing model.4.cv1.bn.bias
freezing model.4.cv2.conv.weight
freezing model.4.cv2.bn.weight
freezing model.4.cv2.bn.bias
freezing model.4.cv3.conv.weight
freezing model.4.cv3.bn.weight
freezing model.4.cv3.bn.bias
freezing model.4.m.0.cv1.conv.weight
freezing model.4.m.0.cv1.bn.weight
freezing model.4.m.0.cv1.bn.bias
freezing model.4.m.0.cv2.conv.weight
freezing model.4.m.0.cv2.bn.weight
freezing model.4.m.0.cv2.bn.bias
freezing model.4.m.1.cv1.conv.weight
freezing model.4.m.1.cv1.bn.weight
freezing model.4.m.1.cv1.bn.bias
freezing model.4.m.1.cv2.conv.weight
freezing model.4.m.1.cv2.bn.weight
freezing model.4.m.1.cv2.bn.bias
freezing model.5.conv.weight
freezing model.5.bn.weight
freezing model.5.bn.bias
freezing model.6.cv1.conv.weight
freezing model.6.cv1.bn.weight
freezing model.6.cv1.bn.bias
freezing model.6.cv2.conv.weight
freezing model.6.cv2.bn.weight
freezing model.6.cv2.bn.bias
freezing model.6.cv3.conv.weight
freezing model.6.cv3.bn.weight
freezing model.6.cv3.bn.bias
freezing model.6.m.0.cv1.conv.weight
freezing model.6.m.0.cv1.bn.weight
freezing model.6.m.0.cv1.bn.bias
freezing model.6.m.0.cv2.conv.weight
freezing model.6.m.0.cv2.bn.weight
freezing model.6.m.0.cv2.bn.bias
freezing model.6.m.1.cv1.conv.weight
freezing model.6.m.1.cv1.bn.weight
freezing model.6.m.1.cv1.bn.bias
freezing model.6.m.1.cv2.conv.weight
freezing model.6.m.1.cv2.bn.weight
freezing model.6.m.1.cv2.bn.bias
freezing model.6.m.2.cv1.conv.weight
freezing model.6.m.2.cv1.bn.weight
freezing model.6.m.2.cv1.bn.bias
freezing model.6.m.2.cv2.conv.weight
freezing model.6.m.2.cv2.bn.weight
freezing model.6.m.2.cv2.bn.bias
freezing model.7.conv.weight
freezing model.7.bn.weight
freezing model.7.bn.bias
freezing model.8.cv1.conv.weight
freezing model.8.cv1.bn.weight
freezing model.8.cv1.bn.bias
freezing model.8.cv2.conv.weight
freezing model.8.cv2.bn.weight
freezing model.8.cv2.bn.bias
freezing model.8.cv3.conv.weight
freezing model.8.cv3.bn.weight
freezing model.8.cv3.bn.bias
freezing model.8.m.0.cv1.conv.weight
freezing model.8.m.0.cv1.bn.weight
freezing model.8.m.0.cv1.bn.bias
freezing model.8.m.0.cv2.conv.weight
freezing model.8.m.0.cv2.bn.weight
freezing model.8.m.0.cv2.bn.bias
freezing model.9.cv1.conv.weight
freezing model.9.cv1.bn.weight
freezing model.9.cv1.bn.bias
freezing model.9.cv2.conv.weight
freezing model.9.cv2.bn.weight
freezing model.9.cv2.bn.bias
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 60 weight(decay=0.0005), 60 bias
train: Scanning /home/build/shared/airockchip/welding/yolov5/data/autosplit_train... 201 images, 0 backgrounds, 0 corrupt: 100%|██████████| 201/201 [00:00<00:00, 5021.02it/s]
train: New cache created: /home/build/shared/airockchip/welding/yolov5/data/autosplit_train.cache
train: Caching images (0.2GB ram): 100%|██████████| 201/201 [00:05<00:00, 35.55it/s]
val: Scanning /home/build/shared/airockchip/welding/yolov5/data/autosplit_val... 56 images, 0 backgrounds, 0 corrupt: 100%|██████████| 56/56 [00:00<00:00, 382.35it/s]
val: New cache created: /home/build/shared/airockchip/welding/yolov5/data/autosplit_val.cache
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "/home/build/conda/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/build/conda/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 568, in reduce_storage
    fd, size = storage._share_fd_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 294, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 364, in _share_fd_cpu_
    return super()._share_fd_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to write to file </torch_1503_1315273561_0>: No space left on device (28)
Traceback (most recent call last):
  File "/home/build/conda/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/build/conda/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 568, in reduce_storage
    fd, size = storage._share_fd_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 294, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 364, in _share_fd_cpu_
    return super()._share_fd_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to write to file </torch_1519_3092497780_2>: No space left on device (28)
Traceback (most recent call last):
  File "/home/build/conda/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/build/conda/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 568, in reduce_storage
    fd, size = storage._share_fd_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 294, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 364, in _share_fd_cpu_
    return super()._share_fd_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to write to file </torch_1423_2176948111_0>: No space left on device (28)
Traceback (most recent call last):
  File "/home/build/conda/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/build/conda/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 568, in reduce_storage
    fd, size = storage._share_fd_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 294, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 364, in _share_fd_cpu_
    return super()._share_fd_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to write to file </torch_1503_3380698999_1>: No space left on device (28)
Traceback (most recent call last):
  File "/home/build/conda/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/build/conda/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 568, in reduce_storage
    fd, size = storage._share_fd_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 294, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/storage.py", line 364, in _share_fd_cpu_
    return super()._share_fd_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to write to file </torch_1535_2464517252_0>: No space left on device (28)
Traceback (most recent call last):
  File "/home/build/shared/airockchip/welding/yolov5/train.py", line 836, in <module>
    main(opt)
  File "/home/build/shared/airockchip/welding/yolov5/train.py", line 616, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/build/shared/airockchip/welding/yolov5/train.py", line 272, in train
    val_loader = create_dataloader(
                 ^^^^^^^^^^^^^^^^^^
  File "/home/build/shared/airockchip/welding/yolov5/utils/dataloaders.py", line 177, in create_dataloader
    dataset = LoadImagesAndLabels(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/build/shared/airockchip/welding/yolov5/utils/dataloaders.py", line 640, in __init__
    if cache_images == "ram" and not self.check_cache_ram(prefix=prefix):
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/shared/airockchip/welding/yolov5/utils/dataloaders.py", line 664, in check_cache_ram
    im = cv2.imread(random.choice(self.im_files))  # sample image
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/shared/airockchip/welding/yolov5/utils/general.py", line 1205, in imread
    return cv2.imdecode(np.fromfile(filename, np.uint8), flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/build/conda/lib/python3.11/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1455) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
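Reading the traceback, the failure is in shared memory rather than in the dataset: DataLoader worker processes hand finished batches back to the main process through shared memory (the `_share_fd_cpu_` frames), which on Linux is typically backed by `/dev/shm`, so `No space left on device (28)` points at that filesystem being full. The sketch below, with hypothetical function names of my own, reports the free space there via Python's `shutil.disk_usage` and gives a rough estimate of what the in-flight batches need. The estimate assumes uint8 image batches of shape `(batch, 3, imgsz, imgsz)` and PyTorch's default of two prefetched batches per worker; it ignores labels and any other overhead, so treat it as an order-of-magnitude figure only.

```python
from __future__ import annotations

import os
import shutil


def shm_free_mib(path: str = "/dev/shm") -> float | None:
    """Free space in MiB on the filesystem holding `path`, or None if it is missing."""
    if not os.path.exists(path):
        return None
    return shutil.disk_usage(path).free / 2**20


def batch_shm_estimate_mib(batch: int, imgsz: int, workers: int,
                           prefetch_factor: int = 2) -> float:
    """Rough shared memory (MiB) held by batches queued by DataLoader workers.

    Assumes uint8 tensors of shape (batch, 3, imgsz, imgsz), i.e. one byte per
    pixel per channel, and `prefetch_factor` batches queued per worker.
    """
    bytes_per_batch = batch * 3 * imgsz * imgsz
    return bytes_per_batch * workers * prefetch_factor / 2**20


if __name__ == "__main__":
    need = batch_shm_estimate_mib(batch=32, imgsz=640, workers=8)
    free = shm_free_mib()
    print(f"estimated in-flight batches: ~{need:.0f} MiB")
    print("free /dev/shm:", "not found" if free is None else f"{free:.0f} MiB")
```

With the settings from the failing run (`--batch 32`, `--img 640`, `workers=8`) the estimate comes out at 600 MiB. If `/dev/shm` reports much less than that, common workarounds are a smaller `--workers` or `--batch` value, and, if training runs inside a Docker container (whose default `/dev/shm` is 64 MB), starting the container with a larger `--shm-size` or with `--ipc=host`.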

Also, the pre-trained .pt file from the link gives wrong results, as shown in the image I attached.

Any help would be appreciated.

Jagan

szbela87 commented 7 months ago

Hi Jagan,

Thank you, I'll look into it and get back to you by the weekend at the latest.

Best wishes, Bela

Jagan Teki @.*** wrote (on Tue, Feb 13, 2024, 14:54):

Also, the pre-trained .pt from link https://drive.google.com/drive/folders/1LcIzjeB0gurHA7OGhjajesNJnojtb4EG?usp=sharing shows wrong result as I attached. [image: Uploading out.png…]

Any help? Jagan.


openedev commented 7 months ago

Hi Bela,

Thanks for the quick response. Meanwhile, if possible, could you send the input image you tested with the default yolov7 .pt (https://drive.google.com/drive/folders/1LcIzjeB0gurHA7OGhjajesNJnojtb4EG?usp=sharing)?

Thanks, Jagan.

szbela87 commented 7 months ago

Hi Jagan,

You're welcome. The link to the full dataset is on the GitHub page: https://drive.google.com/file/d/1GrHhiCdmRnXbXEyWrLGfGGD0eS3YwDUb/view?usp=sharing

BW, Bela


openedev commented 7 months ago

Hmm, not sure what's wrong. This is the output of the yolov7 .pt from that drive. Hope you can open it.

https://ibb.co/4ZRK6xY

szbela87 commented 7 months ago

It looks like the welding_data.yaml file is not used.


openedev commented 7 months ago

I didn't use your code since it fails during training, so I took your default .pt and converted it to ONNX:

$ python export.py --weight ./best.pt
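In upstream ultralytics/yolov5, `export.py` chooses output formats with `--include`, which as far as I know defaults to `torchscript`, so the command above on its own may not write an `.onnx` file. A minimal sketch that spells the format out, assuming the upstream flag names `--weights`, `--imgsz` and `--include` (forks of yolov5 may differ); `build_export_cmd` and `run_export` are hypothetical helpers:

```python
from __future__ import annotations

import subprocess
import sys


def build_export_cmd(weights: str, formats: tuple[str, ...] = ("onnx",), imgsz: int = 640) -> list[str]:
    """Build the argv for YOLOv5's export.py (upstream flag names assumed)."""
    if not formats:
        raise ValueError("at least one export format is required")
    return [
        sys.executable, "export.py",  # same interpreter / conda env as the caller
        "--weights", weights,
        "--imgsz", str(imgsz),
        # Explicit format list; upstream's default is torchscript only.
        "--include", *formats,
    ]


def run_export(weights: str, yolov5_dir: str = ".") -> None:
    """Run the export from inside the yolov5 checkout; raises CalledProcessError on failure."""
    subprocess.run(build_export_cmd(weights), cwd=yolov5_dir, check=True)
```

Under those assumptions, `run_export("best.pt", yolov5_dir="yolov5")` should write `best.onnx` next to `best.pt`.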

openedev commented 7 months ago

@szbela87 One question: does the training step you mentioned in README.md create the .pt file?

python train.py --cos-lr --img 640 --batch 32 --epochs 200 --data welding_data.yaml --weights yolov5n.pt --project defects --name model_5n_dec4 --cache --freeze 10 

szbela87 commented 7 months ago

Yep. It writes the .pt files to the ./defects/model_5n_dec4/weights directory; best.pt holds the best weights according to the fitness function. Training starts from the yolov5n.pt weights, which are pretrained on the COCO dataset.
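The best.pt selection can be illustrated with a small sketch. It assumes upstream YOLOv5's default fitness, a weighted sum of validation metrics with weight 0.1 on mAP@0.5, 0.9 on mAP@0.5:0.95 and 0 on precision and recall; `EpochMetrics` and `track_checkpoints` are hypothetical names, not YOLOv5 code:

```python
from __future__ import annotations

from dataclasses import dataclass

# Weights for [precision, recall, mAP@0.5, mAP@0.5:0.95], assumed to match upstream YOLOv5's fitness().
FITNESS_WEIGHTS = (0.0, 0.0, 0.1, 0.9)


@dataclass
class EpochMetrics:
    """Hypothetical container for one epoch's validation metrics."""
    precision: float
    recall: float
    map50: float
    map50_95: float


def fitness(m: EpochMetrics) -> float:
    """Weighted sum of the four metrics; higher is better."""
    values = (m.precision, m.recall, m.map50, m.map50_95)
    return sum(w * v for w, v in zip(FITNESS_WEIGHTS, values))


def track_checkpoints(history: list[EpochMetrics]) -> tuple[int, int]:
    """Return (last_epoch, best_epoch), 0-based.

    'last' follows every epoch (last.pt); 'best' moves only on a strict improvement (best.pt).
    """
    if not history:
        raise ValueError("no epochs recorded")
    best_epoch, best_fitness = -1, float("-inf")
    for epoch, metrics in enumerate(history):
        fi = fitness(metrics)
        if fi > best_fitness:  # this is where best.pt would be overwritten
            best_epoch, best_fitness = epoch, fi
    return len(history) - 1, best_epoch
```

For a made-up three-epoch history whose fitness goes 0.165, 0.22, 0.209, this returns `(2, 1)`: last.pt reflects the final epoch while best.pt stays at the second.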


openedev commented 7 months ago

It created both defects/model_5n_dec4 and defects/model_5n_dec42.

The dec42 run has .pt files but dec4 doesn't. Which one should I use?

$ ls defects/model_5n_dec4 -l
total 16
-rw-r--r-- 1 build build   88 Feb 14 10:19 events.out.tfevents.1707905986.tops-ThinkPad-E14-Gen-5.485.0
-rw-r--r-- 1 build build  370 Feb 14 10:19 hyp.yaml
-rw-r--r-- 1 build build 1027 Feb 14 10:19 opt.yaml
drwxr-xr-x 2 build build 4096 Feb 14 10:19 weights
$ ls defects/model_5n_dec4/weights/ -l
total 0
$ 

$ ls defects/model_5n_dec42/ -l
total 3324
-rw-r--r-- 1 build build  145283 Feb 14 12:00 confusion_matrix.png
-rw-r--r-- 1 build build 1645571 Feb 14 12:00 events.out.tfevents.1707906465.tops-ThinkPad-E14-Gen-5.16.0
-rw-r--r-- 1 build build  290842 Feb 14 12:00 F1_curve.png
-rw-r--r-- 1 build build     370 Feb 14 10:27 hyp.yaml
-rw-r--r-- 1 build build  194033 Feb 14 10:28 labels_correlogram.jpg
-rw-r--r-- 1 build build  119986 Feb 14 10:28 labels.jpg
-rw-r--r-- 1 build build    1028 Feb 14 10:27 opt.yaml
-rw-r--r-- 1 build build  277956 Feb 14 12:00 P_curve.png
-rw-r--r-- 1 build build  181916 Feb 14 12:00 PR_curve.png
-rw-r--r-- 1 build build  221281 Feb 14 12:00 R_curve.png
-rw-r--r-- 1 build build   59094 Feb 14 12:00 results.csv
-rw-r--r-- 1 build build  228081 Feb 14 12:00 results.png
drwxr-xr-x 2 build build    4096 Feb 14 12:33 weights
$ ls defects/model_5n_dec42/weights/ -l
total 7696
-rw-r--r-- 1 build build 3938664 Feb 14 12:00 best.pt
-rw-r--r-- 1 build build 3938664 Feb 14 12:00 last.pt

szbela87 commented 7 months ago

Yep. If you run it multiple times, the run directories get numbered, as in your case: dec4, then dec42 (a third run would give dec43). So you're no longer training the model ending in dec4. Two weight files are always saved during training: last.pt, the current state, and best.pt, the best model so far. Naturally, best.pt only changes when a better model than before is found.
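A small sketch of both points: a run directory gets a numeric suffix when the name already exists, and the run worth using is the one whose weights/best.pt exists (dec42 here, since dec4's weights directory is empty). `next_run_dir` and `runs_with_best` are hypothetical helpers that imitate, but are not, YOLOv5's own `increment_path`:

```python
from __future__ import annotations

from pathlib import Path


def next_run_dir(project: str | Path, name: str) -> Path:
    """YOLOv5-style numbering (sketch): name, then name2, name3, ... with no separator."""
    base = Path(project) / name
    if not base.exists():
        return base
    n = 2
    while True:
        candidate = Path(f"{base}{n}")
        if not candidate.exists():
            return candidate
        n += 1


def runs_with_best(project: str | Path, name: str) -> list[Path]:
    """Run directories starting with `name` that contain weights/best.pt, newest best.pt first."""
    runs = [p for p in Path(project).glob(f"{name}*") if (p / "weights" / "best.pt").is_file()]
    return sorted(runs, key=lambda p: (p / "weights" / "best.pt").stat().st_mtime, reverse=True)
```

Called as `runs_with_best("defects", "model_5n_dec4")` from the yolov5 directory, it would list only the dec42 run for the directories shown above. The prefix glob also matches unrelated names that merely start with `name`, which is acceptable for a quick check.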


openedev commented 7 months ago

Okay. How do I run it to get the output?

szbela87 commented 7 months ago

Do you mean to make predictions? Please see the evaluation scripts and/or the original yolov5 documentation: https://github.com/ultralytics/yolov5
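For getting annotated output from the trained weights, upstream YOLOv5's `detect.py` draws the predicted boxes and, by default, saves the images under runs/detect/exp*. A hedged sketch of the invocation for the dec42 weights, assuming the upstream flags `--weights`, `--source`, `--imgsz` and `--conf-thres`; `build_detect_cmd` is a hypothetical helper and the image path below is a placeholder:

```python
from __future__ import annotations

import subprocess
import sys


def build_detect_cmd(weights: str, source: str, conf: float = 0.25, imgsz: int = 640) -> list[str]:
    """Build the argv for YOLOv5's detect.py (upstream flag names assumed)."""
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence threshold must be in [0, 1]")
    return [
        sys.executable, "detect.py",
        "--weights", weights,
        "--source", source,  # an image file, a directory, or a glob pattern
        "--imgsz", str(imgsz),
        "--conf-thres", str(conf),
    ]


def run_detect(weights: str, source: str, yolov5_dir: str = ".") -> None:
    """Run detection from inside the yolov5 checkout; raises CalledProcessError on failure."""
    subprocess.run(build_detect_cmd(weights, source), cwd=yolov5_dir, check=True)
```

For example, `run_detect("defects/model_5n_dec42/weights/best.pt", "path/to/weld.jpg", yolov5_dir="yolov5")`; the relative weights path works because training was started from inside the yolov5 directory with `--project defects`.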


openedev commented 7 months ago

Not prediction. If I give it an image of a defective weld, it should output the image with the defective areas marked, like the examples shown in the README.

szbela87 commented 7 months ago

Yeah, this is called prediction. And again: please look at the evaluation scripts, and study both the GitHub page and the original YOLOv5 documentation thoroughly.

All the best.
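Marking the defective areas comes down to drawing each predicted box on the image. If `detect.py` is run with the upstream `--save-txt` flag, each detection is written as a YOLO-format line `class cx cy w h`, with coordinates normalized to [0, 1] (plus a confidence column with `--save-conf`), the same format as the training label .txt files. A small sketch converting such a line to pixel corners; the helper names are hypothetical:

```python
from __future__ import annotations


def _clamp(value: float, upper: int) -> int:
    """Round to the nearest pixel and keep the result inside [0, upper]."""
    return int(round(min(max(value, 0.0), float(upper))))


def parse_label_line(line: str) -> tuple[int, float, float, float, float]:
    """Parse 'class cx cy w h [conf]' from a YOLO-format .txt file; a confidence column is ignored."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    cls, cx, cy, w, h = fields[:5]
    return int(cls), float(cx), float(cy), float(w), float(h)


def yolo_to_pixel_box(cx: float, cy: float, w: float, h: float,
                      img_w: int, img_h: int) -> tuple[int, int, int, int]:
    """Convert a normalized center/size box to pixel corners (x1, y1, x2, y2), clamped to the image."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return _clamp(x1, img_w), _clamp(y1, img_h), _clamp(x2, img_w), _clamp(y2, img_h)
```

The resulting corners can be passed to, e.g., OpenCV's `cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness)` to draw the marked areas.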
