ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Tutorial 5 issue with training model #299

Closed. Bergylta closed this issue 11 months ago

Bergylta commented 11 months ago

šŸ› Bug

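Training a YOLOv5 model in Tutorial 5 fails at the end of the first epoch with "TypeError: end_epoch() got an unexpected keyword argument 'best_result'". Before that, the plotting threads repeatedly raise "AttributeError: 'FreeTypeFont' object has no attribute 'getsize'", although training continues past those.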

To Reproduce (REQUIRED)

Model: baseline-YOLOV5; batch size: 16; epochs: 50; image size: 640x640.

Input:

mlp.train_yolov5(
    exp_name.value,
    weights.artifact_path,
    mlp.project_name,
    epochs=epochs.value,
    batch_size=batch_size.value,
    img_size=img_h.value,  # this requires an int
)

Output:

train: weights=/mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_hardbottom_anemones_24_10_2023/yolov5m.pt, cfg=, data=/mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_hardbottom_anemones_24_10_2023/Koster_Seafloor_Obs_12:06:41.yaml, hyp=/mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_hardbottom_anemones_24_10_2023/hyp.yaml, epochs=50, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=koster_seafloor_obs, name=KSO_hardbottom_anemones_emil7_16_50, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=koster, bbox_interval=-1, artifact_alias=latest, neptune_token=None, neptune_project=None, neptune_resume_id=None, s3_upload_dir=None, upload_dataset=True, hf_model_id=None, hf_token=None, hf_private=False, hf_dataset_id=None, roboflow_token=None, roboflow_upload=False, cache_images=True
YOLOv5 🚀 2023-10-23 Python-3.8.10 torch-2.1.0+cu121 CUDA:0 (NVIDIA A40, 45626MiB)

hyperparameters: anchor_t=4.0, box=0.05, cls=0.5, cls_pw=1.0, copy_paste=0.0, degrees=0.0, fl_gamma=0.0, fliplr=0.5, flipud=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, iou_t=0.2, lr0=0.01, lrf=0.1, mixup=0.0, momentum=0.937, mosaic=1.0, obj=1.0, obj_pw=1.0, perspective=0.0, scale=0.5, shear=0.0, translate=0.1, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir koster_seafloor_obs', view at http://localhost:6006/
requirements: /usr/local/lib/python3.8/dist-packages/requirements.txt not found, check failed.
wandb version 0.15.12 is available! To upgrade, please run: $ pip install wandb --upgrade
Tracking run with wandb version 0.15.11
Run data is saved locally in /mimer/NOBACKUP/groups/snic2021-6-9/wandb/run-20231024_143444-6vosjqk3
Syncing run [KSO_hardbottom_anemones_emil7_16_50](https://wandb.ai/koster/koster_seafloor_obs/runs/6vosjqk3) to [Weights & Biases](https://wandb.ai/koster/koster_seafloor_obs) ([docs](https://wandb.me/run))
View project at https://wandb.ai/koster/koster_seafloor_obs
View run at https://wandb.ai/koster/koster_seafloor_obs/runs/6vosjqk3
Overriding model.yaml nc=80 with nc=3

                 from  n    params  module                                  arguments                     
  0                -1  1      5280  yolov5.models.common.Focus              [3, 48, 3]                    
  1                -1  1     41664  yolov5.models.common.Conv               [48, 96, 3, 2]                
  2                -1  2     65280  yolov5.models.common.C3                 [96, 96, 2]                   
  3                -1  1    166272  yolov5.models.common.Conv               [96, 192, 3, 2]               
  4                -1  6    629760  yolov5.models.common.C3                 [192, 192, 6]                 
  5                -1  1    664320  yolov5.models.common.Conv               [192, 384, 3, 2]              
  6                -1  6   2512896  yolov5.models.common.C3                 [384, 384, 6]                 
  7                -1  1   2655744  yolov5.models.common.Conv               [384, 768, 3, 2]              
  8                -1  1   1476864  yolov5.models.common.SPP                [768, 768, [5, 9, 13]]        
  9                -1  2   4134912  yolov5.models.common.C3                 [768, 768, 2, False]          
 10                -1  1    295680  yolov5.models.common.Conv               [768, 384, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  yolov5.models.common.Concat             [1]                           
 13                -1  2   1182720  yolov5.models.common.C3                 [768, 384, 2, False]          
 14                -1  1     74112  yolov5.models.common.Conv               [384, 192, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  yolov5.models.common.Concat             [1]                           
 17                -1  2    296448  yolov5.models.common.C3                 [384, 192, 2, False]          
 18                -1  1    332160  yolov5.models.common.Conv               [192, 192, 3, 2]              
 19          [-1, 14]  1         0  yolov5.models.common.Concat             [1]                           
 20                -1  2   1035264  yolov5.models.common.C3                 [384, 384, 2, False]          
 21                -1  1   1327872  yolov5.models.common.Conv               [384, 384, 3, 2]              
 22          [-1, 10]  1         0  yolov5.models.common.Concat             [1]                           
 23                -1  2   4134912  yolov5.models.common.C3                 [768, 768, 2, False]          
 24      [17, 20, 23]  1     32328  yolov5.models.yolo.Detect               [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Model summary: 309 layers, 21064488 parameters, 21064488 gradients

Transferred 499/505 items from /mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_hardbottom_anemones_24_10_2023/yolov5m.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 83 weight(decay=0.0), 86 weight(decay=0.0005), 86 bias
train: Scanning /mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_hardbottom_anemones_24_10_2023/train.cache... 372 images, 2 backgrounds, 0 corrupt: 100%|██████████| 372/372 [00:00<?, ?it/s]
val: Scanning /mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_hardbottom_anemones_24_10_2023/valid.cache... 94 images, 1 backgrounds, 0 corrupt: 100%|██████████| 94/94 [00:00<?, ?it/s]

AutoAnchor: 5.53 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to koster_seafloor_obs/KSO_hardbottom_anemones_emil7_16_502/labels.jpg... 
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to koster_seafloor_obs/KSO_hardbottom_anemones_emil7_16_502
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/49      5.97G     0.1117    0.04666    0.03865         77        640:   0%|          | 0/24 [00:00<?, ?it/s]Exception in thread Thread-11:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/yolov5/utils/plots.py", line 290, in plot_images
    annotator.box_label(box, label, color=color)
  File "/usr/local/lib/python3.8/dist-packages/yolov5/utils/plots.py", line 91, in box_label
    w, h = self.font.getsize(label)  # text width, height (WARNING: deprecated) in 9.2.0
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'
       0/49      5.98G     0.1127    0.05029    0.03864        107        640:   8%|▊         | 2/24 [00:04<00:39,  1.78s/it]Exception in thread Thread-12:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/yolov5/utils/plots.py", line 290, in plot_images
    annotator.box_label(box, label, color=color)
  File "/usr/local/lib/python3.8/dist-packages/yolov5/utils/plots.py", line 91, in box_label
    w, h = self.font.getsize(label)  # text width, height (WARNING: deprecated) in 9.2.0
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'
       0/49      5.98G      0.113    0.05568    0.03873        147        640:  12%|█▎        | 3/24 [00:04<00:22,  1.06s/it]Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/yolov5/utils/plots.py", line 290, in plot_images
    annotator.box_label(box, label, color=color)
  File "/usr/local/lib/python3.8/dist-packages/yolov5/utils/plots.py", line 91, in box_label
    w, h = self.font.getsize(label)  # text width, height (WARNING: deprecated) in 9.2.0
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'
       0/49      5.98G     0.1079    0.06355    0.03801         55        640: 100%|██████████| 24/24 [00:08<00:00,  2.85it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 3/3 [00:02<00:00,  1.37it/s]
                   all         94        361     0.0023      0.321     0.0104    0.00224
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 mlp.train_yolov5(
      2     exp_name.value,
      3     weights.artifact_path,
      4     mlp.project_name,
      5     epochs=epochs.value,
      6     batch_size=batch_size.value,
      7     img_size=img_h.value,  # this requires an int
      8 )

File /usr/src/app/kso-dev/kso_utils/project.py:1255, in MLProjectProcessor.train_yolov5(self, exp_name, weights, project, epochs, batch_size, img_size)
   1251 def train_yolov5(
   1252     self, exp_name, weights, project, epochs=50, batch_size=16, img_size=[640, 640]
   1253 ):
   1254     if self.model_type == 1:
-> 1255         self.modules["train"].run(
   1256             entity=self.team_name,
   1257             data=self.data_path,
   1258             hyp=self.hyp_path,
   1259             weights=weights,
   1260             project=project,
   1261             name=exp_name,
   1262             imgsz=img_size,
   1263             batch_size=int(batch_size),
   1264             epochs=epochs,
   1265             single_cls=False,
   1266             cache_images=True,
   1267             upload_dataset=True,
   1268         )
   1269     elif self.model_type == 2:
   1270         self.modules["train"].run(
   1271             entity=self.team_name,
   1272             data=self.data_path,
   (...)
   1278             epochs=epochs,
   1279         )

File /usr/local/lib/python3.8/dist-packages/yolov5/train.py:720, in run(**kwargs)
    718 for k, v in kwargs.items():
    719     setattr(opt, k, v)
--> 720 main(opt)
    721 return opt

File /usr/local/lib/python3.8/dist-packages/yolov5/train.py:615, in main(opt, callbacks)
    613 # Train
    614 if not opt.evolve:
--> 615     train(opt.hyp, opt, device, callbacks)
    617 # Evolve hyperparameters (optional)
    618 else:
    619     # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit)
    620     meta = {
    621         'lr0': (1, 1e-5, 1e-1),  # initial learning rate (SGD=1E-2, Adam=1E-3)
    622         'lrf': (1, 0.01, 1.0),  # final OneCycleLR learning rate (lr0 * lrf)
   (...)
    648         'mixup': (1, 0.0, 1.0),  # image mixup (probability)
    649         'copy_paste': (1, 0.0, 1.0)}  # segment copy-paste (probability)

File /usr/local/lib/python3.8/dist-packages/yolov5/train.py:390, in train(hyp, opt, device, callbacks)
    388     best_fitness = fi
    389 log_vals = list(mloss) + list(results) + lr + list(maps) + list(map50s)
--> 390 callbacks.run('on_fit_epoch_end', log_vals, epoch, best_fitness, fi)
    392 # Save model
    393 if (not nosave) or (final_epoch and not evolve):  # if save

File /usr/local/lib/python3.8/dist-packages/yolov5/utils/callbacks.py:76, in Callbacks.run(self, hook, thread, *args, **kwargs)
     74     threading.Thread(target=logger['callback'], args=args, kwargs=kwargs, daemon=True).start()
     75 else:
---> 76     logger['callback'](*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/yolov5/utils/loggers/__init__.py:266, in Loggers.on_fit_epoch_end(self, vals, epoch, best_fitness, fi)
    264             self.wandb.wandb_run.summary[name] = best_results[i]  # log best results in the summary
    265     self.wandb.log(x)
--> 266     self.wandb.end_epoch(best_result=best_fitness == fi)
    268 if self.neptune and self.neptune.neptune_run:
    269     self.neptune.log(x)

TypeError: end_epoch() got an unexpected keyword argument 'best_result'
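
The fatal error in this traceback is a signature mismatch: `Loggers.on_fit_epoch_end` passes `best_result=` to `end_epoch()`, but the `end_epoch` that is actually loaded does not accept that keyword. That may mean the installed `yolov5` logging code and the wandb logger class come from different versions; this is an inference from the traceback, not something confirmed in this thread. A minimal sketch of how such a mismatch could be checked, using stand-in classes rather than the real loggers (`accepts_keyword`, `call_end_epoch`, `OldLogger` and `NewLogger` are hypothetical names):

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class OldLogger:
    # Stand-in for a logger whose end_epoch takes no best_result keyword.
    def end_epoch(self):
        return "old"


class NewLogger:
    # Stand-in for a logger whose end_epoch accepts best_result.
    def end_epoch(self, best_result=False):
        return "new"


def call_end_epoch(logger, best):
    """Call end_epoch, passing best_result only if the method accepts it."""
    if accepts_keyword(logger.end_epoch, "best_result"):
        return logger.end_epoch(best_result=best)
    return logger.end_epoch()
```

Running `accepts_keyword(obj.end_epoch, "best_result")` against the real logger object in the failing environment would show which side is out of date. The guard is only diagnostic; aligning the package versions is the more likely fix.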

Expected behavior.

Environment

Additional context

image

image

image

image
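
The repeated `AttributeError` in the training log is not what stops the run (it is raised in plotting threads and training continues), but it is consistent with `FreeTypeFont.getsize` having been removed in Pillow 10.0.0 after its deprecation in 9.2.0. A minimal sketch of a compatibility helper, assuming only the label's width and height are needed and taking the right and bottom edges of the bounding box to stay close to what `getsize` returned; `text_size` and the two stand-in font classes are hypothetical names, not part of `yolov5` or Pillow:

```python
def text_size(font, label):
    """Return (width, height) of label using getbbox, or legacy getsize."""
    if hasattr(font, "getbbox"):
        # Newer Pillow: getbbox returns (left, top, right, bottom).
        _, _, right, bottom = font.getbbox(label)
        return right, bottom
    # Older Pillow: getsize returns (width, height) directly.
    return font.getsize(label)


class LegacyFont:
    # Stand-in for a font object that only offers getsize.
    def getsize(self, text):
        return (8 * len(text), 12)


class ModernFont:
    # Stand-in for a font object that only offers getbbox.
    def getbbox(self, text):
        return (0, 2, 8 * len(text), 12)
```

Pinning Pillow below 10 in the environment may be a simpler workaround when the plotting code cannot be changed.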