How to train my own data?

wmcnally / kapao

KAPAO is an efficient single-stage human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

GNU General Public License v3.0


How to train my own data? #36

Closed Richard-wang85 closed 2 years ago

Richard-wang85 commented 2 years ago

Thank you for your splendid work! I have some questions about training on my own data with your model. For example, what should I do with my label files (.json)?

wmcnally commented 2 years ago

Are they in COCO format? I.e., can you load them with the pycocotools API? If so, then you just need to create a new dataset config file (e.g., like data/coco-kp.yaml). If not, then you should modify utils/labels.py and val.py to suit your needs.
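A quick way to sanity-check a label file before trying pycocotools is to look for the top-level keys and per-annotation fields that COCO keypoint files normally carry. The sketch below uses only the standard library; the function name check_coco_keypoints and the exact set of checks are illustrative assumptions, not part of KAPAO or pycocotools, and passing them does not guarantee that pycocotools will accept the file.

```python
import json

REQUIRED_TOP_LEVEL = ("images", "annotations", "categories")


def check_coco_keypoints(data):
    """Return a list of problems found in a COCO-style keypoint dict."""
    problems = [f"missing top-level key: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in data]
    if problems:
        return problems

    image_ids = {img["id"] for img in data["images"]}
    # Number of keypoints declared per category id.
    n_kp = {c["id"]: len(c.get("keypoints", [])) for c in data["categories"]}

    for ann in data["annotations"]:
        aid = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {aid}: unknown image_id")
        kps = ann.get("keypoints")
        if kps is None:
            problems.append(f"annotation {aid}: no keypoints field")
            continue
        # Each keypoint is stored as an (x, y, visibility) triplet.
        expected = 3 * n_kp.get(ann.get("category_id"), 0)
        if len(kps) != expected:
            problems.append(
                f"annotation {aid}: {len(kps)} keypoint values, expected {expected}"
            )
    return problems


# Tiny in-memory example; for a real file use json.load(open(path)).
sample = json.loads(
    '{"images": [{"id": 1}],'
    ' "categories": [{"id": 10, "keypoints": ["nose", "tail"]}],'
    ' "annotations": [{"id": 5, "image_id": 1, "category_id": 10,'
    ' "keypoints": [10, 20, 2, 0, 0, 0]}]}'
)
print(check_coco_keypoints(sample) or "no problems found")
```

An empty list only means these basic checks passed; loading the file with pycocotools remains the real test.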

madenburak commented 2 years ago

Can you explain in detail, please? I annotated my own data with COCO Annotator and created new config files based on coco-kp.yaml and yolov5s6.yaml, but I didn't create kp_labels.

Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 210, in train
    prefix=colorstr('train: '), kp_flip=kp_flip, kp_bbox=kp_bbox)
  File "/home/kapao-master/utils/datasets.py", line 111, in create_dataloader
    kp_bbox=kp_bbox)
  File "/home/kapao-master/utils/datasets.py", line 423, in __init__
    raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from /home/kapao-master/data/datasets/airplane/kp_labels/1: train: /home/kapao-master/data/datasets/airplane/kp_labels/1 does not exist
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

wmcnally commented 2 years ago

Can you share your config file and your dataset file structure, please?

madenburak commented 2 years ago

Thanks for replying to my question. I tried to structure my dataset like COCO.

path:/home/kapao-master/data/datasets/airplane/
labels: kp_labels
train: kp_labels/train2017
val: kp_labels/validation2017
test: kp_labels/test2017

train_annotations: annotations/person_keypoints_train2017.json
val_annotations: annotations/person_keypoints_val2017.json
test_annotations: annotations/image_info_test-dev2017.json

pose_obj: True

nc: 5  
num_coords: 8  

names: [ 'airplane', 'nose',
         'left_wing', 'right_wing',
         'tail' ]

kp_bbox: 0.05 
kp_flip: [0, 2, 1, 3]  
kp_left: [1]  
kp_face: [0]

kp_names_short:
  0: 'n'
  1: 'lew'
  2: 'rew'
  3: 'tai'

segments:
  1: [0, 1]
  2: [0, 2]
  3: [1, 3]
  4: [2, 3]

airplane
┣ annotations
┃ ┣ image_info_test-dev2017.json
┃ ┣ person_keypoints_train2017.json
┃ ┗ person_keypoints_val2017.json
┣ kp_labels [don't create]
┃ ┣ img_txt
┃ ┃ ┣ 3.txt
┃ ┃ ┣ train2017.txt
┃ ┃ ┗ validation2017.txt
┃ ┣ hpron190000.txt
┃ ┣ hpron190700.txt
┃ ┣ jpron1400.txt
┃ ┣ jpron1500.txt
┃ ┗ jpron1600.txt
┗ labels
┃ ┣ test2017
┃ ┃ ┗ bpron41400.jpg
┃ ┣ train2017
┃ ┃ ┣ jpron1400.jpg
┃ ┃ ┣ jpron1500.jpg
┃ ┃ ┗ jpron1600.jpg
┃ ┗ validation2017
┃ ┃ ┣ hpron190000.jpg
┃ ┃ ┗ hpron190700.jpg
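The kp_flip list in the config above reads as a permutation applied when an image is mirrored horizontally: index 1 (left_wing) and index 2 (right_wing) trade places, while nose and tail stay put. A minimal sketch of that idea, assuming keypoints are stored as normalized (x, y) pairs in the order given by names; the helper flip_keypoints is hypothetical and is not KAPAO's own augmentation code.

```python
def flip_keypoints(keypoints, kp_flip):
    """Mirror normalized (x, y) keypoints left-right and swap left/right slots.

    keypoints: list of (x, y) tuples with x, y in [0, 1]
    kp_flip:   kp_flip[i] is the index whose mirrored value lands in slot i
    """
    mirrored = [(1.0 - x, y) for x, y in keypoints]
    return [mirrored[j] for j in kp_flip]


# Order follows the config: nose, left_wing, right_wing, tail.
kps = [(0.5, 0.25), (0.125, 0.5), (0.75, 0.625), (0.5, 0.875)]
flipped = flip_keypoints(kps, kp_flip=[0, 2, 1, 3])
print(flipped)
# [(0.5, 0.25), (0.25, 0.625), (0.875, 0.5), (0.5, 0.875)]
```

After the flip, the left_wing slot holds the mirrored position of the original right wing, which is what keeps the left/right labels consistent with the mirrored image.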

wmcnally commented 2 years ago

Interesting application. I'm assuming your jsons contain your custom labels in the COCO format?

Try pulling the latest code, deleting the kp_labels directory, renaming the labels directory to images, and updating the config as shown below.

train: kp_labels/img_txt/train2017.txt
val: kp_labels/img_txt/validation2017.txt
test: kp_labels/img_txt/test2017.txt

madenburak commented 2 years ago

I am getting this error message.

train: weights=yolov5s.pt, cfg=yolov5s6.yaml, data=coco-kp-plane.yaml, hyp=data/hyps/hyp.kp-p6.yaml, epochs=300, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=100, val_scales=[1], val_flips=[-1], autobalance=False
YOLOv5 🚀 46f41ac torch 1.9.1+cu102 CUDA:0 (Quadro RTX 6000, 24211.9375MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, kp=0.025, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, kp_bbox=0.05
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Writing dataset labels to /home/kapao/data/datasets/airplane/kp_labels...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Writing train2017 labels to /home/kapao/data/datasets/airplane/kp_labels/train2017: 100%|██████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 8630.26it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Writing validation2017 labels to /home/kapao/data/datasets/airplane/kp_labels/validation2017: 100%|████████████████████████████████████████████████| 2/2 [00:00<00:00, 6528.10it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Overriding model.yaml nc=5 with nc=13

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1    885504  models.common.Conv                      [256, 384, 3, 2]              
  8                -1  1    665856  models.common.C3                        [384, 384, 1]                 
  9                -1  1   1770496  models.common.Conv                      [384, 512, 3, 2]              
 10                -1  1    656896  models.common.SPP                       [512, 512, [3, 5, 7]]         
 11                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 12                -1  1    197376  models.common.Conv                      [512, 384, 1, 1]              
 13                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 14           [-1, 8]  1         0  models.common.Concat                    [1]                           
 15                -1  1    813312  models.common.C3                        [768, 384, 1, False]          
 16                -1  1     98816  models.common.Conv                      [384, 256, 1, 1]              
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 18           [-1, 6]  1         0  models.common.Concat                    [1]                           
 19                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 20                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 21                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 22           [-1, 4]  1         0  models.common.Concat                    [1]                           
 23                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 24                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 25          [-1, 20]  1         0  models.common.Concat                    [1]                           
 26                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 16]  1         0  models.common.Concat                    [1]                           
 29                -1  1    715008  models.common.C3                        [512, 384, 1, False]          
 30                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 31          [-1, 12]  1         0  models.common.Concat                    [1]                           
 32                -1  1   1313792  models.common.C3                        [768, 512, 1, False]          
 33  [23, 26, 29, 32]  1     69336  models.yolo.Detect                      [13, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], [128, 256, 384, 512]]
Model Summary: 368 layers, 12409752 parameters, 12409752 gradients, 16.8 GFLOPs

Transferred 155/472 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 77 weight, 81 weight (no decay), 81 bias
train: Scanning '/home/kapao/data/datasets/airplane/kp_labels/img_txt/train2017' images and labels...3 found, 0 missing, 0 empty, 0 corrupted: 100%|█| 3/3 [00:00<00:00, 535.22it/s
train: New cache created: /home/kapao/data/datasets/airplane/kp_labels/img_txt/train2017.cache
val: Scanning '/home/kapao/data/datasets/airplane/kp_labels/img_txt/validation2017' images and labels...2 found, 0 missing, 0 empty, 0 corrupted: 100%|█| 2/2 [00:00<00:00, 383.16i
val: New cache created: /home/kapao/data/datasets/airplane/kp_labels/img_txt/validation2017.cache

autoanchor: Analyzing anchors... anchors/target = 3.89, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 val
Using 3 dataloader workers
Logging results to runs/train/exp3
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls       kps    labels  img_size
     0/299    0.715G    0.0883   0.01417   0.02321    0.2518         4       640: 100%|██████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.66s/it]
/home/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Processing val images: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 22.71it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 360, in train
    flips=val_flips)
  File "/home/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/kapao/val.py", line 283, in run
    result = coco.loadRes(json_path)
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/coco.py", line 265, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range
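The last frame of the traceback indexes anns[0], so an IndexError there means the list of results handed to loadRes was empty. A small sketch for confirming that, and for comparing prediction image ids with the ground truth, assuming the results file is a JSON list of dicts that each carry an image_id; the helper name and the sample data are placeholders.

```python
def summarize_results(results, gt):
    """Compare a list of prediction dicts against a COCO-style ground-truth dict."""
    gt_ids = {img["id"] for img in gt["images"]}
    pred_ids = {r["image_id"] for r in results}
    return {
        "num_results": len(results),
        # Prediction ids that no ground-truth image carries.
        "unknown_image_ids": sorted(pred_ids - gt_ids, key=str),
        # Ground-truth images that received no prediction at all.
        "images_without_results": sorted(gt_ids - pred_ids, key=str),
    }


gt = {"images": [{"id": 1133}, {"id": 1134}]}
print(summarize_results([], gt))
# {'num_results': 0, 'unknown_image_ids': [], 'images_without_results': [1133, 1134]}
```

With real files, results and gt would come from json.load on the saved predictions and the validation annotations respectively.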

wmcnally commented 2 years ago

Can you pull and try again please?

madenburak commented 2 years ago

I deleted all the KAPAO files and created a new Anaconda environment. I cloned KAPAO again and rearranged the .yaml config, yolov5s6.yaml, and the dataset, but I am getting the same error.

Structure of created kp_labels:

kp_labels
┣ img_txt
┃ ┣ test2017.txt
┃ ┣ train2017.cache
┃ ┣ train2017.txt
┃ ┣ val2017.cache
┃ ┗ val2017.txt
┣ train2017
┃ ┣ jpron1400.txt
┃ ┣ jpron1500.txt
┃ ┗ jpron1600.txt
┗ val2017
┃ ┣ hpron190000.txt
┃ ┗ hpron190700.txt

And inside my .json:

{"images":[{"id":1133,"dataset_id":10,"category_ids":[10],"path":"/home/kapao/data/dataset/plane/images/validation2017/hpron190000.jpg","width":1920,"height":1080,"file_name":"hpron190000.jpg","annotated":true,"annotating":[],"num_annotations":1,"metadata":{},"milliseconds":0,"events":[],"regenerate_thumbnail":false,"is_modified":false},{"id":1134,"dataset_id":10,"category_ids":[10],"path":"/home/kapao/data/dataset/plane/images/validation2017/hpron190700.jpg","width":1920,"height":1080,"file_name":"hpron190700.jpg","annotated":true,"annotating":[],"num_annotations":1,"metadata":{},"milliseconds":0,"events":[],"regenerate_thumbnail":false,"is_modified":false}],"categories":[{"id":10,"name":"airplane","supercategory":"airplane","color":"#bda513","metadata":{},"creator":"burak.m","keypoint_colors":["#bf5c4d","#d99100","#4d8068","#0d2b80"],"keypoints":["nose","left_wing","tail","right_wing"],"skeleton":[[1,4],[1,2],[2,3],[3,4]]}],"annotations":[{"id":190,"image_id":1133,"category_id":10,"dataset_id":10,"segmentation":[[1349.9,387.5,1349.9,461,1065.7,461,1065.7,387.5]],"area":20732,"bbox":[1066,388,284,73],"iscrowd":false,"isbbox":true,"creator":"burak.m","width":1920,"height":1080,"color":"#a60d4d","keypoints":[1091,437,2,0,0,0,1303,401,2,0,0,0],"metadata":{},"milliseconds":9684,"events":[{"_cls":"SessionEvent","created_at":{"$date":1644910240771},"user":"burak.m","milliseconds":9684,"tools_used":["BBox","Keypoints"]}],"num_keypoints":2},{"id":191,"image_id":1134,"category_id":10,"dataset_id":10,"segmentation":[[601.4,398.6,601.4,725.7,12.3,725.7,12.3,398.6]],"area":192603,"bbox":[12,399,589,327],"iscrowd":false,"isbbox":true,"creator":"burak.m","width":1920,"height":1080,"color":"#66c2f4","keypoints":[431,687,2,576,559,2,270,496,2,35,605,2],"metadata":{},"milliseconds":13192,"events":[{"_cls":"SessionEvent","created_at":{"$date":1644910251688},"user":"burak.m","milliseconds":13192,"tools_used":["BBox","Keypoints","Select"]}],"num_keypoints":4}]}
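In COCO-style annotations the keypoints field is a flat list of (x, y, v) triplets, one per name in the category's keypoints list, where v = 0 marks a keypoint that is not labeled. A short sketch that pairs names with triplets for the first annotation above (id 190); the helper name is illustrative.

```python
def named_keypoints(names, flat):
    """Pair keypoint names with (x, y, v) triplets from a flat COCO list."""
    if len(flat) != 3 * len(names):
        raise ValueError("expected 3 values per keypoint name")
    triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return dict(zip(names, triplets))


names = ["nose", "left_wing", "tail", "right_wing"]
flat = [1091, 437, 2, 0, 0, 0, 1303, 401, 2, 0, 0, 0]
kps = named_keypoints(names, flat)
labeled = [n for n, (_, _, v) in kps.items() if v > 0]
print(labeled)  # ['nose', 'tail'], matching num_keypoints = 2
```

Note that the JSON lists the keypoints as nose, left_wing, tail, right_wing, while the names list in the config posted earlier uses nose, left_wing, right_wing, tail; whether that mismatch matters depends on how the label writer maps indices, which this sketch does not cover.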

madenburak commented 2 years ago

What is the problem? Could it be the annotations?

wmcnally commented 2 years ago

It wasn’t the same error. Look at the error:

[Errno 2] No such file or directory: 'data/datasets/plane/images/val2017/hpron190000.jpg'

wmcnally commented 2 years ago

If you send me a zip file with your images and annotations to wmcnally@uwaterloo.ca I can take a look at it.

madenburak commented 2 years ago

I noticed that error and fixed it later. The current error is the same as yesterday.

train: weights=yolov5s.pt, cfg=yolov5s6-plane.yaml, data=own.yaml, hyp=data/hyps/hyp.kp-p6.yaml, epochs=300, batch_size=1, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=100, val_scales=[1], val_flips=[-1], autobalance=False
YOLOv5 🚀 156755c torch 1.9.1+cu102 CUDA:0 (Quadro RTX 6000, 24211.9375MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, kp=0.025, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, kp_bbox=0.05
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=5 with nc=13

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1    885504  models.common.Conv                      [256, 384, 3, 2]              
  8                -1  1    665856  models.common.C3                        [384, 384, 1]                 
  9                -1  1   1770496  models.common.Conv                      [384, 512, 3, 2]              
 10                -1  1    656896  models.common.SPP                       [512, 512, [3, 5, 7]]         
 11                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 12                -1  1    197376  models.common.Conv                      [512, 384, 1, 1]              
 13                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 14           [-1, 8]  1         0  models.common.Concat                    [1]                           
 15                -1  1    813312  models.common.C3                        [768, 384, 1, False]          
 16                -1  1     98816  models.common.Conv                      [384, 256, 1, 1]              
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 18           [-1, 6]  1         0  models.common.Concat                    [1]                           
 19                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 20                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 21                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 22           [-1, 4]  1         0  models.common.Concat                    [1]                           
 23                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 24                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 25          [-1, 20]  1         0  models.common.Concat                    [1]                           
 26                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 16]  1         0  models.common.Concat                    [1]                           
 29                -1  1    715008  models.common.C3                        [512, 384, 1, False]          
 30                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 31          [-1, 12]  1         0  models.common.Concat                    [1]                           
 32                -1  1   1313792  models.common.C3                        [768, 512, 1, False]          
 33  [23, 26, 29, 32]  1     69336  models.yolo.Detect                      [13, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], [128, 256, 384, 512]]
Model Summary: 368 layers, 12409752 parameters, 12409752 gradients, 16.8 GFLOPs

Transferred 155/472 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 77 weight, 81 weight (no decay), 81 bias
train: Scanning 'data/datasets/plane/kp_labels/img_txt/train2017.cache' images and labels... 3 found, 0 missing, 0 empty, 0 corrupted: 100%|████████████| 3/3 [00:00<?, ?it/s]
val: Scanning 'data/datasets/plane/kp_labels/img_txt/val2017.cache' images and labels... 2 found, 0 missing, 0 empty, 0 corrupted: 100%|████████████████| 2/2 [00:00<?, ?it/s]

autoanchor: Analyzing anchors... anchors/target = 3.89, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/train/exp59
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls       kps    labels  img_size
     0/299    0.593G   0.05297   0.02707   0.01483    0.2495        10       640: 100%|█████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.46it/s]
Processing val images: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 27.15it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 360, in train
    flips=val_flips)
  File "/home/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/kapao/val.py", line 283, in run
    result = coco.loadRes(json_path)
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/coco.py", line 265, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range

madenburak commented 2 years ago

OK. I will send the zip.

wmcnally commented 2 years ago

You need to change the filenames to the image ids (for the image files as well as in the annotation jsons). This needs to be done at least for the validation / test images. After I did that it worked for me.

I also added the option to adjust the OKS sigmas in the config file with: oks_sigmas: [.025, .025, .025, .025]

For reference, 0.025 is equal to the sigma for the eyes in the COCO dataset.
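For context, the sigmas enter the object keypoint similarity (OKS) used by COCO-style evaluation: each labeled keypoint contributes exp(-d^2 / (2 * s^2 * k^2)) with k = 2 * sigma, where d is the pixel distance between prediction and ground truth and s^2 is the object area. The sketch below is a simplified version written from the COCO definition, not the evaluation code KAPAO calls; the function name and the 20-pixel offset are made up for illustration, while the ground-truth triplets and area are taken from annotation 191 posted earlier in the thread.

```python
import math


def oks(pred, gt, area, sigmas):
    """Simplified object keypoint similarity.

    pred:   list of (x, y) predictions
    gt:     list of (x, y, v) ground-truth triplets; v = 0 means not labeled
    area:   ground-truth object area in pixels^2
    sigmas: per-keypoint sigma values, e.g. the oks_sigmas list
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy, v), sigma in zip(pred, gt, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


gt = [(431, 687, 2), (576, 559, 2), (270, 496, 2), (35, 605, 2)]
sigmas = [0.025, 0.025, 0.025, 0.025]
exact = [(x, y) for x, y, _ in gt]
shifted = [(x + 20, y) for x, y, _ in gt]
print(oks(exact, gt, area=192603, sigmas=sigmas))    # 1.0
print(oks(shifted, gt, area=192603, sigmas=sigmas))  # lower than 1.0
```

A smaller sigma makes the score fall off faster with distance, so 0.025 is a fairly strict setting for every keypoint.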

I will email you back the files with the modifications. Please pull the code and try again after changing the filenames in the jsons and the images folder.
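The renaming described above can be scripted. A minimal sketch, assuming the images for one split live in a single folder, the annotation file is COCO-style, and each image record has id and file_name; the function name is a placeholder, and it is worth running on a copy of the data first because the files are renamed in place.

```python
import json
from pathlib import Path


def rename_to_image_ids(ann_path, img_dir):
    """Rename each image to '<id><ext>' and update file_name in the JSON."""
    ann_path, img_dir = Path(ann_path), Path(img_dir)
    data = json.loads(ann_path.read_text())

    for img in data["images"]:
        old = img_dir / img["file_name"]
        new_name = f"{img['id']}{old.suffix}"
        if old.name != new_name:
            # Only touch the file if it is still under its old name;
            # the JSON entry is updated either way.
            if old.exists():
                old.rename(img_dir / new_name)
            img["file_name"] = new_name

    ann_path.write_text(json.dumps(data))
    return [img["file_name"] for img in data["images"]]
```

For the validation split in this thread, a call might look like rename_to_image_ids("annotations/person_keypoints_val2017.json", "images/val2017"), with both paths adjusted to the actual dataset root.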

madenburak commented 2 years ago

I downloaded the data.zip that you sent and ran train.py with it.

My train command:

python train.py --img 1280 --batch 48 --epochs 300 --data data/bird.yaml --hyp data/hyps/hyp.kp-p6.yaml --val-scales 1 --val-flips -1 --weights yolov5s6.pt --project runs/bird_e300 --name train --workers 12

The error I am getting:

train: weights=yolov5s6.pt, cfg=, data=data/bird.yaml, hyp=data/hyps/hyp.kp-p6.yaml, epochs=300, batch_size=48, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=12, project=runs/bird_e300, entity=None, name=train, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=100, val_scales=[1.0], val_flips=[-1], autobalance=False
YOLOv5 🚀 156755c torch 1.9.1+cu102 CUDA:0 (Quadro RTX 6000, 24211.9375MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, kp=0.025, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, kp_bbox=0.05
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs/bird_e300', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=13

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1    885504  models.common.Conv                      [256, 384, 3, 2]              
  8                -1  1    665856  models.common.C3                        [384, 384, 1]                 
  9                -1  1   1770496  models.common.Conv                      [384, 512, 3, 2]              
 10                -1  1    656896  models.common.SPP                       [512, 512, [3, 5, 7]]         
 11                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 12                -1  1    197376  models.common.Conv                      [512, 384, 1, 1]              
 13                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 14           [-1, 8]  1         0  models.common.Concat                    [1]                           
 15                -1  1    813312  models.common.C3                        [768, 384, 1, False]          
 16                -1  1     98816  models.common.Conv                      [384, 256, 1, 1]              
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 18           [-1, 6]  1         0  models.common.Concat                    [1]                           
 19                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 20                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 21                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 22           [-1, 4]  1         0  models.common.Concat                    [1]                           
 23                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 24                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 25          [-1, 20]  1         0  models.common.Concat                    [1]                           
 26                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 27                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 28          [-1, 16]  1         0  models.common.Concat                    [1]                           
 29                -1  1    715008  models.common.C3                        [512, 384, 1, False]          
 30                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]              
 31          [-1, 12]  1         0  models.common.Concat                    [1]                           
 32                -1  1   1313792  models.common.C3                        [768, 512, 1, False]          
 33  [23, 26, 29, 32]  1     69336  models.yolo.Detect                      [13, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], [128, 256, 384, 512]]
Model Summary: 368 layers, 12409752 parameters, 12409752 gradients, 16.8 GFLOPs

Transferred 464/472 items from yolov5s6.pt
Scaled weight_decay = 0.000375
optimizer: SGD with parameter groups 77 weight, 81 weight (no decay), 81 bias
train: Scanning 'data/datasets/bird/kp_labels/img_txt/train2017.cache' images and labels... 3 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████| 3/3 [00:00<?, ?it/s]
val: Scanning 'data/datasets/bird/kp_labels/img_txt/val2017.cache' images and labels... 2 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 2/2 [00:00<?, ?it/s]

autoanchor: Analyzing anchors... anchors/target = 5.67, Best Possible Recall (BPR) = 1.0000
Image sizes 1280 train, 1280 val
Using 3 dataloader workers
Logging results to runs/bird_e300/train3
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls       kps    labels  img_size
     0/299     2.66G    0.1117   0.07986   0.03295    0.3015        11      1280: 100%|█████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.93s/it]
/home/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Processing val images: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 13.16it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.06s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=0.00s).
Accumulating evaluation results...
DONE (t=0.00s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.000
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.000
Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 360, in train
    flips=val_flips)
  File "/home/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/kapao/val.py", line 287, in run
    eval.summarize()
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/cocoeval.py", line 556, in summarize
    self.stats = summarize()
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/cocoeval.py", line 541, in _summarizeKps
    type_result = self.get_type_result(first=0.2, second=0.8)
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/cocoeval.py", line 563, in get_type_result
    easy, mid, hard = self.split(gt_file, first, second)
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/cocoeval.py", line 588, in split
    if item['crowdIndex'] < first:
KeyError: 'crowdIndex'

madenburak commented 2 years ago

I also tried again after changing the file names in the JSON files and the images folder, but I am getting the same error as yesterday.

Traceback (most recent call last):
  File "train.py", line 601, in <module>
    main(opt)
  File "train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "train.py", line 360, in train
    flips=val_flips)
  File "/home/anaconda3/envs/kapao/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/kapao/val.py", line 283, in run
    result = coco.loadRes(json_path)
  File "/home/.local/lib/python3.6/site-packages/crowdposetools/coco.py", line 265, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range

{"images":[{"id":1133,"dataset_id":9,"category_ids":[10],"path":"/home/kapao/data/dataset/plane/images/validation2017/hpron190000.jpg","width":1920,"height":1080,"file_name":"1133.jpg","annotated":true,"annotating":[],"num_annotations":1,"metadata":{},"milliseconds":0,"events":[],"regenerate_thumbnail":false,"is_modified":false},{"id":1134,"dataset_id":9,"category_ids":[10],"path":"/home/kapao/data/dataset/plane/images/validation2017/hpron190700.jpg","width":1920,"height":1080,"file_name":"1134.jpg","annotated":true,"annotating":[],"num_annotations":1,"metadata":{},"milliseconds":0,"events":[],"regenerate_thumbnail":false,"is_modified":false}],"categories":[{"id":10,"name":"airplane","supercategory":"airplane","color":"#bda513","metadata":{},"creator":"burak.m","keypoint_colors":["#bf5c4d","#d99100","#4d8068","#0d2b80"],"keypoints":["nose","left_wing","tail","right_wing"],"skeleton":[[1,4],[1,2],[2,3],[3,4]]}],"annotations":[{"id":190,"image_id":1133,"category_id":10,"dataset_id":9,"segmentation":[[1349.9,387.5,1349.9,461,1065.7,461,1065.7,387.5]],"area":20732,"bbox":[1066,388,284,73],"iscrowd":false,"isbbox":true,"creator":"burak.m","width":1920,"height":1080,"color":"#a60d4d","keypoints":[1091,437,2,0,0,0,1303,401,2,0,0,0],"metadata":{},"milliseconds":9684,"events":[{"_cls":"SessionEvent","created_at":{"$date":1644910240771},"user":"burak.m","milliseconds":9684,"tools_used":["BBox","Keypoints"]}],"num_keypoints":2},{"id":191,"image_id":1134,"category_id":10,"dataset_id":9,"segmentation":[[601.4,398.6,601.4,725.7,12.3,725.7,12.3,398.6]],"area":192603,"bbox":[12,399,589,327],"iscrowd":false,"isbbox":true,"creator":"burak.m","width":1920,"height":1080,"color":"#66c2f4","keypoints":[431,687,2,576,559,2,270,496,2,35,605,2],"metadata":{},"milliseconds":13192,"events":[{"_cls":"SessionEvent","created_at":{"$date":1644910251688},"user":"burak.m","milliseconds":13192,"tools_used":["BBox","Keypoints","Select"]}],"num_keypoints":4}]}

What is the cause of these errors? Is it COCO Annotator?
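The `IndexError` at `anns[0]` in the traceback above is what indexing an empty list produces, so one thing worth checking is whether the predictions file passed to `loadRes` contains any detections at all. A minimal sketch, assuming the results file is a plain JSON list; the helper name `count_detections` and the temporary file are illustrative and not part of KAPAO:

```python
import json
import os
import tempfile


def count_detections(json_path):
    """Return the number of entries in a COCO-style results file (a JSON list)."""
    with open(json_path) as f:
        results = json.load(f)
    if not isinstance(results, list):
        raise ValueError("expected a JSON list of detections")
    return len(results)


# Demo with a throwaway file standing in for the predictions JSON.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty_results.json")
    with open(empty_path, "w") as f:
        json.dump([], f)  # a model that detected nothing writes an empty list
    n_empty = count_detections(empty_path)

print(n_empty)  # 0, the case in which anns[0] would raise IndexError
```

If the count is zero, the evaluation step has nothing to score, which is plausible for a model trained on only a handful of images.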

wmcnally commented 2 years ago

Your environment is using crowdposetools and I’m not sure why. It should be using pycocotools. Please ensure that COCOeval is being imported from pycocotools, not crowdposetools.
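A quick way to see which of the two packages is visible in the active environment is to ask the import system where each one would be loaded from. This is only a sketch using the standard library; the function name `find_eval_backends` is made up for illustration:

```python
import importlib.util


def find_eval_backends():
    """Map each evaluation package name to its install location, or None if absent."""
    backends = {}
    for name in ("pycocotools", "crowdposetools"):
        spec = importlib.util.find_spec(name)
        # find_spec returns None when the package cannot be imported here
        if spec is None:
            backends[name] = None
        else:
            backends[name] = spec.origin or "namespace package"
    return backends


for name, location in find_eval_backends().items():
    print(f"{name}: {location or 'not installed'}")
```

If crowdposetools shows up with a path under `~/.local`, as in the tracebacks above, it is installed outside the conda environment and may still be picked up after the environment is recreated.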

madenburak commented 2 years ago

I deleted my conda environment and created a new one. Training works now, but the metrics aren't working. How do I save the inference results for a video or an image?

I use this command for video:

python demos/video.py -p /home/video/1-short.mp4 --weights /home/kapao/runs/train/exp6/weights/best.pt --display

[Screenshot: TensorBoard]

wmcnally commented 2 years ago

The precision and recall metrics are not computed for human pose estimation so you can ignore them. If you're still only using 3 training images, I'm not surprised that your mAP is 0. To check if you can at least overfit your training data, you should use the training dataset for validation as well.

To save the inference video remove the --display argument and optionally add the --gif argument if you want to save as a gif instead of mp4.

madenburak commented 2 years ago

I see. I increased the number of images in my dataset, but now I get the error below. It occurs at a different percentage for each video, and I also get an error when running inference on an image.

Using device: cuda:0
Outpath: /home/kapao/runs/train/exp14/weights/tek_start
Running inference:  33%|██████████████████████████████████████▌                                                                            | 626/1869 [00:07<00:14, 83.45it/s]
Traceback (most recent call last):
  File "demos/video.py", line 184, in <module>
    for seg in data['segments'].values():
AttributeError: 'NoneType' object has no attribute 'values'

Using device: cuda:0
Outpath: /home/kapao/runs/train/exp14/weights/1-short
Running inference:  26%|█████████████████████████████▍                                                                                     | 763/2975 [00:14<00:42, 52.04it/s]
Traceback (most recent call last):
  File "demos/video.py", line 184, in <module>
    for seg in data['segments'].values():
AttributeError: 'NoneType' object has no attribute 'values'

(kapao) user@user:~/home/kapao$ python demos/image.py -p /home/kapao/data/datasets/bird/images/train2017/bird800.jpg --weights /home/kapao/runs/train/exp14/weights/best.pt
Using device: cuda:0
image 1/1 /home/kapao/data/datasets/bird/images/train2017/bird800.jpg: Traceback (most recent call last):
  File "demos/image.py", line 81, in <module>
    person_dets, kp_dets = run_nms(data, out)
  File "/home/kapao/val.py", line 36, in run_nms
    num_coords=data['num_coords'])
  File "/home/kapao/utils/general.py", line 693, in non_max_suppression_kp
    conf, j = x[:, 5:-num_coords].max(1, keepdim=True)
IndexError: max(): Expected reduction dim 1 to have non-zero size.

wmcnally commented 2 years ago

You have to pass the config file with the --data argument. Please try to read the code and diagnose the problem before posting in an issue. Thanks!

madenburak commented 2 years ago

Sorry, but it did not work for me.

(kapao) user@user:~/home/kapao$ python demos/video.py -p /home/video/1-short.mp4 --weights /home/kapao/runs/train/exp15/weights/best.pt --data data/own.yaml
Using device: cuda:0
Running inference:  26%|█████████████████████████████████▌                                                                                                | 769/2975 [00:15<00:43, 51.12it/s]
Traceback (most recent call last):
  File "demos/video.py", line 185, in <module>
    for seg in data['segments'].values():
AttributeError: 'NoneType' object has no attribute 'values'

wmcnally commented 2 years ago

Do you have segments defined in your config file? Again, please try to debug before posting. Thanks!

madenburak commented 2 years ago

Thank you so much, really. I had put too much indentation in the segments part of the config file. After I corrected it, it works.
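A malformed `segments` entry can be caught before running a demo by checking the parsed config. The sketch below works on a plain dict (for example, the result of loading the YAML file with PyYAML's `yaml.safe_load`); the function name `check_pose_config` and the specific checks are assumptions for illustration, not part of the repository:

```python
def check_pose_config(cfg):
    """Return a list of problems found in a parsed dataset config (a dict)."""
    problems = []

    segments = cfg.get("segments")
    if not isinstance(segments, dict) or not segments:
        # This is the situation behind "'NoneType' object has no attribute 'values'"
        problems.append("'segments' must be a non-empty mapping")
    else:
        for key, pair in segments.items():
            if not (isinstance(pair, (list, tuple)) and len(pair) == 2):
                problems.append(f"segment {key!r} should be a pair of keypoint indices")

    num_coords = cfg.get("num_coords")
    if not isinstance(num_coords, int) or num_coords % 2 != 0:
        problems.append("'num_coords' should be an even integer (x, y per keypoint)")

    return problems


good = {"num_coords": 8, "segments": {1: [1, 2], 2: [1, 3], 3: [2, 4], 4: [3, 4]}}
broken = {"num_coords": 8, "segments": None}

print(check_pose_config(good))    # []
print(check_pose_config(broken))  # ["'segments' must be a non-empty mapping"]
```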

omiderfanmanesh commented 2 years ago

Hi, I am trying to run this code with my own dataset. The dataset was built with COCO Annotator and is stored as a COCO-format JSON file. I have one class, "gate", and 4 keypoints. This is my config file:


# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
path: /home/omid/OMID/projects/python/draft/kapao/data/datasets/front_background1/coco
labels: kp_labels
train: kp_labels/img_txt/train2017.txt
val: kp_labels/img_txt/val2017.txt
test: kp_labels/img_txt/test2017.txt

train_annotations: /home/omid/OMID/projects/python/draft/kapao/utils/Dataset-6.json
val_annotations: /home/omid/OMID/projects/python/draft/kapao/utils/Dataset-6.json
test_annotations: /home/omid/OMID/projects/python/draft/kapao/utils/Dataset-6.json

pose_obj: True  # write pose object labels

nc: 5  # number of classes (person class + 17 keypoint classes)
num_coords: 8  # number of keypoint coordinates (x, y)

# class names
names: [ "gate","top-left", "top-right", "bottom-left", "bottom-right" ]

kp_bbox: 0.05  # keypoint object size (normalized by longest img dim)
kp_flip: []  # for left-right keypoint flipping
kp_left: []  # left keypoints
kp_face: []

kp_names_short:
  0: 'tl'
  1: 'tr'
  2: 'bl'
  3: 'br'

# segments for plotting
segments:
    1: [1, 2]
    2: [1, 3]
    3: [2, 4]
    4: [3, 4]

I created the label txt files with utils/labels.py. An example file:


0 0.518229 0.469444 0.555208 0.744444 0.289583 0.148611 2.000000 0.757292 0.181944 2.000000 0.261458 0.770833 2.000000 0.723958 0.806944 2.000000
1 0.289583 0.148611 0.050000 0.066667 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2 0.757292 0.181944 0.050000 0.066667 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
3 0.261458 0.770833 0.050000 0.066667 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
4 0.723958 0.806944 0.050000 0.066667 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

The problem is that when I run the code with this config and these labels, I get this error:


train.py --img 1280 --batch 8 --epochs 500 --data data/coco-kp-draft.yaml --hyp data/hyps/hyp.kp-p6.yaml --val-scales 1 --val-flips -1 --weights yolov5s6.pt --project runs/s_e500 --name train
train: weights=yolov5s6.pt, cfg=, data=data/coco-kp-draft.yaml, hyp=data/hyps/hyp.kp-p6.yaml, epochs=500, batch_size=8, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/s_e500, entity=None, name=train, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=100, val_scales=[1.0], val_flips=[-1], autobalance=False
YOLOv5 🚀 ad507c2 torch 1.9.1+cu102 CUDA:0 (NVIDIA GeForce RTX 2070, 7982.3125MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, kp=0.025, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, kp_bbox=0.05
TensorBoard: Start with 'tensorboard --logdir runs/s_e500', view at http://localhost:6006/
wandb: Currently logged in as: omiderfanmanesh-altran (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.12.17 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

wandb: Tracking run with wandb version 0.12.6
wandb: Syncing run train
wandb:  View project at https://wandb.ai/omiderfanmanesh-altran/s_e500
wandb:  View run at https://wandb.ai/omiderfanmanesh-altran/s_e500/runs/2ojfm3oe
wandb: Run data is saved locally in /home/omid/OMID/projects/python/draft/kapao/wandb/run-20220526_154421-2ojfm3oe
wandb: Run `wandb offline` to turn off syncing.
Overriding model.yaml nc=80 with nc=13

autoanchor: Analyzing anchors... anchors/target = 6.07, Best Possible Recall (BPR) = 1.0000
Image sizes 1280 train, 1280 val
Using 8 dataloader workers
Logging results to runs/s_e500/train29
Starting training for 500 epochs...

     Epoch   gpu_mem       box       obj       cls       kps    labels  img_size
  0% 0/21 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/omid/OMID/projects/python/draft/kapao/train.py", line 601, in <module>
    main(opt)
  File "/home/omid/OMID/projects/python/draft/kapao/train.py", line 499, in main
    train(opt.hyp, opt, device)
  File "/home/omid/OMID/projects/python/draft/kapao/train.py", line 289, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/omid/OMID/projects/python/draft/kapao/utils/datasets.py", line 148, in __iter__
    yield next(self.iterator)
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/omid/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/omid/OMID/projects/python/draft/kapao/utils/datasets.py", line 665, in collate_fn
    return torch.stack(img, 0), torch.cat(label, 0), path, shapes
RuntimeError: torch.cat(): Sizes of tensors must match except in dimension 0. Got 18 and 1 in dimension 1 (The offending index is 1)

I think there is something wrong with the labels, but I don't know what it is. Thank you for your help.
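One way to rule out malformed label files is to confirm that every row has the same number of columns. Judging only from the example rows above, each row appears to hold a class index, four box values, and an (x, y, visibility) triple per keypoint, i.e. 5 + 3 × 4 = 17 values for `num_coords: 8`; that layout is an inference from the sample, and the helper `bad_label_rows` is hypothetical:

```python
def bad_label_rows(label_text, num_coords):
    """Return 1-based line numbers whose column count is unexpected."""
    # class + box (4 values) + (x, y, v) for each of num_coords // 2 keypoints
    expected = 5 + 3 * (num_coords // 2)
    bad = []
    for lineno, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if fields and len(fields) != expected:  # blank lines are ignored
            bad.append(lineno)
    return bad


sample = "\n".join([
    "0 0.518229 0.469444 0.555208 0.744444 0.289583 0.148611 2.000000 "
    "0.757292 0.181944 2.000000 0.261458 0.770833 2.000000 0.723958 0.806944 2.000000",
    "1 0.289583 0.148611 0.050000 0.066667" + " 0.000000" * 12,
    "2 0.757292",  # deliberately truncated row, for illustration only
])

print(bad_label_rows(sample, num_coords=8))  # [3]
```

If every file passes such a check, the size mismatch reported by `torch.cat()` (18 versus 1) more likely arises later in the data pipeline than in the label files themselves.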

514398473 commented 2 years ago

(Quoting omiderfanmanesh's comment above: the config file, the example label file, and the torch.cat() traceback.)

torch 1.10.1 ------> torch 1.9.1. I downgraded PyTorch and solved this problem.

omiderfanmanesh commented 2 years ago

torch 1.10.1 ------> torch 1.9.1. I downgraded PyTorch and solved this problem.

Thank you for your comment. My PyTorch version is 1.9.1+cu102.

gezhaoDL commented 1 year ago

In my case, the cause was load_mosaic. I added two lines in utils/datasets.py (shown in a screenshot attached to the original comment).