Data format conversion for YOLO training

jeongjae96 commented 1 year ago

Description

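We need to implement data format conversion for #10. To avoid wasting storage, the plan is to distinguish the dataset splits using txt files alone instead of creating additional image directories.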
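A minimal sketch of the txt-only split, assuming all images already sit in one directory. The function name `write_split_lists`, the extension set, and the default validation ratio are illustrative assumptions, not the project's actual script.

```python
import random
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def write_split_lists(image_dir, out_dir, val_ratio=0.1, seed=0):
    """Write train.txt / val.txt listing image paths, without copying images."""
    images = sorted(
        p.resolve() for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTS
    )
    random.Random(seed).shuffle(images)  # deterministic shuffle for a given seed
    n_val = max(1, int(len(images) * val_ratio)) if images else 0
    splits = {"val": images[:n_val], "train": images[n_val:]}

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, paths in splits.items():
        # One absolute image path per line.
        (out_dir / f"{name}.txt").write_text(
            "".join(f"{p}\n" for p in paths), encoding="utf-8"
        )
    return splits
```

Ultralytics-style loaders generally look up each label file by swapping the `/images/` path component for `/labels/` and the image extension for `.txt`, so this sketch only helps if the labels are laid out that way.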

Tasks

[x] Understand the YOLO input format
[x] Convert the data to YOLO format

jeongjae96 commented 1 year ago

Ultralytics YOLO Format

Label Format

One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the .txt extension.
One row per object: Each row in the text file corresponds to one object instance in the image.
Object information per row: Each row contains the following information about the object instance:
- Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
- Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
- Object width and height: The width and height of the object, normalized to be between 0 and 1.

The format for a single row in the detection dataset file:

<object-class> <x_normalized btw 0 and 1> <y_normalized btw 0 and 1> <width_normalized btw 0 and 1> <height_normalized btw 0 and 1>

e.g.

0 0.5 0.4 0.3 0.6
1 0.3 0.7 0.4 0.2
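As a rough illustration of how such a row can be produced, the sketch below assumes the source annotation is a COCO-style pixel box `[x_min, y_min, width, height]`. The function name `coco_bbox_to_yolo_row` is hypothetical.

```python
def coco_bbox_to_yolo_row(class_index, bbox, img_w, img_h):
    """Convert a pixel [x_min, y_min, width, height] box to a YOLO label row."""
    x_min, y_min, w, h = bbox
    # YOLO stores the box center, normalized by the image size.
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return (
        f"{class_index} {x_center:.6f} {y_center:.6f} "
        f"{w / img_w:.6f} {h / img_h:.6f}"
    )


# A 30x60 box whose top-left corner is (35, 10) in a 100x100 image.
print(coco_bbox_to_yolo_row(0, [35, 10, 30, 60], 100, 100))
# prints: 0 0.500000 0.400000 0.300000 0.600000
```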

Dataset File Format

A YAML file defines the dataset and model configuration for training.

Example of the YAML format used to define a detection dataset:

train: <path-to-training-images>
val: <path-to-validation-images>

nc: <number-of-classes>
names: [<class-1>, <class-2>, ..., <class-n>] # The order of the names should match the order of the object class indices in the YOLO dataset files.

Either nc or names must be defined; defining both is not mandatory.

Alternatively, class names can be defined as follows:

names:
  0: person
  1: bicycle

e.g.

train: data/train/
val: data/val/

nc: 2
names: ['person', 'car']

jeongjae96 commented 1 year ago

JSON2YOLO can convert COCO-format annotations to YOLO format, but it subtracts 1 from every category id to match the COCO dataset. Our dataset's category ids start at 0, so this needs care.
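One way to sidestep the off-by-one risk is to build an explicit mapping from category ids to contiguous 0-based class indices instead of subtracting a fixed offset. This is a sketch of that alternative, not what JSON2YOLO does. The function name `build_class_index_map` and the sample categories are hypothetical.

```python
def build_class_index_map(categories):
    """Map each COCO category id to a contiguous 0-based YOLO class index."""
    ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(ids)}


zero_based = [{"id": 0, "name": "person"}, {"id": 1, "name": "car"}]
one_based = [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}]

print(build_class_index_map(zero_based))  # prints: {0: 0, 1: 1}
print(build_class_index_map(one_based))   # prints: {1: 0, 2: 1}
```

With a mapping like this, the same conversion code works whether the ids start at 0 or at 1, and the `names` list in the dataset YAML can be written in the same sorted order.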

jeongjae96 commented 1 year ago

convert2Yolo also appears to be able to convert to YOLO format.

jeongjae96 commented 1 year ago

coco.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# COCO 2017 dataset http://cocodataset.org by Microsoft
# Example usage: yolo train data=coco.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco  ← downloads here (20.1 GB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco  # dataset root dir
train: train2017.txt  # train images (relative to 'path') 118287 images
val: val2017.txt  # val images (relative to 'path') 5000 images
test: test-dev2017.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush

# Download script/URL (optional)
download: |
  from ultralytics.yolo.utils.downloads import download
  from pathlib import Path

  # Download labels
  segments = True  # segment or box labels
  dir = Path(yaml['path'])  # dataset root dir
  url = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/'
  urls = [url + ('coco2017labels-segments.zip' if segments else 'coco2017labels.zip')]  # labels
  download(urls, dir=dir.parent)
  # Download data
  urls = ['http://images.cocodataset.org/zips/train2017.zip',  # 19G, 118k images
          'http://images.cocodataset.org/zips/val2017.zip',  # 1G, 5k images
          'http://images.cocodataset.org/zips/test2017.zip']  # 7G, 41k images (optional)
  download(urls, dir=dir / 'images', threads=3)

jeongjae96 commented 1 year ago

YOLO file structure

MrSteveChoi commented 1 year ago

Got it! I'll use this as a reference and proceed with the work.

jeongjae96 commented 1 year ago

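In yolov7, the path key in the data yaml was not recognized, which caused an error. The yaml file needs to be modified, and we need to create a txt file listing the images of the entire train set along with a yaml file that points to it. In addition, a validation set is mandatory, so a sample validation txt is also needed.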
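A minimal sketch of generating such a yaml without a `path` key, assuming yolov7 reads the `train`, `val`, `nc`, and `names` keys and accepts full paths to the txt lists. The function name `build_data_yaml` and the file names in the example are hypothetical, and the class names are assumed to contain no quote characters.

```python
from pathlib import Path


def build_data_yaml(train_txt, val_txt, names):
    """Return data yaml text with full train/val paths and no `path` key."""
    quoted = ", ".join(f"'{name}'" for name in names)
    return (
        f"train: {Path(train_txt).resolve().as_posix()}\n"
        f"val: {Path(val_txt).resolve().as_posix()}\n"
        "\n"
        f"nc: {len(names)}\n"
        f"names: [{quoted}]\n"
    )


if __name__ == "__main__":
    # Example file names only; point these at the real image list files.
    print(build_data_yaml("train.txt", "val_sample.txt", ["person", "car"]))
```

Resolving the txt paths up front avoids relying on a dataset root, which is the part yolov7 reportedly failed to pick up.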

jeongjae96 / synthesis-car-od