yingkunwu / R-YOLOv4

This is a PyTorch-based R-YOLOv4 implementation that combines the YOLOv4 model with the loss function from R3Det for arbitrarily oriented object detection.

Training while labeling with label-studio #29

Levaru closed this issue 1 year ago

Levaru commented 2 years ago

Hi! I'm trying to implement your project as a ML backend for label-studio and I'm having some trouble. Predicting labels works without any problems and even training will work the first time. But when I try to train a second time I'll get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.79 GiB total capacity; 2.62 GiB already allocated; 37.62 MiB free; 2.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This is my implementation of the ML backend:

import os, sys
currentdir = os.path.dirname(os.path.realpath(__file__))
parentdir = os.path.dirname(currentdir)
sys.path.append(parentdir)

from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.utils import get_image_size, get_single_tag_keys, is_skipped
from label_studio.core.utils.io import json_load, get_data_dir 
from label_studio.core.settings.base import DATA_UNDEFINED_NAME

import time
import random
import numpy as np
import torch
import shutil
import json
from terminaltables import AsciiTable
import glob

from model.yolo import Yolo
from lib.utils import load_class_names
from lib.scheduler import CosineAnnealingWarmupRestarts
from lib.post_process import post_process
from lib.logger import *
from lib.options import LabelStudioOptions
from lib.plot import rescale_boxes
import label_studio_sdk
from datasets.label_studio_dataset import ImageDataset, LabelStudioDataset, get_transformed_image
import cv2 as cv

from urllib.parse import urlparse

from PIL import Image

print("LabelStudioSdk Version: ", label_studio_sdk.__version__)

LABEL_STUDIO_HOST = os.getenv('LABEL_STUDIO_HOST', 'http://localhost:8080')
LABEL_STUDIO_API_KEY = os.getenv('LABEL_STUDIO_API_KEY', '4c23feec13e2118e053b9a9940f73ed96c0e0841')

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

def weights_init_normal(m):
    if isinstance(m, torch.nn.Conv2d):
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif isinstance(m, torch.nn.BatchNorm2d):
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)

def init():
    random.seed(42)
    np.random.seed(42)
    torch.manual_seed(42)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

class RotBBoxModel(object):
    def __init__(self, num_classes, args):
        self.args = args

        self.model = Yolo(n_classes=num_classes)
        self.model = self.model.to(device)

        self.logger = None
        self.model_path = None

        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=self.args.lr)

    def log(self, total_loss, num_epochs, epoch, global_step, total_step, start_time):
        log = "\n---- [Epoch %d/%d] ----\n" % (epoch + 1, num_epochs)

        tensorboard_log = {}
        loss_table_name = ["Step: %d/%d" % (global_step, total_step),
                            "loss", "reg_loss", "conf_loss", "cls_loss"]
        loss_table = [loss_table_name]

        temp = ["YoloLayer1"]
        for name, metric in self.model.yolo1.metrics.items():
            if name in loss_table_name:
                temp.append(metric)
            tensorboard_log[f"{name}_1"] = metric
        loss_table.append(temp)

        temp = ["YoloLayer2"]
        for name, metric in self.model.yolo2.metrics.items():
            if name in loss_table_name:
                temp.append(metric)
            tensorboard_log[f"{name}_2"] = metric
        loss_table.append(temp)

        temp = ["YoloLayer3"]
        for name, metric in self.model.yolo3.metrics.items():
            if name in loss_table_name:
                temp.append(metric)
            tensorboard_log[f"{name}_3"] = metric
        loss_table.append(temp)

        tensorboard_log["total_loss"] = total_loss
        self.logger.list_of_scalars_summary(tensorboard_log, global_step)

        log += AsciiTable(loss_table).table
        log += "\nTotal Loss: %f, Runtime: %f\n" % (total_loss, time.time() - start_time)
        print(log)

    def save(self, path):
        print("Model saved in: ", path)
        torch.save(self.model.state_dict(), path)

    def load(self, path, train=False):
        print("Loading model...")
        if not train:
            print("Loading model for prediction...")
            self.model_path = path
            if os.path.exists(self.model_path):
                weight_path = glob.glob(os.path.join(self.model_path, "*.pth"))
                if len(weight_path) == 0:
                    raise FileNotFoundError("Model weight not found")
                elif len(weight_path) > 1:
                    raise RuntimeError("Multiple weights found. Please keep only one weight file in your model directory")
                else:
                    weight_path = weight_path[0]
            else:
                raise FileNotFoundError("Model directory does not exist")
            pretrained_dict = torch.load(weight_path, map_location=device)
            self.model.load_state_dict(pretrained_dict)
            self.model.eval()
        else:
            print("Loading model for training...")
            # if os.path.exists(path):
            #     weight_path = glob.glob(os.path.join(path, "*.pth"))[0]
            #     print("weight_path: ", weight_path)
            # else:
            #     print("Path does not exist")
            weight_path = "weights/pretrained/yolov4.pth"
            pretrained_dict = torch.load(weight_path, map_location=device)
            model_dict = self.model.state_dict()

            # 1. filter out unnecessary keys
            # pretrained_dict = {k: v for k, v in pretrained_dict.items() if np.shape(model_dict[k]) == np.shape(v)}
            pretrained_dict = {k: v for i, (k, v) in enumerate(pretrained_dict.items()) if i < 552}
            # 2. overwrite entries in the existing state dict
            model_dict.update(pretrained_dict)
            # 3. load the new state dict
            self.model.apply(weights_init_normal)
            self.model.load_state_dict(model_dict)
            self.model.eval()

    def predict(self, image_urls):
        images = torch.stack([get_transformed_image(url, self.args.img_size) for url in image_urls]).to(device)

        with torch.no_grad():
            temp = time.time()
            output, _ = self.model(images)  # batch=1 -> [1, n, n], batch=3 -> [3, n, n]
            temp1 = time.time()
            boxes = post_process(output, self.args.conf_thres, self.args.nms_thres)
            temp2 = time.time()
            print('-----------------------------------')
            num = 0
            for b in boxes:
                if b is None:
                    continue  # skip images without detections instead of ending the count early
                num += len(b)
            print("{} objects found".format(num))
            print("Inference time : ", round(temp1 - temp, 5))
            print("Post-processing time : ", round(temp2 - temp1, 5))
            print('-----------------------------------')
            return boxes

    def train(self, dataloader, num_epochs=5):
        init()
        if self.model_path is None:
            self.model_path = os.path.join("weights", self.args.model_name)
        self.logger = Logger(os.path.join(self.model_path, "logs"))

        num_iters_per_epoch = len(dataloader)
        scheduler_iters = round(num_epochs * len(dataloader) / self.args.subdivisions)
        total_step = num_iters_per_epoch * num_epochs

        scheduler = CosineAnnealingWarmupRestarts(self.optimizer,
                                                first_cycle_steps=round(scheduler_iters),
                                                max_lr=self.args.lr,
                                                min_lr=1e-5,
                                                warmup_steps=round(scheduler_iters * 0.1),
                                                cycle_mult=1,
                                                gamma=1)

        start_time = time.time()
        self.model.train()
        for epoch in range(num_epochs):
            print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            print('-' * 10)

            for batch, (_, imgs, targets) in enumerate(dataloader):
                global_step = num_iters_per_epoch * epoch + batch + 1
                imgs = imgs.to(device)
                targets = targets.to(device)

                outputs, loss = self.model(imgs, targets)

                loss.backward()
                total_loss = loss.detach().item()

                if global_step % self.args.subdivisions == 0:
                    self.optimizer.step()
                    self.optimizer.zero_grad()
                    scheduler.step()

                self.log(total_loss, num_epochs, epoch, global_step, total_step, start_time)

        print()

        time_elapsed = time.time() - start_time
        print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

        return self.model

class RotBBoxModelApi(LabelStudioMLBase):

    def __init__(self, **kwargs):
        # don't forget to initialize base class...
        super(RotBBoxModelApi, self).__init__(**kwargs)

        parser = LabelStudioOptions()
        self.args = parser.parse()

        self.from_name, self.to_name, self.value, self.labels_in_config = get_single_tag_keys(
            self.parsed_label_config, 'RectangleLabels', 'Image'
        )

        print("from_name: ", self.from_name)
        print("to_name: ", self.to_name)
        print("value: ", self.value)
        print("labels_in_config: ", self.labels_in_config)
        print("parsed_label_config: ", self.parsed_label_config)
        print("train_output: ", self.train_output)

        # self.model = RotBBoxModel(len(self.labels_in_config), self.args)
        # self.model_path = os.path.join("weights", self.args.model_name)
        # print(self.model_path)
        # self.model.load(self.model_path)

        if self.train_output:
            self.model = RotBBoxModel(len(self.labels_in_config), self.args)
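            # train_output['model_path'] is the saved .pth file, while load() expects its directory;
            # the second argument is dropped because a non-empty dict would select the training branch
            self.model.load(os.path.dirname(self.train_output['model_path']))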
        else:
            self.model = RotBBoxModel(len(self.labels_in_config), self.args)
            model_path = os.path.join("weights", self.args.model_name)
            print(model_path)
            self.model.load(model_path)

    def reset_model(self):
        self.model = RotBBoxModel(len(self.labels_in_config), self.args)
        self.model_path = os.path.join("weights", self.args.model_name)
        self.model.load(self.model_path)

    def predict(self, tasks, **kwargs):
        """ This is where inference happens:
            model returns the list of predictions based on input list of tasks

            :param tasks: Label Studio tasks in JSON format
        """
        image_urls = [task['data'][self.value] for task in tasks]
        print(image_urls)
        model_results = self.model.predict(image_urls)
        results = []
        all_scores = []
        avg_score = 0

        for i, (url, box) in enumerate(zip(image_urls, model_results)):
            if box is not None:
                image_path = self.get_local_path(url)
                # image_shape = get_image_shape(url)
                img_width, img_height = get_image_size(image_path)
                boxes = rescale_boxes(box, self.args.img_size, (img_height, img_width))
                boxes = np.array(boxes)

                for bbox in boxes:  # avoids reusing the outer loop variable i
                    center_x, center_y, w, h, theta = bbox[0], bbox[1], bbox[2], bbox[3], bbox[4]
                    score = round(bbox[5] * bbox[6], 2)
                    cls_id = int(bbox[7])

                    # Calculate top left corner of rotated bbox (box center is origin)
                    left_local = -w/2
                    top_local = -h/2
                    rotated_left_local = np.cos(theta) * left_local - np.sin(theta) * top_local
                    rotated_top_local = np.sin(theta) * left_local + np.cos(theta) * top_local
                    rotated_left = center_x + rotated_left_local
                    rotated_top = center_y + rotated_top_local

                    x_percent = ( (rotated_left / img_width) * 100.0).item()
                    y_percent = ( (rotated_top / img_height) * 100.0).item()
                    w_percent = ( (w / img_width) * 100.0).item()
                    h_percent = ( (h / img_height) * 100.0).item()

                    results.append({
                        'from_name': self.from_name,
                        'to_name': self.to_name,
                        'type': 'rectanglelabels',
                        'value': {
                            'rectanglelabels': [self.labels_in_config[cls_id]],
                            'x': x_percent,
                            'y': y_percent,
                            'width': w_percent,
                            'height': h_percent,
                            'rotation': np.rad2deg(theta).item()
                        },
                        'score': score.item()
                    })
                    all_scores.append(score)
        # Average once over all images; float() also converts NumPy scalars for JSON
        avg_score = float(sum(all_scores) / max(len(all_scores), 1))
        return [{
            'result': results,
            'score': avg_score
        }]

    def download_tasks(self, project):
        """
        Download all labeled tasks from project using the Label Studio SDK.
        Read more about SDK here https://labelstud.io/sdk/
        :param project: project ID
        :return:
        """
        ls = label_studio_sdk.Client(LABEL_STUDIO_HOST, LABEL_STUDIO_API_KEY)
        project = ls.get_project(id=project)
        tasks = project.get_labeled_tasks()
        return tasks

    def fit(self, tasks, workdir=None, batch_size=32, num_epochs=10, **kwargs):
        """
        This method is called each time an annotation is created or updated
        :param kwargs: contains "data" and "event" key, that could be used to retrieve project ID and annotation event type
                        (read more in https://labelstud.io/guide/webhook_reference.html#Annotation-Created)
        :return: dictionary with trained model artefacts that could be used further in code with self.train_output
        """
        if 'data' not in kwargs:
            raise KeyError('Project is not identified. Go to Project Settings -> Webhooks, and ensure you have "Send Payload" enabled')

        data = kwargs['data']
        project = data['project']['id']
        tasks = self.download_tasks(project)
        if len(tasks) > 0:
            print(f'{len(tasks)} labeled tasks downloaded for project {project}')

            image_urls, image_labels = [], []
            print('Collecting annotations...')
            for task in tasks:

                if is_skipped(task):
                    continue

                filepath = self.get_local_path(task['data'][self.value])
                image_urls.append(filepath)
                image_labels.append(task['annotations'][0]['result'])

            # augment = False if self.args.no_augmentation else True
            # mosaic = False if self.args.no_mosaic else True
            # multiscale = False if self.args.no_multiscale else True

            augment = False
            mosaic = False
            multiscale = False

            print(f'Creating dataset with {len(image_urls)} images...')
            dataset = LabelStudioDataset(image_urls, image_labels, self.labels_in_config, 
                                         self.args.img_size, self.args.sample_size,
                                         augment=augment, mosaic=mosaic, multiscale=multiscale)
            dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, pin_memory=True, collate_fn=dataset.collate_fn)

            print('Train model...')
            self.reset_model()
            self.model.train(dataloader, num_epochs=num_epochs)

            print('Save model...')
            # model_path = os.path.join(workdir, 'model.pt')
            model_path = os.path.join(self.model_path, "ryolov4.pth")
            self.model.save(model_path)

            return {
                'model_path': model_path, 
                'labels': image_labels
            }

        else:
            print('No labeled tasks found: make some annotations...')
            return {}

This is basically just your code combined from detect.py and train.py.

The testing is performed with the trash dataset and a model that was also trained on it. I'm not really familiar with PyTorch and don't know whether I implemented it correctly for this kind of application. I guess that the out-of-memory error is caused by reloading the model without clearing some old variables first? I have no idea which ones, though.

Could you please take a look at it if you have the time? Maybe I'm just loading the model the wrong way.
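
One way to test the guess about stale variables is to drop the old model before building the new one, so that both copies never sit on the GPU at the same time. The sketch below is a minimal illustration, not the project's code: `Backend` and `build_model` are hypothetical stand-ins for `RotBBoxModelApi` and the `RotBBoxModel(...)` constructor call, and the `torch` import is optional so the snippet also runs without PyTorch installed.

```python
import gc
import weakref

try:
    import torch  # optional, so the sketch also runs where PyTorch is missing
except ImportError:
    torch = None


class Backend:
    """Hypothetical stand-in for RotBBoxModelApi; only the reset logic is shown."""

    def __init__(self, build_model):
        self.build_model = build_model  # callable that constructs a fresh model
        self.model = build_model()

    def reset_model(self):
        # Drop the old model first so its tensors become unreachable.
        self.model = None
        gc.collect()
        # Ask PyTorch to hand cached, unused blocks back to the driver.
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        # Only now build the replacement.
        self.model = self.build_model()


class DummyModel:
    """Placeholder for a real network, used only to demonstrate the reset."""


backend = Backend(DummyModel)
ref = weakref.ref(backend.model)  # lets us observe whether the old model is freed
backend.reset_model()
print("old model freed:", ref() is None)
```

`torch.cuda.empty_cache()` only returns cached blocks that no live tensor uses, so this helps only if nothing else (an optimizer, a logger, another backend instance) still references the old model.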

yingkunwu commented 2 years ago

Thank you for your feedback!

What did you do when you ran your second training? If the first training works, the rest should work too. Did you change your dataset or modify any hyperparameters?

Levaru commented 2 years ago

Kinda. The training process with labelstudio goes like this:

  1. You start annotating your first image. The model is loaded from either generic pretrained weights or a previous checkpoint.
  2. You finish annotating the first image and press the submit button to save the annotations. At that moment the labelstudio-ml-backend is called and the def fit(...): function from class RotBBoxModelApi(LabelStudioMLBase) is executed.
  3. The model is reset (basically loaded again like in step 1 for some reason).
  4. Inside the fit(...) function all of the finished tasks are downloaded. In this case a task consists of an image and the annotations/labels for this image. These images and annotations are then combined into a dataset (with optional augmentations) and then used to train the model with a small number of epochs.
  5. Then you start annotating the second image and the whole process repeats. That means that the dataset grows with each new image, but it will only consist of the images you uploaded and finished annotating. In theory, while you annotate your images, the model is being trained and should provide better and better predictions thus making the whole annotating process much easier and faster.

I turned off most of the augmentations like mosaic and multiscale (especially multiscale, because I get an OutOfMemory error even when training normally; I only have one 8 GB graphics card). This didn't really help with the labelstudio training.

I also made sure that the labels from labelstudio are correctly parsed into the format that R-YOLOv4 requires.
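
Because the training loop in the backend above only calls `optimizer.step()` every `self.args.subdivisions` iterations, the number of samples per weight update is roughly `batch_size * subdivisions`. On a card with limited memory, one option is to shrink `batch_size` (the `fit` default is 32) and raise `subdivisions` to compensate. The helper below is a hypothetical sketch of that arithmetic; `split_batch` and the per-step limit of 8 are assumptions for illustration, not values from the project.

```python
def split_batch(effective_batch, max_per_step):
    """Return (batch_size, subdivisions) whose product is at least effective_batch.

    effective_batch: samples that should contribute to one optimizer step.
    max_per_step:    largest batch assumed to fit on the GPU in one forward/backward pass.
    """
    batch_size = min(effective_batch, max_per_step)
    subdivisions = -(-effective_batch // batch_size)  # ceiling division
    return batch_size, subdivisions


# Keep 32 samples per update while sending only 8 through the GPU at a time.
print(split_batch(32, 8))  # (8, 4)
```

The loop shown in this thread does not divide the loss by `subdivisions`, so the accumulated gradient is a sum over the sub-batches rather than a mean; the learning rate may need adjusting to match.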

yingkunwu commented 2 years ago

I'm not sure whether I'm right, but my guess is that the step you described (quoted below) means LabelStudio runs the training in a child process in the background, which would cause the CUDA memory error. Did the authors of LabelStudio suggest how much GPU memory your graphics card should have?

These images and annotations are then combined into a dataset (with optional augmentations) and then used to train the model with a small number of epochs.
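
The figures in the error message quoted at the top of the thread give a rough way to check this guess. The sketch below parses that message and compares what this process's PyTorch allocator reserved with the card's total capacity. The helper names (`parse_oom`, `to_gib`) are made up for illustration, and the interpretation assumes the message reports what it appears to report.

```python
import re

ERROR = ("CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.79 GiB total capacity; "
         "2.62 GiB already allocated; 37.62 MiB free; 2.72 GiB reserved in total by PyTorch)")

UNIT_IN_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def to_gib(value, unit):
    """Convert a (number, unit) pair from the message into GiB."""
    return float(value) * UNIT_IN_GIB[unit]


def parse_oom(message):
    """Extract the figures reported in an out-of-memory message, in GiB."""
    patterns = {
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    return {name: to_gib(*re.search(p, message).groups()) for name, p in patterns.items()}


stats = parse_oom(ERROR)
# Memory that is neither free nor reserved by this process's PyTorch allocator.
held_elsewhere = stats["total"] - stats["reserved"] - stats["free"]
# Reserved but not allocated; a large value here would point at fragmentation.
cached_unused = stats["reserved"] - stats["allocated"]
print(f"held outside this allocator: {held_elsewhere:.2f} GiB")  # 5.03 GiB
print(f"reserved but unallocated:    {cached_unused:.2f} GiB")  # 0.10 GiB
```

Roughly 5 GiB of the 7.79 GiB is held by something other than the failing process's allocator, while only about 0.1 GiB is cached but unused. That would be consistent with a second process (or another CUDA context) keeping its own copy of the model, and it suggests that `max_split_size_mb` is unlikely to help here. Running `nvidia-smi` while the backend is active lists per-process GPU memory and could confirm or rule this out.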