open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Zero values for segmentation metrics (SEGM) calculation on COCO type custom dataset #3796

Closed ecm200 closed 3 years ago

ecm200 commented 4 years ago

Describe the bug

When using a custom dataset that trains adequately with Mask R-CNN architectures, evaluating the test images produces 0.0 values for all segmentation metrics (SEGM). I use a test script based on the example from the COCO documentation. Training itself shows no problems, with reasonable values for all losses, and the COCO evaluation reports reasonable numbers for the BBOX metrics.

The training was performed using 4 Tesla P40 GPUs in parallel; inference was run on a single P40. The BBOX metrics produced by the evaluation match those reported for the final epoch during training.

The resulting output from the evaluation call is shown below.

Evaluation Script Results

[INFO] Loading inference of test set from file...output/mixed_NANs/cuboidal_square_images_v1p2A_4XGPU_MaskRCNN_ResNet101_FPN_01092020_202857/test_set_inf_epoch_72_results.pkl

Evaluating bbox...
Loading and preparing results...
DONE (t=4.10s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=573.38s).
Accumulating evaluation results...
DONE (t=1.49s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.186
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.366
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.165
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.134
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.361
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.277
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.003
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.032
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.222
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.134
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.427
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.420

Evaluating segm...
Loading and preparing results...
DONE (t=3.64s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=578.00s).
Accumulating evaluation results...
DONE (t=1.14s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

Steps taken so far

I have made many checks on the quality of my custom COCO dataset, including verifying the following:

  1. Bounding box and polygon vertices are all within the image frame.
  2. No zero size bounding boxes.
  3. No 0 pixel masks.

The custom COCO dataset JSON uses the single-polygon array format for each mask: [[x1,y1,x2,y2,...,xn,yn]].
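As an additional sanity-check sketch (assuming pycocotools is installed; the annotation file name below is a placeholder), every polygon can be decoded with the COCO API to confirm it rasterises to a non-empty mask:

from pycocotools.coco import COCO

coco = COCO('instances_val.json')  # placeholder path to the custom annotation file
for ann in coco.loadAnns(coco.getAnnIds()):
    mask = coco.annToMask(ann)     # rasterise the [[x1,y1,...]] polygon to a binary mask
    if mask.sum() == 0:
        print('Annotation {} decodes to an empty mask'.format(ann['id']))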

The example output from running the evaluation script is shown above under Evaluation Script Results.

The model was trained and evaluated with MMDET version V2.0.0 and MMCV v0.5.3.

Environment for training and evaluation

sys.platform: linux
Python: 3.7.6 | packaged by conda-forge | (default, Mar  5 2020, 15:27:18) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0,1,2,3: Tesla P40
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.5.1
MMDetection: 2.0.0+6603790
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.1

I have since loaded the model weights into MMDET v2.3.0 and performed the same test, only to get the same BBOX metrics and, again, 0.0 values for the SEGM metrics.

Additional environment for testing

sys.platform: linux
Python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0,1,2,3: Tesla P40
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+35d732a
OpenCV: 4.4.0
MMCV: 1.0.5
MMDetection: 2.3.0+68d860d
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.1

Reproduction

This is the evaluation script, which uses the evaluation method of the COCO dataset class to compute the metrics. The custom dataset class merely changes the CLASSES class attribute to match the custom dataset; all other methods are inherited unchanged from the CocoDataset class.
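A minimal sketch of what such a subclass looks like (the class name matches the import used in the script below; the tuple of class names is a placeholder, and the registration call shown is the mmcv 1.x style used with MMDET v2.3.0):

from mmdet.datasets import DATASETS, CocoDataset

@DATASETS.register_module()
class MorphologiDataset(CocoDataset):
    # Only the class names are overridden; everything else,
    # including evaluate(), is inherited from CocoDataset.
    CLASSES = ('particle',)  # placeholder class names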

Evaluation Script with Test Set

import argparse
import os

import mmcv
import torch
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import get_dist_info, init_dist, load_checkpoint

from mmdet.apis import multi_gpu_test, single_gpu_test
from mmdet.core import wrap_fp16_model
from mmdet.datasets import build_dataloader, build_dataset
from mmdet.models import build_detector

from mmdetection_morphologi_pipelines import LoadMorphologiSynImage
from mmdetection_morphologi_datasets import MorphologiDataset

BASE_DIR = ''
MODEL_BASE_DIR = 'output/mixed_NANs/cuboidal_square_images_v1p2A_4XGPU_MaskRCNN_ResNet101_FPN_01092020_202857' #cuboidal_square_images_v1p2_MaskRCNN_ResNeXt101_FPN_14082020_161334'
CONFIG = os.path.join(BASE_DIR,'configs_morph/mmdetection_morphologi_mask_rcnn_r101_fpn_1x.py')
CHECKPOINT = os.path.join(BASE_DIR,MODEL_BASE_DIR,'epoch_72.pth')
TMP_DIR = os.path.join(BASE_DIR,MODEL_BASE_DIR)
LAUNCHER = 'none' # choices ['none', 'pytorch', 'slurm', 'mpi']
GPU_COLLECT = False # Only matters in distributed mode (multiple GPUs)
N_GPUS = 1
DEVICE = 'cuda:0'
SHOW_RESULTS = False # True causes things to crash due to X server issues.
FORMAT_ONLY = False
EVAL = ['bbox','segm'] # Override the evaluation metrics from the config file. Options are 'bbox', 'segm', 'proposal'; can be all, some, or one.
OPTIONS = None
RUN_EVALS = False # (True) Run the inference with the test set or (False) load from file

OUT = os.path.join(MODEL_BASE_DIR,'test_set_inf_'+CHECKPOINT.split('/')[-1][:-4]+'_results.pkl')

cfg = mmcv.Config.fromfile(CONFIG)

if cfg.get('cudnn_benchmark', False):
    torch.backends.cudnn.benchmark = True
cfg.model.pretrained = None
cfg.data.test.test_mode = True

if EVAL is not None:
    print('[CFG] Over-riding the configuration file evaluation metrics: {}'.format(cfg.evaluation.metric))
    cfg.evaluation.metric = EVAL
    print('[CFG] New evaluation metric(s): {}'.format(cfg.evaluation.metric))

if cfg.gpus > N_GPUS:
    print('[CFG] Number of GPUs set higher than specified ({}). Setting to specified ({})'.format(cfg.gpus,N_GPUS))
    cfg.gpus = N_GPUS

# init distributed env first, since logger depends on the dist info.
if LAUNCHER == 'none':
    distributed = False
else:
    distributed = True
    init_dist(LAUNCHER, **cfg.dist_params)

# build the dataloader
# TODO: support multiple images per gpu (only minor changes are needed)
dataset = build_dataset(cfg.data.test)

data_loader = build_dataloader(
    dataset,
    samples_per_gpu=1,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=distributed,
    shuffle=False)

# Build the model and run inference on the test set
if RUN_EVALS:
    # build the model and load checkpoint
    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        wrap_fp16_model(model)
    checkpoint = load_checkpoint(model, CHECKPOINT, map_location=DEVICE)

    # old versions did not save class info in checkpoints, this workaround is
    # for backward compatibility
    if 'CLASSES' in checkpoint['meta']:
        model.CLASSES = checkpoint['meta']['CLASSES']
    else:
        model.CLASSES = dataset.CLASSES

    if not distributed:
        model = MMDataParallel(model, device_ids=[0])
        outputs = single_gpu_test(model, data_loader, SHOW_RESULTS)
    else:
        model = MMDistributedDataParallel(
            model.cuda(),
            device_ids=[torch.cuda.current_device()],
            broadcast_buffers=False)
        outputs = multi_gpu_test(model, data_loader, TMP_DIR,
                                 GPU_COLLECT)

    rank, _ = get_dist_info()
    if rank == 0:
        if OUT:
            print('\nwriting results to {}'.format(OUT))
            mmcv.dump(outputs, OUT)
        kwargs = {} if OPTIONS is None else OPTIONS
        if FORMAT_ONLY:
            dataset.format_results(outputs, **kwargs)
        if EVAL:
            dataset.evaluate(outputs, EVAL, **kwargs)

# Load the previously inferred results from file
else:
    print('[INFO] Loading inference of test set from file...{}'.format(OUT))
    outputs = mmcv.load(file=OUT)
    kwargs = {} if OPTIONS is None else OPTIONS
    rank, _ = get_dist_info()
    if rank == 0:
        dataset.evaluate(outputs, EVAL, **kwargs)
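One quick check worth doing after the results have been loaded (a sketch, not part of the script above): for a Mask R-CNN model every entry in outputs should be a (bbox_results, segm_results) tuple, and if the segm_results lists are empty then 0.0 SEGM metrics follow directly.

# Inspect the first image's predictions (assumes a two-stage model with a mask head)
bbox_results, segm_results = outputs[0]
print('detected boxes per class :', [len(b) for b in bbox_results])
print('predicted masks per class:', [len(m) for m in segm_results])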

Training Script

The model was trained using a custom training script based on the COCO example, and uses a custom dataset class derived from the COCO dataset type.

import mmcv
from mmcv import Config, DictAction
from mmcv.runner import init_dist
import torch

from mmdet import __version__
#from mmdet.apis import set_random_seed, train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.utils import collect_env, get_root_logger

# import __main__ as main
import os
import random
import datetime 
import shutil
import copy
import time
from glob import glob
#from sklearn.model_selection import train_test_split
import albumentations as A
import numpy as np
import argparse

from mmdetection_morphologi_pipelines import LoadMorphologiSynImage
from mmdetection_morphologi_datasets import MorphologiDataset
from mmdetection_morphologi_train import set_random_seed, train_detector
from mmdetection_morphologi_hooks import ValidationCheckpointHook, TensorboardLoggerWithImagesHook
from mmdetection_morphologi_utils import save_pickle

BASE_DIR = '' # 'advanced_seg/MMDetection_experiments' # Base DIR specification for executing in debug mode
TESTING = False

def parse_args():
    parser = argparse.ArgumentParser(description='Train a detector')
    parser.add_argument('--config', help='train config file path', default=os.path.join(BASE_DIR,'configs_morph/mmdetection_morphologi_mask_rcnn_x101_32x4d_fpn_1x.py')) # mmdetection_morphologi_mask_rcnn_x101_32x4d_fpn_1x.py'configs_morph/mmdetection_morphologi_mask_rcnn_r50_fpn_1x.py'
    parser.add_argument('--work_dir', help='the dir to save logs and models', default=os.path.join(BASE_DIR,'output/SGD'))
    parser.add_argument('--workflow', type=int, help='Workflow type [1] train only, [2] train and validate every epoch', default=2)
    parser.add_argument('--job_name', help='name for output files and dirs', default='cuboidal_square_images_v1p3_4XGPU_DODParsV1_') #'spherical_v1_cuboidal_square_images_v1p1_combined_500_125_4536_')
    parser.add_argument(
        '--resume-from', help='the checkpoint file to resume from')
    parser.add_argument(
        '--validate',
        action='store_true',
        help='whether to evaluate the checkpoint during training', default=True)
    group_gpus = parser.add_mutually_exclusive_group()
    group_gpus.add_argument(
        '--gpus',
        type=int,
        help='number of gpus to use '
        '(only applicable to non-distributed training)')
    group_gpus.add_argument(
        '--gpu-ids',
        type=int,
        nargs='+',
        help='ids of gpus to use '
        '(only applicable to non-distributed training)')
    parser.add_argument('--seed', type=int, default=42, help='random seed') # Default is usually 42
    parser.add_argument(
        '--deterministic',
        action='store_true',
        help='whether to set deterministic options for CUDNN backend.')
    parser.add_argument(
        '--options', nargs='+', action=DictAction, help='arguments in dict')
    parser.add_argument(
        '--launcher',
        choices=['none', 'pytorch', 'slurm', 'mpi'],
        default='none',
        help='job launcher')
    parser.add_argument('--local_rank', type=int, default=0)
    parser.add_argument(
        '--autoscale-lr',
        action='store_true',
        help='automatically scale lr with the number of gpus',
        default=True) # Added by ECM as this should always be used
    args = parser.parse_args()
    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)

    return args

def main():

    args = parse_args()

    # Output dir and job details
    if args.job_name[-1] == '_':
        job_name_preamble = args.job_name
    else:
        job_name_preamble = args.job_name + '_'

    #### CONFIG
    ## Get the Base Config
    cfg = Config.fromfile(args.config)

    ## Set up Config

    # Get additional keyword arguments for configuration
    if args.options is not None:
        cfg.merge_from_dict(args.options)

    # set cudnn_benchmark
    if cfg.get('cudnn_benchmark', False):
        torch.backends.cudnn.benchmark = True

    # Set up the output dir, copy over the execution script, and save a copy of the config and input arguments
    if args.work_dir is not None:
        if TESTING:
            output_base_dir = os.path.join(args.work_dir,'testing')
        else:
            output_base_dir = args.work_dir
    elif cfg.get('work_dir', None) is None:
        if TESTING:
            output_base_dir = os.path.join(BASE_DIR,'output/testing')
        else:
            output_base_dir = os.path.join(BASE_DIR,'output')
    cfg.work_dir = os.path.join(output_base_dir,job_name_preamble+cfg.model.type+'_'+cfg.model.backbone.type+str(cfg.model.backbone.depth)+'_'+cfg.model.neck.type+'_'+datetime.datetime.now().strftime('%d%m%Y_%H%M%S'))
    os.makedirs(cfg.work_dir, exist_ok=True)
    # Save this file
    shutil.copyfile(__file__, os.path.join(cfg.work_dir,__file__.split('/')[-1]))
    # Save the input arguments
    args_out_path = os.path.join(cfg.work_dir,'args_in_dict.pkl')
    save_pickle(pkl_object=args.__dict__, fname=args_out_path)
    # Save the config to file
    cfg_out_path = os.path.join(cfg.work_dir,cfg.work_dir.split('/')[-1]+'.cfg')
    cfg_out = open(cfg_out_path, 'w')
    cfg_out.writelines(cfg.text)
    cfg_out.close()

    # init the logger before other steps
    timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
    log_file = os.path.join(cfg.work_dir, '{}.log'.format(timestamp))
    logger = get_root_logger(log_file=log_file, log_level=cfg.log_level)
    logger.info('Created output directory: {}'.format(cfg.work_dir))

    logger.info('Command line arguments passed to training script: ')
    for arg_key,arg_value in args.__dict__.items():
        logger.info('{} :: {}'.format(arg_key, arg_value))

    # Resume from previous iteration
    if args.resume_from is not None:
        cfg.resume_from = args.resume_from

    # Update the default number of GPUs if different from config.
    if args.gpu_ids is not None:
        cfg.gpu_ids = args.gpu_ids
    else:
        cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus)

    # Update the autoscaler with the number of GPUs if changed from default 8 gpus.
    # ECM Modified so it takes into account the actual mini-batch size, which is dependent on the number of gpus and the images per gpu.
    # It now scales with the default mini-batch of 8 gpus and 2 images per gpu (16), and changes in response to changes in both gpus and/or images per gpu.
    if args.autoscale_lr:
        _old_lr = cfg.optimizer['lr']
        # Custom LR modification
        cfg.optimizer['lr'] = cfg.optimizer['lr'] * (len(cfg.gpu_ids) * cfg.data.imgs_per_gpu) / 16
        # Example LR modification
        #cfg.optimizer['lr'] = cfg.optimizer['lr'] * len(cfg.gpu_ids) / 8
        logger.info('Applying linear Learning Rate correction. LR changed from {} to {}'.format(_old_lr, cfg.optimizer['lr']))

    # init distributed env first, since logger depends on the dist info.
    if args.launcher == 'none':
        logger.info('Distributed environment has not been initialized.')
        distributed = False
    else:
        logger.info('Distributed environment initialised.')
        distributed = True
        init_dist(args.launcher, **cfg.dist_params)

    ## Set workflow override
    # Just training or validation will be done through the evaluation hook.
    if (args.workflow == 1):# or (args.validate):
        cfg.workflow = [('train', 1)]
        logger.info('Setting workflow to train only {}'.format(cfg.workflow))
    # Run validation through the val() function.
    elif args.workflow == 2:
        cfg.workflow = [('train', 1), ('val', 1)]
        logger.info('Setting workflow to train and validate {}'.format(cfg.workflow))
    # Report if using COCO validation metrics.
    if args.validate:
        logger.info('COCO Validation will be performed after every {} training epoch(s) using metrics {}'.format(cfg.evaluation.interval, cfg.evaluation.metric))

    # init the meta dict to record some important information such as
    # environment info and seed, which will be logged
    meta = dict()
    # log env info
    env_info_dict = collect_env()
    env_info = '\n'.join([('{}: {}'.format(k, v))
                            for k, v in env_info_dict.items()])
    dash_line = '-' * 60 + '\n'
    logger.info('Environment info:\n' + dash_line + env_info + '\n' +
                dash_line)
    meta['env_info'] = env_info

    # log some basic info
    logger.info('Distributed training: {}'.format(distributed))
    logger.info('Config:\n{}'.format(cfg.text))

    # set random seeds
    if args.seed is not None:
        logger.info(f'Set random seed to {args.seed}, '
                    f'deterministic: {args.deterministic}')
        set_random_seed(args.seed, deterministic=args.deterministic)
    cfg.seed = args.seed
    meta['seed'] = args.seed

    model = build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)

    datasets = [build_dataset(cfg.data.train)] #, build_dataset(cfg.data.val)]

    if len(cfg.workflow) == 2:
        val_dataset = copy.deepcopy(cfg.data.val)
        val_dataset.pipeline = cfg.data.train.pipeline
        datasets.append(build_dataset(val_dataset))
    if cfg.checkpoint_config is not None:
        # save mmdet version, config file content and class names in
        # checkpoints as meta data
        cfg.checkpoint_config.meta = dict(
            mmdet_version=__version__,
            config=cfg.text,
            CLASSES=datasets[0].CLASSES)

    # add an attribute for visualization convenience
    model.CLASSES = datasets[0].CLASSES
    train_detector(
        model,
        datasets,
        cfg,
        distributed=distributed,
        validate=args.validate,
        timestamp=timestamp,
        meta=meta)

if __name__ == '__main__':

    main()

v-qjqs commented 4 years ago

Hi, have you checked that the ground-truth mask annotations are in the correct format for evaluation?

ecm200 commented 4 years ago

Hi @v-qjqs,

Thanks for your reply.

The format of the validation dataset is identical to that of the training set; both were produced at the same time using the same algorithm.

All quality checks and tests are performed on both datasets, so my feeling is that if training performs correctly and the loss functions for the different parts of the network report sensible values, this indicates that the format is being read correctly by the network during training?

Here are example loss curves for each part of the network from a model that trained well but produced 0.0 values for the SEGM metrics when tested.

[images: training loss curves for each part of the network]

Specifically, the loss_mask is defined for both the training and validation datasets.

[images: loss_mask curves for the training and validation datasets]

So, given that the validation loss functions are well defined, this suggests that the format of the label data is correct, doesn't it?

ecm200 commented 4 years ago

Here's an example entry for an object annotation in my JSON file for the validation dataset.

A couple of things I have noticed whilst producing these plots and example annotation file:

  1. As the polygon coordinates have been rounded, there are duplicate points in the polygon list. Would this cause issues with the evaluation metrics? (A small deduplication helper is sketched after the annotation example below.)

  2. For overlapping particles, I have not occluded the particle mask where overlap occurs. Should the particle masks be modified to remove the parts that are covered by another object?

{'id': 5153,
 'image_id': 25,
 'category_id': 1,
 'bbox': [1052, 258, 19, 15],
 'area': 103.21989078633487,
 'segmentation': [[1071, 265, 1071, 266, 1070, 266, 1069, 267, 1069, 267, 1068, 267,
                   1068, 267, 1067, 268, 1067, 268, 1066, 268, 1065, 268, 1065, 269,
                   1064, 268, 1063, 268, 1061, 269, 1061, 269, 1060, 268, 1059, 269,
                   1059, 269, 1058, 269, 1057, 270, 1057, 270, 1056, 269, 1055, 269,
                   1055, 269, 1054, 268, 1054, 268, 1055, 267, 1054, 266, 1054, 266,
                   1053, 264, 1053, 264, 1053, 264, 1054, 264, 1054, 264, 1056, 263,
                   1056, 263, 1057, 263, 1058, 262, 1058, 262, 1059, 262, 1059, 262,
                   1060, 262, 1061, 262, 1062, 262, 1062, 262, 1063, 262, 1064, 261,
                   1064, 261, 1065, 261, 1066, 262, 1068, 261, 1068, 261, 1069, 261,
                   1070, 261, 1070, 262, 1069, 263, 1070, 264, 1070, 264, 1071, 265,
                   1071, 265]],
 'iscrowd': 0}
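As mentioned above, here is a small helper (an illustration only, not something from my current scripts) that drops consecutive duplicate vertices such as the repeated points visible in this polygon:

def dedupe_polygon(polygon):
    """Remove consecutive duplicate (x, y) vertices from a flat [x1, y1, x2, y2, ...] list."""
    pts = list(zip(polygon[0::2], polygon[1::2]))
    cleaned = [pts[0]]
    for pt in pts[1:]:
        if pt != cleaned[-1]:
            cleaned.append(pt)
    return [coord for pt in cleaned for coord in pt]

# e.g. ann['segmentation'] = [dedupe_polygon(poly) for poly in ann['segmentation']]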

For completeness, here's an example image from the validation dataset, showing both the bounding boxes and polygons. These have been loaded from the JSON annotations file using the COCO API.

[images: example validation image with bounding boxes and polygon masks overlaid]
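A sketch of how such an overlay can be produced with the COCO API (the annotation file name is a placeholder, and it assumes 'file_name' resolves to an image on disk):

import matplotlib.pyplot as plt
import skimage.io as io
from pycocotools.coco import COCO

coco = COCO('instances_val.json')                    # placeholder annotation file
img_info = coco.loadImgs(coco.getImgIds()[0])[0]
plt.imshow(io.imread(img_info['file_name']))
coco.showAnns(coco.loadAnns(coco.getAnnIds(imgIds=img_info['id'])))
plt.show()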

ecm200 commented 3 years ago

Does anybody have ideas about what might be causing the 0.0 values in the SEGM metrics?

jessicametzger commented 3 years ago

Very late, but I am getting a similar issue when training my bbox-only model. The training loss decays nicely but all evaluation metrics are zero. It goes away when I switch around some operations in the training and testing pipelines, so I believe it's a data pipeline issue (i.e. an inconsistency between the training and testing data formats). I opened a new issue here because I couldn't figure out what causes it to work or not work; I couldn't find much documentation on data augmentation indicating there should be any issue like this.
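One quick way to look for such an inconsistency (a sketch; the config path is a placeholder to be replaced with your own) is to print the two pipelines side by side:

import mmcv

cfg = mmcv.Config.fromfile('path/to/your_config.py')   # placeholder config path
print('train pipeline:', [step['type'] for step in cfg.data.train.pipeline])
print('test pipeline :', [step['type'] for step in cfg.data.test.pipeline])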