open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

Concatenating multiple datasets in MMPOSE? #1268

Open YuktiADY opened 2 years ago

YuktiADY commented 2 years ago

Hello,

I would like to ask whether it is possible to concatenate two datasets and train on them in MMPOSE?

Thanks,

ly015 commented 2 years ago

Please refer to #1256

YuktiADY commented 2 years ago

> Please refer to #1256

In the `build_dataset` function, the variable `c` is used but it's not defined anywhere. Because of this, I am facing an error while training the model.

ly015 commented 2 years ago

Which c do you mean? Could you please provide detailed error information?

YuktiADY commented 2 years ago

> Which `c` do you mean? Could you please provide detailed error information?

Here:

```python
if isinstance(cfg, (list, tuple)):
    dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg])
```

I am trying to train on the COCO dataset and my custom dataset (concatenating both), so I made changes in the config as well as in builder.py (I added the code snippet you suggested). I am getting the error below.

File "./mmpose/tools/train.py", line 170, in main() File "./mmpose/tools/train.py", line 145, in main datasets = [build_dataset(cfg.data.train)] File "/home/yukti/mmpose/mmpose/mmpose/datasets/builder.py", line 77, in build_dataset dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg]) File "/home/yukti/mmpose/mmpose/mmpose/datasets/builder.py", line 77, in dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg]) File "/home/yukti/mmpose/mmpose/mmpose/datasets/builder.py", line 87, in build_dataset dataset = build_from_cfg(cfg, DATASETS, default_args) File "/home/yukti/mmpose/mmcv/mmcv/utils/registry.py", line 55, in build_from_cfg raise type(e)(f'{obj_cls.name}: {e}') AssertionError: TheodorePlusV2Dataset:

The changes I made in the config are the lines below.

```python
_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/theodore.py']
_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/coco_wholebody.py']

dataset_type = 'TheodorePlusV2Dataset'
train = [
    dict(
        type=dataset_type,
        # ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        ann_file=f'{data_root}/coco_annotations/person_keypoints_train.json',
        # img_prefix=f'{data_root}/train2017/',
        img_prefix=f'{data_root}/train/img_png/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'/mnt/data/yjin/coco/annotations/person_keypoints_train2017.json',
        img_prefix=f'mnt/data/yjin/coco/images/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
],
```
ly015 commented 2 years ago

I don't remember ever suggesting any modification to builder.py. If you changed this file, could you please provide your code? The line you quoted does not seem likely to cause a variable-not-defined error.
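For illustration, `c` in that line is defined by the list comprehension itself, which binds it to each element of `cfg` in turn:

```python
cfg = [dict(type='A'), dict(type='B')]
# `c` is the comprehension's loop variable, bound to each dict in cfg
names = [c['type'] for c in cfg]
print(names)  # ['A', 'B']
```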

I am afraid your modified config is invalid because it has two `_base_` fields, both of which define dataset_info. In this case, the fields from these two dataset_info files will conflict, e.g. one overriding the other or merging unexpectedly. You can print out the loaded config and check its content.
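To print the loaded config, something like the following should work (the config path is a placeholder):

```python
from mmcv import Config

cfg = Config.fromfile('path/to/your_config.py')  # placeholder path
print(cfg.pretty_text)  # the fully merged config, after _base_ resolution
```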

YuktiADY commented 2 years ago

I made changes in builder.py. Please find the code below:

```python
def _concat_dataset(cfg, default_args=None):
    types = cfg['type']
    ann_files = cfg['ann_file']
    img_prefixes = cfg.get('img_prefix', None)
    dataset_infos = cfg.get('dataset_info', None)

    num_joints = cfg['data_cfg'].get('num_joints', None)
    dataset_channel = cfg['data_cfg'].get('dataset_channel', None)

    datasets = []
    num_dset = len(ann_files)
    for i in range(num_dset):
        cfg_copy = copy.deepcopy(cfg)
        cfg_copy['ann_file'] = ann_files[i]

        if isinstance(types, (list, tuple)):
            cfg_copy['type'] = types[i]
        if isinstance(img_prefixes, (list, tuple)):
            cfg_copy['img_prefix'] = img_prefixes[i]
        if isinstance(dataset_infos, (list, tuple)):
            cfg_copy['dataset_info'] = dataset_infos[i]

        if isinstance(num_joints, (list, tuple)):
            cfg_copy['data_cfg']['num_joints'] = num_joints[i]

        if is_seq_of(dataset_channel, list):
            cfg_copy['data_cfg']['dataset_channel'] = dataset_channel[i]

        datasets.append(build_dataset(cfg_copy, default_args))

    return ConcatDataset(datasets)
```

```python
def build_dataset(cfg, default_args=None):
    """Build a dataset from config dict.

    Args:
        cfg (dict): Config dict. It should at least contain the key "type".
        default_args (dict, optional): Default initialization arguments.
            Default: None.

    Returns:
        Dataset: The constructed dataset.
    """
    from .dataset_wrappers import RepeatDataset

    if isinstance(cfg, (list, tuple)):
        dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg])
    elif cfg['type'] == 'ConcatDataset':
        dataset = ConcatDataset(
            [build_dataset(c, default_args) for c in cfg['datasets']])
    elif cfg['type'] == 'RepeatDataset':
        dataset = RepeatDataset(
            build_dataset(cfg['dataset'], default_args), cfg['times'])
    elif isinstance(cfg.get('ann_file'), (list, tuple)):
        dataset = _concat_dataset(cfg, default_args)
    else:
        dataset = build_from_cfg(cfg, DATASETS, default_args)
    return dataset
```
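For context, the `build_dataset` above accepts three concatenation forms; a minimal sketch of each (dataset names and file paths are hypothetical):

```python
# 1) A plain list of dataset dicts (the form used in the config above)
train = [
    dict(type='DatasetA', ann_file='a.json'),
    dict(type='DatasetB', ann_file='b.json'),
]

# 2) An explicit ConcatDataset wrapper
train = dict(
    type='ConcatDataset',
    datasets=[
        dict(type='DatasetA', ann_file='a.json'),
        dict(type='DatasetB', ann_file='b.json'),
    ])

# 3) A single dict with list-valued fields, dispatched to _concat_dataset
train = dict(
    type=['DatasetA', 'DatasetB'],
    ann_file=['a.json', 'b.json'],
    img_prefix=['imgs_a/', 'imgs_b/'])
```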

I tried giving only one `_base_` field, but it still shows another error. In the config I also specified the COCO dataset, so where will it now look for the dataset_info of the COCO dataset (because the single `_base_` field I kept is the one for my custom dataset)? If you look at num_images, it equals the total number of images in my custom dataset, which means the COCO dataset images are not being concatenated.

I am getting the output below:

=> load 158849 samples => num_images: 50000 => load 158849 samples loading annotations into memory... Done (t=5.46s) creating index... index created! Traceback (most recent call last): File "/home/yukti/mmpose/mmcv/mmcv/utils/registry.py", line 52, in build_from_cfg return obj_cls(**args) File "/home/yukti/mmpose/mmpose/mmpose/datasets/datasets/top_down/topdown_coco_wholebody_dataset.py", line 83, in init self.db = self._get_db() File "/home/yukti/mmpose/mmpose/mmpose/datasets/datasets/top_down/topdown_coco_dataset.py", line 100, in _get_db gt_db = self._load_coco_keypoint_annotations() File "/home/yukti/mmpose/mmpose/mmpose/datasets/datasets/top_down/topdown_coco_dataset.py", line 110, in _load_coco_keypoint_annotations gt_db.extend(self._load_coco_keypoint_annotation_kernel(img_id)) File "/home/yukti/mmpose/mmpose/mmpose/datasets/datasets/top_down/topdown_coco_wholebody_dataset.py", line 132, in _load_coco_keypoint_annotation_kernel obj['face_kpts'] + obj['lefthand_kpts'] + KeyError: 'foot_kpts'

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "./mmpose/tools/train.py", line 170, in <module>
    main()
  File "./mmpose/tools/train.py", line 145, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/yukti/mmpose/mmpose/mmpose/datasets/builder.py", line 77, in build_dataset
    dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg])
  File "/home/yukti/mmpose/mmpose/mmpose/datasets/builder.py", line 77, in <listcomp>
    dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg])
  File "/home/yukti/mmpose/mmpose/mmpose/datasets/builder.py", line 87, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/yukti/mmpose/mmcv/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "TopDownCocoWholeBodyDataset: 'foot_kpts'"
```

Also, please tell me whether I am thinking in the wrong direction, and whether the changes I made in the config and builder.py are correct, because my main goal is to concatenate COCO and my custom dataset and train on them.

ly015 commented 2 years ago

The code you provided seems exactly the same as the code in builder.py on the master branch. Did you make any modifications to your local code with which you met the error?

As for the config, could you please provide the full content of your config file, so it is easier to locate the problem? From the error information above, it seems that you are using the TopDownCocoWholeBodyDataset class to load your own data, but the field 'foot_kpts', which the dataset class needs, is not found in your annotation file. This indicates that the keypoint definition or data structure of your data differs from that of the COCO-WholeBody dataset.
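One quick way to verify this (a sketch; the path stands in for your custom annotation file):

```python
import json

# Inspect one annotation entry of the custom dataset (path hypothetical)
with open('person_keypoints_train.json') as f:
    coco = json.load(f)

# TopDownCocoWholeBodyDataset expects whole-body fields such as 'foot_kpts',
# 'face_kpts', 'lefthand_kpts' and 'righthand_kpts' on every annotation
print(coco['annotations'][0].keys())
```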

In general, ConcatDataset is for combining multiple datasets with the same annotation format (i.e. loadable by the same dataset class), or at least with the same format of pre-processed data samples (different dataset classes and/or pipelines, but `DatasetClass.__getitem__()` returns the same data structure). You can double-check whether your data meets this requirement.
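One way to double-check is to build the two datasets separately and compare the structure of their samples; a sketch, assuming `cfg` is the loaded config and `cfg.data.train` is the two-element list from the config above:

```python
from mmcv import Config
from mmpose.datasets import build_dataset

cfg = Config.fromfile('path/to/your_config.py')  # placeholder path

# Build each dataset on its own
ds_custom = build_dataset(cfg.data.train[0])
ds_coco = build_dataset(cfg.data.train[1])

# If both pipelines produce the same data structure, the symmetric
# difference of the sample keys should be empty
sample_a, sample_b = ds_custom[0], ds_coco[0]
print(set(sample_a.keys()) ^ set(sample_b.keys()))
```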

YuktiADY commented 2 years ago

I get it now; it wasn't the TopDownCocoWholeBodyDataset but the TopDownCocoDataset class that is used to load the data. After concatenating, will it load the images of my dataset as well as the COCO dataset? Does mixing the two datasets improve the performance of the model?

YuktiADY commented 2 years ago

I trained the HRNet model at the smaller input resolution for 15 epochs, and now I want to train it for another 10 epochs. So in my config I set resume_from = '/home/yukti/mmpose/theodore_2022-04-22/best_AP_epoch_15.pth' (the checkpoint from the last training), but the training doesn't resume or start; instead it shows this message.

```
2022-04-25 15:36:49,579 - mmpose - INFO - workflow: [('train', 1)], max: 10 epochs
2022-04-25 15:36:49,579 - mmpose - INFO - Checkpoints will be saved to /home/yukti/mmpose/theodore_2022-04-25 by HardDiskBackend.
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/home/yukti/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:71: FutureWarning: This is an experimental API and will be changed in future.
  "This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0005908012390136719 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "87571", "role": "default", "hostname": "dst-toaster.etit.tu-chemnitz.de", "state": "SUCCEEDED", "total_run_time": 45, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [0], \"role_world_size\": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "dst-toaster.etit.tu-chemnitz.de", "state": "SUCCEEDED", "total_run_time": 45, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\"}", "agent_restarts": 0}}
```

This is neither an error nor a warning. Please tell me how to proceed.
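A likely cause, judging from the first log line: the run is configured with max: 10 epochs, while the resumed checkpoint is already at epoch 15, so the runner treats training as already complete and exits cleanly. A sketch of the relevant config fields under that assumption:

```python
# Resuming restores the epoch counter from the checkpoint, so total_epochs
# must cover both runs; with total_epochs = 10 and a checkpoint at epoch 15,
# the runner has nothing left to do
total_epochs = 25  # 15 already trained + 10 more
resume_from = '/home/yukti/mmpose/theodore_2022-04-22/best_AP_epoch_15.pth'
```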