xvjiarui / GCNet

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
Apache License 2.0

Train on custom data #31

Open aravind3134 opened 4 years ago

aravind3134 commented 4 years ago

Hey,

I am trying to train GCNet on custom data. The data is in COCO format, and I want to know the exact procedure for training on it, because just running the train.sh script gives me an IndexError.

I have been changing the config file to make it work, but haven't had any luck so far. Please let me know which fields should be changed.

Thanks.

xvjiarui commented 4 years ago

Sorry for the late reply. Could you please provide the error message? The training procedure should be the same as for mmdetection.

aravind3134 commented 4 years ago

I tried running a config file after changing the data location.

In my case, there are only 2 classes. I also have to change the class names. I think I am getting the error only because of that.

Please let me know how to do it. What should be changed?
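For reference, a minimal sketch of the class-count change, assuming the config follows the mmdetection layout of that era, where the heads carry a `num_classes` field that counts the background as one extra class (81 for the 80 COCO classes). The helper `set_num_classes` and the abbreviated `model` dict are hypothetical and only illustrate which fields are involved; they are not taken from the GCNet configs.

```python
import copy


def set_num_classes(model_cfg, num_foreground):
    """Return a copy of model_cfg with the head class counts updated."""
    cfg = copy.deepcopy(model_cfg)
    # Assumption: one extra slot is reserved for the background class.
    num_classes = num_foreground + 1
    for head in ("bbox_head", "mask_head"):
        head_cfg = cfg.get(head)
        if isinstance(head_cfg, dict) and "num_classes" in head_cfg:
            head_cfg["num_classes"] = num_classes
    return cfg


# Abbreviated stand-in for the `model` dict of a Mask R-CNN style config.
model = dict(
    type="MaskRCNN",
    bbox_head=dict(type="SharedFCBBoxHead", num_classes=81),
    mask_head=dict(type="FCNMaskHead", num_classes=81),
)

custom_model = set_num_classes(model, num_foreground=2)
print(custom_model["bbox_head"]["num_classes"])
print(custom_model["mask_head"]["num_classes"])
```

Whether the class names also need to change depends on how the dataset class maps category ids to labels, so that part is left out of the sketch.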

As of now, I get the following index error:

Traceback (most recent call last):
  File "./tools/train.py", line 103, in <module>
    main()
  File "./tools/train.py", line 99, in main
    logger=logger)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 60, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 189, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 260, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):

Thanks

xvjiarui commented 4 years ago

It seems that there is a problem with your data loader. I suggest you debug with a single process, e.g. 1 GPU only, so that you can add a breakpoint inside your code.
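One way to act on this suggestion is to index the dataset directly in a single process, so that the original exception is visible instead of being re-raised by a worker. The sketch below assumes only that the dataset supports `len()` and integer indexing. `find_failing_samples` and `ToyDataset` are hypothetical names used for illustration, with the toy class standing in for the real training dataset.

```python
def find_failing_samples(dataset, max_failures=5):
    """Index every sample in the current process and record the failures."""
    failures = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except Exception as exc:  # broad on purpose: we only want to locate the sample
            failures.append((idx, type(exc).__name__, str(exc)))
            if len(failures) >= max_failures:
                break
    return failures


class ToyDataset:
    """Stand-in dataset: sample 2 has a label outside the class list."""

    classes = ("cat", "dog")
    labels = [0, 1, 5, 1]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.classes[self.labels[idx]]


failures = find_failing_samples(ToyDataset())
for idx, exc_name, message in failures:
    print(idx, exc_name, message)
```

Once a failing index is known, a breakpoint at that sample shows which annotation triggers the error.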

aravind3134 commented 4 years ago

Hey, can you please tell me what changes are required to successfully train GCNet on a custom dataset created in COCO format?

xvjiarui commented 4 years ago

I think there are two workarounds. Either of them should be fine.

  1. Convert your data into exactly the same format as the COCO annotations.
  2. Follow this to create your own dataset.
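
For the first option, a quick consistency check on the annotation file can catch mismatches before training starts. The sketch below assumes the standard COCO detection layout: top-level `images`, `annotations` and `categories` lists, with each annotation carrying `image_id`, `category_id` and a `bbox` given as `[x, y, width, height]`. The function name `check_coco_annotations` and the sample data are hypothetical.

```python
def check_coco_annotations(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(coco.get(key), list):
            problems.append("missing or non-list top-level key: " + key)
    if problems:
        return problems

    image_ids = {img.get("id") for img in coco["images"]}
    category_ids = {cat.get("id") for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id {ann.get('category_id')}")
        bbox = ann.get("bbox")
        if not (isinstance(bbox, (list, tuple)) and len(bbox) == 4):
            problems.append(f"annotation {ann_id}: bbox must be [x, y, width, height]")
        elif bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: bbox has non-positive width or height")
    return problems


# Small in-memory sample with two deliberate mistakes.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 11, "image_id": 1, "category_id": 3, "bbox": [10, 20, 30, 40]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [10, 20, 0, 40]},
    ],
}

problems = check_coco_annotations(sample)
for problem in problems:
    print(problem)
```

For a real file, the dict would come from `json.load` on the annotation file. The check covers only detection fields, so segmentation data is not validated.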

aravind3134 commented 4 years ago

Hey, I am trying to train on my own data, which is in the same format as the COCO dataset, using one of the provided configuration files. As my data doesn't have the segmentation attribute, I tried running both my dataset and the COCO dataset with 'with_mask' set to 'False' in the config file. Do I need to change something else in the configuration file to make it work?

I am using the config file in this location: configs/gcnet/r50/mask_rcnn_r50_fpn_2x.py
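
One possible reading of this setup: with a Mask R-CNN config, setting `with_mask` to `False` stops the data loader from supplying masks, but the model still contains a mask branch that expects them, which would be consistent with the error that follows being raised in `mask_target`. A minimal sketch of a box-only variant is shown below, assuming the config uses the mmdetection key names `mask_roi_extractor`, `mask_head` and `with_mask`, and that `FasterRCNN` is the matching box-only detector type. The helper `to_box_only` and the abbreviated dicts are hypothetical, not the actual GCNet config contents.

```python
import copy


def to_box_only(model_cfg, data_cfg):
    """Return copies of the model/data configs with the mask branch removed."""
    model = copy.deepcopy(model_cfg)
    data = copy.deepcopy(data_cfg)

    # Drop the mask branch so the detector never asks for gt_masks.
    model.pop("mask_roi_extractor", None)
    model.pop("mask_head", None)
    # Assumption: the box-only counterpart of MaskRCNN is FasterRCNN.
    if model.get("type") == "MaskRCNN":
        model["type"] = "FasterRCNN"

    # Keep the datasets consistent with a model that has no mask head.
    for split in ("train", "val", "test"):
        split_cfg = data.get(split)
        if isinstance(split_cfg, dict) and "with_mask" in split_cfg:
            split_cfg["with_mask"] = False
    return model, data


# Abbreviated stand-ins for the `model` and `data` dicts of a config file.
model = dict(
    type="MaskRCNN",
    bbox_head=dict(type="SharedFCBBoxHead", num_classes=81),
    mask_roi_extractor=dict(type="SingleRoIExtractor"),
    mask_head=dict(type="FCNMaskHead", num_classes=81),
)
data = dict(
    train=dict(type="CocoDataset", with_mask=True),
    val=dict(type="CocoDataset", with_mask=True),
    test=dict(type="CocoDataset", with_mask=False),
)

box_model, box_data = to_box_only(model, data)
print(sorted(box_model))
print(box_data["train"]["with_mask"])
```

Starting from a Faster R-CNN style config, if the repository provides one, would achieve the same result without editing the mask settings by hand.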

Error:
Traceback (most recent call last):
  File "./tools/train.py", line 106, in <module>
    main()
  File "./tools/train.py", line 101, in main
    logger=logger)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 65, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 201, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 361, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 264, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 44, in batch_processor
    losses = model(**data)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/parallel/distributed.py", line 50, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/models/detectors/base.py", line 86, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/models/detectors/two_stage.py", line 183, in forward_train
    sampling_results, gt_masks, self.train_cfg.rcnn)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/models/mask_heads/fcn_mask_head.py", line 112, in get_target
    gt_masks, rcnn_train_cfg)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/core/mask/mask_target.py", line 10, in mask_target
    pos_assigned_gt_inds_list, gt_masks_list, cfg_list)
TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/gcnet/r50/mask_rcnn_r50_fpn_2x.py', '--launcher', 'pytorch']' returned non-zero exit status 1.