open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

'DetDataSample' object has no attribute '_gt_sem_seg' #10613

Open atamazian opened 1 year ago

atamazian commented 1 year ago

Describe the bug

I'm getting the error 'DetDataSample' object has no attribute '_gt_sem_seg' when I try to train my model.

Reproduction

  1. What command or script did you run?
./mmdetection/tools/dist_train.sh custom_config.py 2

I used a slightly modified htc_x101-64x4d-dconv-c3-c5_fpn_ms-400-1400-16xb1-20e_coco.py config for an instance segmentation task.

  2. Did you make any modifications on the code or config? Did you understand what you have modified?

I changed num_classes, train/val/test datasets and pipelines. Nothing special, just to tune for the task

Environment

sys.platform: linux
Python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
PyTorch: 1.12.1+cu116
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.13.1+cu116
OpenCV: 4.7.0
MMEngine: 0.7.4
MMDetection: 3.1.0+

Error traceback

Traceback (most recent call last):
  File "/kaggle/working/./mmdetection/tools/train.py", line 133, in <module>
    main()
  File "/kaggle/working/./mmdetection/tools/train.py", line 129, in main
    runner.train()
  File "/opt/conda/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1721, in train
    model = self.train_loop.run()  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/opt/conda/lib/python3.10/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/opt/conda/lib/python3.10/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "/opt/conda/lib/python3.10/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
    losses = self._run_forward(data, mode='loss')
  File "/opt/conda/lib/python3.10/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
    results = self(**data, mode=mode)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/working/mmdetection/mmdet/models/detectors/base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "/kaggle/working/mmdetection/mmdet/models/detectors/two_stage.py", line 190, in loss
    roi_losses = self.roi_head.loss(x, rpn_results_list,
  File "/kaggle/working/mmdetection/mmdet/models/roi_heads/htc_roi_head.py", line 288, in loss
    gt_semantic_segs = [
  File "/kaggle/working/mmdetection/mmdet/models/roi_heads/htc_roi_head.py", line 289, in <listcomp>
    data_sample.gt_sem_seg.sem_seg
  File "/kaggle/working/mmdetection/mmdet/structures/det_data_sample.py", line 213, in gt_sem_seg
    return self._gt_sem_seg
AttributeError: 'DetDataSample' object has no attribute '_gt_sem_seg'. Did you mean: 'gt_sem_seg'?
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./mmdetection/tools/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-07-10_07:06:07
  host      : b15f8389c57e
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 110)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-10_07:06:07
  host      : b15f8389c57e
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 109)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
fzoric8 commented 1 year ago

Hi, I think you need to add with_seg=True to the LoadAnnotations entry in the data processing pipeline of your config file.

For some reason, with mask_rcnn it is enough to add with_mask=True, but in this case that is not sufficient, so it's best to also add with_seg=True.

For example, my coco-instance.py is:

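backend_args = None  # assumed default: read files from local disk

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    # with_seg=True also loads the semantic segmentation map
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs')
]
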
atamazian commented 1 year ago

I've added with_seg=True to my train_pipeline's LoadAnnotations, but I'm still getting that error.

fx568000 commented 1 year ago

Is there an answer to this? I have the same problem.

bpmsilva commented 1 year ago

del cfg.model.roi_head.semantic_roi_extractor
del cfg.model.roi_head.semantic_head
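
One way to apply the two deletions above is from a small script or notebook cell before the runner is built. The sketch below uses a plain nested dict as a stand-in for the loaded config so that it runs without MMDetection installed; the helper name `drop_semantic_branch` and the sample values are hypothetical. With a real config you would presumably load it with mmengine's `Config.fromfile` and apply the same `pop` calls to `cfg.model.roi_head`, assuming it behaves like a dict at that level.

```python
def drop_semantic_branch(roi_head):
    """Remove HTC's semantic branch from a roi_head config mapping, in place.

    pop() is called with a default so that a missing key is not an error.
    """
    for key in ("semantic_roi_extractor", "semantic_head"):
        roi_head.pop(key, None)
    return roi_head


# Stand-in for cfg.model.roi_head; the values are placeholders, not real settings.
roi_head = {
    "type": "HybridTaskCascadeRoIHead",
    "semantic_roi_extractor": {"type": "SingleRoIExtractor"},
    "semantic_head": {"type": "FusedSemanticHead"},
    "mask_head": [{"type": "HTCMaskHead"}],
}

drop_semantic_branch(roi_head)
print(sorted(roi_head))  # the two semantic_* keys are gone, mask_head remains
```

Using `pop` with a default rather than `del` keeps the helper safe to call on a config that never had a semantic branch.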

Solacex commented 11 months ago

I met the same error, any update on this?

pkyzh2006 commented 10 months ago

semantic

works for me

GeorgePearse commented 9 months ago

I've hit the same error for "detectors" models (similar to the htc range)

GeorgePearse commented 9 months ago

To me this looks like an actual bug: the code tries to access an attribute that never exists (e.g. a naming error), since the real attribute name has no underscore.
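
Another possible reading of the traceback, offered as a hedged sketch rather than a statement about MMDetection internals: the getter at det_data_sample.py line 213 returns `self._gt_sem_seg`, so the underscore name may simply be the private slot behind a property, and the error would then mean that nothing ever assigned `gt_sem_seg` on the sample. The class `SampleSketch` below is a hypothetical, simplified stand-in that reproduces this pattern in plain Python.

```python
class SampleSketch:
    """Simplified stand-in for a data sample with a property-backed field."""

    @property
    def gt_sem_seg(self):
        # Mirrors the getter shown in the traceback: it reads a private slot.
        return self._gt_sem_seg

    @gt_sem_seg.setter
    def gt_sem_seg(self, value):
        self._gt_sem_seg = value


sample = SampleSketch()
try:
    sample.gt_sem_seg  # nothing has assigned the field yet
except AttributeError as err:
    message = str(err)
    print(message)  # the message names the private slot, '_gt_sem_seg'

# Once the field is assigned (for example by a loading step), the getter works.
sample.gt_sem_seg = "semantic map placeholder"
print(sample.gt_sem_seg)
```

Under this reading, the fix is either to supply the semantic maps or to remove the part of the model that consumes them.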


Boat-sky commented 6 months ago

del cfg.model.roi_head.semantic_roi_extractor
del cfg.model.roi_head.semantic_head

This works for me, thanks! Also add del cfg.model.roi_head.mask_roi_extractor and del cfg.model.roi_head.mask_head if you only want to create an object detection model.

RainyBlue111 commented 5 months ago

del cfg.model.roi_head.semantic_roi_extractor
del cfg.model.roi_head.semantic_head

Works for me, thanks! Also del cfg.model.roi_head.mask_roi_extractor and del cfg.model.roi_head.mask_head (if you only want to create an object detection model).

How do I apply this method to solve the problem? Can you help me?

BlackWhitebzl commented 5 months ago

I ran into this problem as well. After reading the HTC paper and the "dataset preparation" section of the MMDetection documentation, I found that, on top of the COCO-format dataset, you need to provide a stuffthingmaps folder containing a PNG semantic segmentation label for each JPG image. You also need to add seg='your_path' to data_prefix in train_dataloader (and val_dataloader), for example data_prefix=dict(img='train/', seg='train_png/').

I'm not sure whether this matches your situation, but I hope it helps.
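
As a rough illustration of the suggestion above, the config fragment below shows where the seg prefix would go. It is a sketch only: the directory names 'train/' and 'train_png/' are the ones quoted in the comment and are placeholders for your own layout, and the fragment assumes the remaining dataloader settings are inherited from a base config.

```python
# Directory names follow the comment above; replace them with your own layout.
train_dataloader = dict(
    dataset=dict(
        # img: folder with the JPG images; seg: folder with the PNG semantic maps
        data_prefix=dict(img='train/', seg='train_png/'),
    ),
)

# The loading step also has to request the semantic maps (with_seg=True),
# otherwise the seg prefix is never used.
load_annotations = dict(
    type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True)
```

Both pieces are needed together: the prefix tells the dataset where the PNG maps live, and with_seg=True tells the pipeline to load them.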

manaswakchaure commented 4 months ago

To resolve this issue, create a new file containing the contents of ./configs/detectors/detectors_htc-r50_1x_coco.py. Replace the line _base_ = '../htc/htc_r50_fpn_1x_coco.py' with _base_ = '../htc/htc-without-semantic_r50_fpn_1x_coco.py'. Use this new file as the _base_ for your custom config.

This is for instance segmentation using custom data.

Note that ../htc/htc_r50_fpn_1x_coco.py includes semantic segmentation parameters and COCO-specific settings only.