thuml / Transfer-Learning-Library

Transfer Learning Library for Domain Adaptation, Task Adaptation, and Domain Generalization
http://transfer.thuml.ai
MIT License

Attribute 'thing_classes' does not exist in the metadata of dataset: metadata is empty. #124

Closed darkhan-s closed 2 years ago

darkhan-s commented 2 years ago

Hello,

I am testing your examples/domain_adaptation/object_detection/d_adapt/d_adapt.py script on my custom dataset (30 classes), which I converted to VOC format. Initially, I trained it with source_only.py successfully, but when trying to run d_adapt.py, I receive the following error.

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/scratch/project_2005695/detectron2/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/scratch/project_2005695/Transfer-Learning-Library/examples/domain_adaptation/object_detection/d_adapt/d_adapt.py", line 272, in main
    train(model, logger, cfg, args, args_cls, args_box)
  File "/scratch/project_2005695/Transfer-Learning-Library/examples/domain_adaptation/object_detection/d_adapt/d_adapt.py", line 131, in train
    classes = MetadataCatalog.get(args.targets[0]).thing_classes
  File "/scratch/project_2005695/detectron2/detectron2/data/catalog.py", line 131, in __getattr__
    raise AttributeError(
AttributeError: Attribute 'thing_classes' does not exist in the metadata of dataset '.._datasets_TLESS_real_dataset_trainval': metadata is empty.

I have registered the base class in tllib/vision/datasets/object_detection/__init__.py same way as in the provided CityScapesBase class:

class TLessBase:
    class_names = ('Model 1', 'Model 2', 'Model 3', 'Model 4', 'Model 5',
                   'Model 6', 'Model 7', 'Model 8', 'Model 9', 'Model 10',
                   'Model 11', 'Model 12', 'Model 13', 'Model 14', 'Model 15',
                   'Model 16', 'Model 17', 'Model 18', 'Model 19', 'Model 20',
                   'Model 21', 'Model 22', 'Model 23', 'Model 24', 'Model 25',
                   'Model 26', 'Model 27', 'Model 28', 'Model 29', 'Model 30')

    def __init__(self, root, split="trainval", year=2007, ext='.jpg'):
        self.name = "{}_{}".format(root, split)
        self.name = self.name.replace(os.path.sep, "_")
        if self.name not in MetadataCatalog.keys():
            register_pascal_voc(self.name, root, split, year, class_names=self.class_names, ext=ext,
                                bbox_zero_based=True)
            MetadataCatalog.get(self.name).evaluator_type = "pascal_voc"

And then the target and the test classes inherit from it.
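For reference, the dataset key that appears in the traceback above is derived from root and split by the two lines in __init__ that build self.name. A dependency-free sketch of that naming (dataset_name is a hypothetical helper mirroring those two lines, not tllib code):

```python
import os

def dataset_name(root, split="trainval"):
    # Mirrors the __init__ above: "<root>_<split>" with path separators
    # flattened to underscores, yielding catalog keys like
    # '.._datasets_TLESS_real_dataset_trainval' for a relative root path.
    return "{}_{}".format(root, split).replace(os.path.sep, "_")
```

Looking the resulting key up in MetadataCatalog only succeeds if register_pascal_voc actually ran for it in the current process.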

Could you please suggest what I am missing?

darkhan-s commented 2 years ago

This seems to happen only if I set --num-gpus to anything more than 1.

JunguangJiang commented 2 years ago

D-adapt was not trained with multiple GPUs before, so we hadn't encountered this problem. It seems that moving the following code from the if __name__ == "__main__": block into the main function will solve the problem.

    args.source = utils.build_dataset(args.source[::2], args.source[1::2])
    args.target = utils.build_dataset(args.target[::2], args.target[1::2])
    args.test = utils.build_dataset(args.test[::2], args.test[1::2])
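A minimal illustration of why the placement matters (plain Python; the REGISTRY dict and the register/worker_main names are illustrative stand-ins, not the detectron2 or tllib API): with the spawn start method used by torch.multiprocessing, each worker starts a fresh interpreter that re-imports the main module but never re-executes the if __name__ == "__main__": block, so registrations done only under that guard are invisible to the workers and later catalog lookups find empty metadata. Registering inside the function every worker runs avoids that:

```python
REGISTRY = {}  # illustrative stand-in for detectron2's MetadataCatalog

def register(name, classes):
    # In the real code this role is played by utils.build_dataset,
    # which ends up calling register_pascal_voc.
    REGISTRY[name] = {"thing_classes": list(classes)}

def worker_main(dataset_name):
    # Registration happens inside the function each spawned worker
    # executes, so every process sees populated metadata.
    register(dataset_name, ("Model 1", "Model 2"))
    return REGISTRY[dataset_name]["thing_classes"]
```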
darkhan-s commented 2 years ago

Hi @JunguangJiang, sorry to continue on a closed issue, but this is a follow-up on the same dataset. Following your recommendation, I first trained the model with the source dataset only (50k images) using examples/domain_adaptation/object_detection/source_only.py.

I run the command as follows:

    python source_only.py \
        --config-file config/faster_rcnn_vgg_16_cityscapes.yaml \
        -s TLess datasets/TLESS_rendered_dataset \
        -t TLessReal datasets/TLESS_real_dataset \
        --test TLessTest datasets/TLESS_rendered_dataset TLessRealTest datasets/TLESS_real_dataset \
        --finetune \
        OUTPUT_DIR logs/source_only/faster_rcnn_vgg_16_cityscapes/tlessRendered2Real \
        MODEL.ROI_HEADS.NUM_CLASSES 30

This gave me an AP of 59.39 on the source dataset and 6.77 on the target. So I decided to test it on the same source training data to see how it was performing, using visualize.py:

    python visualize.py \
        --config-file config/faster_rcnn_vgg_16_cityscapes.yaml \
        --threshold 0.1 \
        --test TLess datasets/TLESS_rendered_dataset \
        --save-path visualizations/source_only/tlessRendered2Real \
        MODEL.WEIGHTS logs/source_only/faster_rcnn_vgg_16_cityscapes/tlessRendered2Real/model_final.pth

And I get very poor detections even on the same data I trained on (no detections at all if I set the threshold to 0.5; on average the confidence of the bounding boxes is around 20%). What could be the reason for such a big gap between the training metrics and the visualize.py results? Could it be that I need to modify the image dimensions before feeding them to visualize.py?

Edit: both source and target image shapes are (540, 720, 3)

JunguangJiang commented 2 years ago

One possible reason is that the model weights fail to load when visualizing the results. A suggestion is to use TensorBoard to inspect the detection results. For instance,

tensorboard --logdir=logs
thucbx99 commented 2 years ago

I guess there might be some issue with the processing of your dataset. You can print some intermediate results, such as instances at line 45 of visualize.py, to confirm what the raw output of the model is.

darkhan-s commented 2 years ago

Sorry for the late response, I have been testing other methods.

One possible reason is that the model weights fail to load when visualizing the results. A suggestion is to use TensorBoard to inspect the detection results. For instance,

tensorboard --logdir=logs

The problem is that I don't have a display to monitor such results; I am training on a remote server.

I guess there might be some issue with the processing of your dataset. You can print some intermediate results, such as instances at line 45 of visualize.py, to confirm what the raw output of the model is.

This one works as expected:

    Instances(num_instances=100, image_height=540, image_width=720, fields=[pred_boxes: Boxes(tensor([[5.0100e+02, 1.8443e+02, 5.3791e+02, 2.2163e+02], ...)), scores: tensor([0.2554, ...)), pred_classes: tensor([6, ....])])
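Given that the scores in this dump cluster around 0.2, the behaviour is consistent with visualize.py's --threshold simply filtering everything out at 0.5. A dependency-free sketch of that kind of filtering (filter_by_score is a hypothetical helper illustrating the idea, not the script's actual code):

```python
def filter_by_score(scores, threshold):
    # Keep the indices of detections whose confidence reaches the cutoff;
    # a score-threshold filter like this runs before any boxes are drawn.
    return [i for i, s in enumerate(scores) if s >= threshold]
```

With scores like [0.2554, 0.21, 0.18], a threshold of 0.1 keeps all three detections while 0.5 keeps none, matching the "no detections at all" observation above.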