lijm48 / CI-SSOD


custom dataset #2

Open: xcx121 opened this issue 1 month ago

xcx121 commented 1 month ago

[Image: the dataset's directory structure]

Hello, thank you very much for your work! I would like to reproduce your work on a custom dataset. My dataset is in COCO format and is structured as shown in the figure above.

Could you please explain how the pseudo_ann_file="${_pseudo_dir}/pseudo.json" in the config is generated? My dataset does not have this file. In the instances_train2017_coco_split${SPLIT}_label.json file, does the "split" refer to dividing the dataset into labeled and unlabeled portions by percentage? Thank you for your response!


lijm48 commented 1 month ago

Thanks for your interest!

  1. pseudo.json should be created automatically at line 203 of ssod/datasets/coco.py and is then updated in lines 101-135 of ssod/models/soft_teacher_grs.py, where it is used to resample the pseudo-labels. You don't need to create this file yourself; just specify a path in the config, and the file will be created there automatically.
  2. Split refers to different partitions of the classes into base and novel classes. In split 1, we select the 20 classes shared with Pascal VOC as the minority classes and set all other classes as the majority classes. In split 2, we randomly select 40 classes as minority classes and use the remaining 40 classes as majority classes. For both split 1 and split 2, we use about 10% of the COCO dataset as labeled samples. If you have any other questions, you can add me on WeChat: ljm314159265358
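The reply above describes two ingredients of a split: a partition of the categories into minority and majority classes, and a labeled subset of about 10% of the images. Below is one possible way to produce both for a custom COCO-format dataset. It is a sketch under assumptions, not CI-SSOD's own code: the function names are hypothetical, and the repository's split files may be built differently (for example, the labeled subset might be sampled per class instead of uniformly over images), so check the repo before relying on it.

```python
import random
from typing import Iterable, List, Set, Tuple

# Hypothetical helpers for preparing a custom COCO-format dataset; the names and
# the uniform image sampling are illustrative assumptions, not CI-SSOD's code.


def split_categories_random(categories: List[dict], num_minority: int,
                            seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Randomly choose `num_minority` category ids as minority classes.

    The remaining categories become majority classes (cf. split 2: 40 vs. 40).
    """
    ids = sorted(c["id"] for c in categories)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    minority = set(rng.sample(ids, num_minority))
    return minority, set(ids) - minority


def split_categories_by_name(categories: List[dict],
                             minority_names: Iterable[str]) -> Tuple[Set[int], Set[int]]:
    """Use an explicit list of class names as minority classes (cf. split 1)."""
    names = set(minority_names)
    minority = {c["id"] for c in categories if c["name"] in names}
    majority = {c["id"] for c in categories} - minority
    return minority, majority


def _subset(coco: dict, keep_ids: Set[int]) -> dict:
    """Copy a COCO dict, keeping only the given images and their annotations."""
    return {
        **coco,  # keeps "categories", "info", "licenses", ... unchanged
        "images": [img for img in coco["images"] if img["id"] in keep_ids],
        "annotations": [a for a in coco["annotations"] if a["image_id"] in keep_ids],
    }


def split_labeled_unlabeled(coco: dict, labeled_fraction: float = 0.1,
                            seed: int = 0) -> Tuple[dict, dict]:
    """Sample roughly `labeled_fraction` of the images as the labeled set."""
    image_ids = sorted(img["id"] for img in coco["images"])
    rng = random.Random(seed)
    num_labeled = max(1, round(len(image_ids) * labeled_fraction))
    labeled_ids = set(rng.sample(image_ids, num_labeled))
    unlabeled_ids = set(image_ids) - labeled_ids
    # The unlabeled subset keeps its annotations here; a semi-supervised
    # pipeline would ignore them during training.
    return _subset(coco, labeled_ids), _subset(coco, unlabeled_ids)
```

Sampling with a fixed seed keeps the split reproducible across runs. The two returned dicts can then be written with json.dump to whatever paths your config expects; how the minority/majority partition is used during training (for example, to control class imbalance) is up to the repository's own pipeline.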