lhc1224 / OSAD_Net

PyTorch implementation of One-Shot Affordance Detection

Questions about dataset files and baselines #2

Open bighuang624 opened 2 years ago

bighuang624 commented 2 years ago

Hi, many thanks for the open-sourced codebase. I would like to ask several questions about the dataset files and the segmentation baselines:

  1. I notice that the PADv2 dataset seems to contain several ref files: there are refs and refs_2 folders under each train\divide_x folder, and several test_ref_x.txt files under each test\divide_x folder. This confuses me because I don't know which file paths I should pass as hyperparameters. Could you explain the differences between these files and which one I should use to reproduce the experimental results in the paper? I also suggest providing a README for your proposed datasets that explains the role of each file they contain, which would help increase the impact of this work.

  2. I would like to ask how you adapted conventional segmentation methods (UNet, PSPNet, DeepLabV3+) to the one-shot affordance detection task, since they need segmentation masks as per-category supervision, and these ground-truth masks are not provided for novel affordance categories in this task. My guess is that you simply ignore the categories, train the model on all ground-truth masks of the base affordance categories, and also ignore the support images and bounding boxes when predicting masks for query samples of novel affordance categories. However, I am not sure whether this guess is correct, because I did not find the relevant details in the paper.

lhc1224 commented 2 years ago

Hello, thank you very much for your suggestions. For testing, just use the default paths in run_osad.py as input. refs and refs_2 are the bounding-box and mask annotations, respectively. I will provide a README file later to explain the difference between the txt files. Regarding the segmentation networks, we output the binarized mask directly, ignoring the affordance category (see the sketch below).
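
For readers adapting the baselines, the reply above suggests the segmentation networks are trained as class-agnostic binary predictors. Below is a minimal sketch of that setup, not the authors' actual code: the model choice (torchvision's DeepLabV3), loss, and threshold are assumptions.

```python
# Sketch: a conventional segmentation baseline trained as a class-agnostic
# binary predictor, per the reply above. Model, loss, and threshold are
# assumptions, not the authors' exact setup.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

# Single output channel: affordance region vs. background,
# with the affordance category ignored entirely.
model = deeplabv3_resnet50(weights=None, num_classes=1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, masks):
    """images: (B, 3, H, W) floats; masks: (B, 1, H, W) binary floats."""
    optimizer.zero_grad()
    logits = model(images)["out"]   # (B, 1, H, W) logits
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()

# At test time the prediction is binarized with a fixed threshold; support
# images and bounding boxes are not used by these baselines.
def predict(images, threshold=0.5):
    with torch.no_grad():
        probs = torch.sigmoid(model(images)["out"])
    return (probs > threshold).float()
```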

lhc1224 commented 2 years ago

The difference between the txt files is the number of query images; these are files I generated for the hyperparameter comparison experiments. In each line, the first path is the support image and the following paths are the query images.
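
Based on that description, a small sketch of how one might parse such a txt file follows. The whitespace delimiter and the example file name are assumptions.

```python
# Sketch: parse a test_ref txt file where each line holds the support image
# path followed by the query image paths. Delimiter is assumed whitespace.
def parse_ref_file(path):
    """Yield (support_path, [query_paths]) pairs, one per line."""
    with open(path) as f:
        for line in f:
            paths = line.split()
            if not paths:           # skip blank lines
                continue
            yield paths[0], paths[1:]

# Hypothetical usage, assuming a file named test_ref_1.txt:
# for support, queries in parse_ref_file("test_ref_1.txt"):
#     print(support, len(queries))
```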