stevehuanghe / GDAN

Generative Dual Adversarial Network for Generalized Zero-shot Learning
MIT License

Reproduce results on the SUN dataset #3

Open Chen-Song opened 5 years ago

Chen-Song commented 5 years ago

Hi, I ran your code but got unsatisfactory results. In your paper, the results on the SUN dataset are 38.1, 89.9, 53.4, but when running your code I get 8.5, 89.9, 15.6. Maybe I missed some important detail? Looking forward to your reply. [Screenshot from 2019-09-03 10-30-23]
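For context, the three numbers per dataset are presumably unseen accuracy (u), seen accuracy (s), and their harmonic mean H = 2us/(u+s), the standard GZSL metric; a quick sanity check (the interpretation of the columns is an assumption, not stated in this thread):

```python
def harmonic_mean(u, s):
    """Harmonic mean of unseen and seen accuracy, the GZSL H metric."""
    return 2 * u * s / (u + s)

# Paper's SUN numbers: 38.1 / 89.9 give H ~ 53.5 (close to the reported 53.4).
print(round(harmonic_mean(38.1, 89.9), 1))  # 53.5
# The run above: 8.5 / 89.9 give H ~ 15.5 (close to the logged 15.6),
# so only the unseen accuracy is off.
print(round(harmonic_mean(8.5, 89.9), 1))   # 15.5
```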

Chen-Song commented 5 years ago

[Screenshot from 2019-09-03 21-35-53] The results on the CUB dataset are lower than yours, although the seen accuracy exactly matches your reported seen result. I have not modified your code at all, so this is a bit strange.

jwliu-cc commented 4 years ago

Hello, may I ask which dataset you used? I downloaded the dataset, but there is no train_loc or val_loc in the proposed split, so I extracted them from trainval_loc according to trainclasses.txt and valclasses.txt. But the result on seen classes is not good, 0.26 on the CUB dataset, while the unseen result is better at 0.34.
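Before re-deriving a split, it can help to check which index arrays a given att_splits.mat actually contains. A minimal sketch (the in-memory file here is a stand-in for a real att_splits.mat on disk):

```python
import io

import numpy as np
import scipy.io as sio

# Stand-in for a real att_splits.mat; on disk you would just pass the path
# to sio.loadmat instead of a BytesIO buffer.
buf = io.BytesIO()
sio.savemat(buf, {'trainval_loc': np.arange(5), 'test_unseen_loc': np.arange(3)})
buf.seek(0)

d = sio.loadmat(buf)
loc_keys = sorted(k for k in d if k.endswith('_loc'))
print(loc_keys)  # ['test_unseen_loc', 'trainval_loc'] -> no train_loc/val_loc here
```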

lerndeep commented 4 years ago

@Carey-cc could you give me an idea of how I can extract trainval_loc and val_loc?

lerndeep commented 4 years ago

@Carey-cc how can I prepare the data of the CUB dataset for training the network?

jwliu-cc commented 4 years ago

@lerndeep

  1. You can extract train_loc and val_loc according to trainclasses.txt and valclasses.txt. Python code:

```python
import os

import numpy as np
import pandas as pd
import scipy.io as sio
from sklearn.model_selection import train_test_split


def load_data(att_path, res_path, val_size, trainval_split=1):
    """
    :param val_size: if there is no train_loc/val_loc and also no
        trainclasses.txt/valclasses.txt, randomly split train and val
    :param trainval_split: if there is no train_loc/val_loc, this parameter
        decides which txt file to use
    """
    att_feats_dat = sio.loadmat(str(att_path))
    res_feats_dat = sio.loadmat(str(res_path))

    features = res_feats_dat['features'].transpose()
    features /= np.max(features, axis=0)
    labels = res_feats_dat['labels'].squeeze().astype(int) - 1
    allclasses_names = att_feats_dat['allclasses_names'].squeeze()
    allclasses_names = np.array([i[0] for i in allclasses_names])

    att_feats = att_feats_dat['att'].transpose()

    try:
        id_train = att_feats_dat['train_loc' + str(trainval_split)].squeeze() - 1
        id_val = att_feats_dat['val_loc' + str(trainval_split)].squeeze() - 1
        train_class = np.unique(labels[id_train])
        val_class = np.unique(labels[id_val])
    except KeyError:
        # there is only trainval_loc but no train_loc and val_loc
        id_trainval = att_feats_dat['trainval_loc'].squeeze() - 1
        labels_trainval = labels[id_trainval]
        try:
            # extract train_loc and val_loc according to trainclasses.txt and valclasses.txt
            path = os.path.abspath(os.path.dirname(att_path))
            trainclasses_names = np.loadtxt(path + '/trainclasses' + str(trainval_split) + '.txt', dtype=str)
            valclasses_names = np.loadtxt(path + '/valclasses' + str(trainval_split) + '.txt', dtype=str)
            train_class = np.where(pd.Index(trainclasses_names).get_indexer(allclasses_names) >= 0)[0]
            val_class = np.where(pd.Index(valclasses_names).get_indexer(allclasses_names) >= 0)[0]
        except OSError:
            trainval_class = np.unique(labels_trainval)
            train_class, val_class = train_test_split(trainval_class, test_size=val_size, random_state=7)
        # extract train_loc and val_loc from trainval_loc
        # (change labels_trainval to labels to extract them from the whole dataset)
        id_train = id_trainval[np.where(pd.Index(pd.unique(train_class)).get_indexer(labels_trainval) >= 0)[0]]
        id_val = id_trainval[np.where(pd.Index(pd.unique(val_class)).get_indexer(labels_trainval) >= 0)[0]]
        print('train classes num: ', len(train_class))
        print('val classes: \r\n', allclasses_names[val_class])

    id_test_unseen = att_feats_dat['test_unseen_loc'].squeeze() - 1

    try:
        id_test_seen = att_feats_dat['test_seen_loc'].squeeze() - 1
    except KeyError:
        id_test_seen = None

    num_class = att_feats.shape[0]

    test_class = np.unique(labels[id_test_unseen])
    if id_test_seen is not None:
        test_class_s = np.unique(labels[id_test_seen])
    else:
        test_class_s = []

    train_x = features[id_train]
    train_y = labels[id_train]
    train_data = list(zip(train_x, train_y))

    val_x = features[id_val]
    val_y = labels[id_val]
    val_data = list(zip(val_x, val_y))

    test_x = features[id_test_unseen]
    test_y = labels[id_test_unseen]
    test_data = list(zip(test_x, test_y))

    if id_test_seen is not None:
        test_s_x = features[id_test_seen]
        test_s_y = labels[id_test_seen]
        test_data_s = list(zip(test_s_x, test_s_y))
        print("test seen", len(test_s_y), len(np.unique(test_s_y)))
    else:
        test_data_s = []

    class_label = dict()
    class_label['train'] = list(train_class)
    class_label['val'] = list(val_class)
    class_label['test'] = list(test_class)
    class_label['test_s'] = list(test_class_s)
    class_label['num_class'] = num_class

    return att_feats, train_data, val_data, test_data, test_data_s, class_label, allclasses_names
```

You can also do this in MATLAB.

  2. You can just download the dataset from https://www.dropbox.com/sh/btoc495ytfbnbat/AAAaurkoKnnk0uV-swgF-gdSa?dl=0 (provided by https://github.com/edgarschnfld/CADA-VAE-PyTorch). But this split is a bit strange: it extracts train_loc and val_loc from the whole dataset.
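For anyone who wants to sanity-check the class-name split logic in the code above, here is a tiny self-contained example; the class names and labels are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for allclasses_names and the txt files.
allclasses_names = np.array(['cardinal', 'jay', 'sparrow', 'warbler'])
trainclasses_names = np.array(['cardinal', 'sparrow'])
valclasses_names = np.array(['jay', 'warbler'])

labels_trainval = np.array([0, 1, 2, 3, 0, 2])   # class id of each sample
id_trainval = np.arange(len(labels_trainval))    # sample indices

# Class ids whose names appear in the respective txt file.
train_class = np.where(pd.Index(trainclasses_names).get_indexer(allclasses_names) >= 0)[0]
val_class = np.where(pd.Index(valclasses_names).get_indexer(allclasses_names) >= 0)[0]

# Sample indices whose label belongs to the train/val class sets.
id_train = id_trainval[np.where(pd.Index(train_class).get_indexer(labels_trainval) >= 0)[0]]
id_val = id_trainval[np.where(pd.Index(val_class).get_indexer(labels_trainval) >= 0)[0]]

print(train_class)  # [0 2]
print(val_class)    # [1 3]
print(id_train)     # [0 2 4 5]
print(id_val)       # [1 3]
```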
Programmergg commented 4 years ago

Hello, when I reproduce the CUB results, why is the output always 0? The result on AwA is also unclear. Can you provide the model you have trained?