google / tirg

deep learning, image retrieval, vision and language
Apache License 2.0

CSS3D dataset only has 6004 train and 6019 test samples #9

Open patrickphat opened 4 years ago

patrickphat commented 4 years ago

Hello, I downloaded the CSS3D dataset linked from the README and loaded css_toy_dataset_novel2_small.dup.npy as follows:

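import numpy as np

# The file stores a pickled Python dict, so allow_pickle=True is required;
# encoding="latin1" lets Python 3 read a pickle written under Python 2.
data = np.load("../data/css_toy_dataset_novel2_small.dup.npy", allow_pickle=True, encoding="latin1")
data = data.item()  # unwrap the 0-d object array into the underlying dict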

I found that data["train"]["mods"] contains only 6004 entries, and these seem to include only the 2d->3d mods, not the 3d->3d mods.

lugiavn commented 4 years ago

If you need to parse the data yourself, please take a look at https://github.com/google/tirg/blob/master/datasets.py#L65 as a reference:

(1) There is no one-to-one correspondence between these "mods" and the number of training samples (run our code to see the size: https://github.com/google/tirg/blob/master/main.py#L125).

(2) There are no separate "2d->3d mods" and "3d->3d mods"; there is a single set of mods, and the two cases are one and the same (change the param here to switch: https://github.com/google/tirg/blob/master/datasets.py#L150).
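
To illustrate point (1), here is a minimal sketch of why the number of entries in a "mods" list can differ from the number of training samples. It assumes each entry is a dict holding parallel lists of source and target image indices under the keys 'from' and 'to'. Those key names and the toy data are assumptions made for illustration, so check datasets.py for the actual layout.

```python
def count_query_target_pairs(mods):
    """Sum the number of (source, target) index pairs over all mod entries.

    Assumes each entry is a dict with parallel lists under 'from' and 'to'
    (hypothetical key names; verify against datasets.py).
    """
    total = 0
    for mod in mods:
        sources, targets = mod["from"], mod["to"]
        if len(sources) != len(targets):
            raise ValueError("'from' and 'to' must have the same length")
        total += len(sources)
    return total


# Toy stand-in for data["train"]["mods"]; this is not real CSS3D content.
toy_mods = [
    {"from": [0, 1, 2], "to": [3, 4, 5]},
    {"from": [6], "to": [7]},
]

num_mods = len(toy_mods)                         # number of mod entries
num_pairs = count_query_target_pairs(toy_mods)   # number of (source, target) pairs
print(num_mods, num_pairs)
```

If the real file follows this layout, passing data["train"]["mods"] to the same function would show how many pairs sit behind the 6004 entries. The size reported by the repository's own code remains the authoritative number.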