jwyang / JULE.torch

Torch code for our CVPR 2016 paper "Joint Unsupervised LEarning of Deep Representations and Image Clusters"
MIT License

How to build dataset #24

Open dcharua opened 6 years ago

dcharua commented 6 years ago

Hi, I find this paper very interesting. I'm part of a research program at the University of Barcelona and we want to try this network for egocentric vision. Our dataset is in a folder structure where each folder represents a category, in our case daily activities. I have no experience working with HDF5 files; could you please help me with the code to create the .h5 file? Also, do you think JULE will work for egocentric vision?

Thank you for your time and good work

virscience001 commented 6 years ago

If you are using Python, you can use the code snippet below to create the HDF5 file.


import glob
from multiprocessing import Pool

import cv2
import h5py
import numpy as np

hf = h5py.File('data4torch.h5', "w")
# Match images in every class sub-folder, e.g. data/A/img001.jpeg
train_paths = glob.glob('data/*/*.jpeg')

def process_image(impath):
    im = cv2.imread(impath)
    im = cv2.resize(im, (55, 55))
    # HxWxC -> CxHxW, since the file must hold NxCxHxW
    im = im.transpose(2, 0, 1)
    return im

# Map each class folder name to a numeric label
label_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}

def get_labels(impath):
    # The class name is the parent folder: data/A/img001.jpeg -> 'A'
    label = impath.split('/')[1]
    return label_dict[label]

p = Pool(4)  # set this to the number of cores you have
data = np.array(p.map(process_image, train_paths))
labels = np.array(p.map(get_labels, train_paths))

hf.create_dataset('data', data=data)
hf.create_dataset('labels', data=labels)
hf.close()

dcharua commented 6 years ago

Thank you so much for your answer, it worked perfectly. I now have another problem :( I resized the images to 32 x 32 and used the FRGC structure as suggested in another issue, but got the following error:

/home/lifelogging/torch/install/bin/luajit: bad argument #2 to '?' (out of range at /home/lifelogging/torch/pkg/torch/generic/Tensor.c:913)
stack traceback:
    [C]: at 0x7f1af1a2bb60
    [C]: in function '__index'
    train.lua:368: in function 'organize_samples'
    train.lua:424: in function 'opfunc'
    /home/lifelogging/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
    train.lua:438: in function 'updateCNN'
    train.lua:489: in main chunk
    [C]: in function 'dofile'
    ...ging/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

Any idea what could be causing it? Thank you very much.

dcharua commented 6 years ago

I got it, the problem was the datatype: it needs to be a 32-bit float. I added the cast to your code:

data = np.array(p.map(process_image, train_paths)).astype(np.float32)
labels = np.array(p.map(get_labels, train_paths)).astype(np.float32)
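For anyone else who hits this, a quick round-trip check confirms the cast sticks (a minimal sketch; the file name check.h5 and the tiny shapes are made up just for the demo):

```python
import h5py
import numpy as np

# Tiny stand-in arrays (2 images, 3 channels, 8x8) cast to 32-bit float
data = np.zeros((2, 3, 8, 8)).astype(np.float32)
labels = np.array([1, 2]).astype(np.float32)

with h5py.File('check.h5', 'w') as hf:
    hf.create_dataset('data', data=data)
    hf.create_dataset('labels', data=labels)

# Read back and confirm the dtype survived the round trip
with h5py.File('check.h5', 'r') as hf:
    print(hf['data'].dtype, hf['labels'].dtype)  # float32 float32
```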

jwyang commented 6 years ago

@dcharua great!

amir-sha commented 6 years ago

Do you have any solution for creating an HDF5 file from a set of images? As mentioned in the README: "Create a HDF5 file with the size of NxCxHxW, where N is the total number of images, C is the number of channels, H is the height of the image, and W the width of the image. Then move it to datasets/dataset_name/data4torch.h5".

I'm using C++ to create the HDF5 file, but I'm not able to understand the procedure. I downloaded datasets/CMU-PIE/data4torch and opened the file, but its layout doesn't match the description for creating the HDF5 file: it has only two tables, "data[28552]" and "labels[28552]". (And if it's an unsupervised method, why do we have labels?!) Would you please help me figure out how to create the HDF5 file from my dataset?

here is the code I've written so far ...

void CreateHDF5(fs::path dataSetPath)
{
    vector<fs::path> imagesPath;
    getfiles(dataSetPath, ".bmp", imagesPath);

    // Copy all files to a new folder
    fs::path dsPath = curr / "HDF5/dataset";
    fs::path hd5file = curr / "HDF5/Dataset.h5";

    const H5std_string FILE_NAME(hd5file.string().c_str());
    const H5std_string DATASET_NAME("dset");

    const int RANK = 4;   // NxCxHxW needs a rank-4 dataspace
    hsize_t dims[RANK];   // dataset dimensions

    fs::create_directories(dsPath);
    ProgressBar progressbar1(imagesPath.size(), "Copy Dataset          ");
    for (size_t i = 0; i < imagesPath.size(); i++)
    {
        boost::filesystem::copy_file(imagesPath[i], dsPath / imagesPath[i].filename(),
                                     fs::copy_option::overwrite_if_exists);
        progressbar1.Progressed(i);
    }
    progressbar1.Progressed(imagesPath.size());
    cerr << endl;

    // Create a new file using the default property lists.
    H5File dsfile(FILE_NAME, H5F_ACC_TRUNC);

    // Read one image once to get the channel count and spatial size
    cv::Mat sample = getimage(imagesPath[0].string().c_str());
    dims[0] = imagesPath.size();   // N
    dims[1] = sample.channels();   // C
    dims[2] = sample.rows;         // H
    dims[3] = sample.cols;         // W
    DataSpace dataspace(RANK, dims);

    // Create the dataset as 32-bit float (the dtype the training code expects);
    // the pixel data itself still needs to be written with dataset.write(...)
    DataSet dataset = dsfile.createDataSet(DATASET_NAME, PredType::NATIVE_FLOAT, dataspace);
}
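As a cross-check on the expected layout, the contents of any data4torch.h5 can be listed from Python with h5py; a minimal sketch (the demo.h5 file and its shapes below are stand-ins created just for the example, not the real CMU-PIE data):

```python
import h5py
import numpy as np

def inspect_h5(path):
    # Print every top-level dataset with its shape and dtype
    with h5py.File(path, 'r') as hf:
        for name, dset in hf.items():
            print(name, dset.shape, dset.dtype)

# Build a small stand-in file with the same two tables as data4torch.h5
with h5py.File('demo.h5', 'w') as hf:
    hf.create_dataset('data', data=np.zeros((4, 1, 32, 32), dtype=np.float32))
    hf.create_dataset('labels', data=np.ones(4, dtype=np.float32))

inspect_h5('demo.h5')
```

A rank-4 'data' table plus a rank-1 'labels' table is what the training code reads; the labels are not used for clustering itself, only for evaluating the clusters afterwards.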