nicholas-leonard / equanimity

Experimental research for distributed conditional computation
4 stars 0 forks

Dataset and Preprocessing #14

Closed nicholas-leonard closed 10 years ago

nicholas-leonard commented 11 years ago

Model datasets and preprocessing as in Pylearn2.

Allow for caching (saving to disk) of preprocessed datasets, so they can be loaded later without repeating the preprocessing.
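A minimal sketch of the caching idea, in Python for illustration only (the project itself targets Torch). The helper `load_or_preprocess`, the `double` stand-in preprocessor, and the pickle file name are all hypothetical, not part of equanimity or Pylearn2.

```python
import os
import pickle
import tempfile


def load_or_preprocess(raw, preprocess, cache_path):
    """Return preprocess(raw), reusing a cached copy on disk when present."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    processed = preprocess(raw)
    with open(cache_path, "wb") as f:
        pickle.dump(processed, f)
    return processed


calls = []


def double(rows):
    """Stand-in preprocessor (hypothetical); records each real invocation."""
    calls.append(1)
    return [[2 * x for x in row] for row in rows]


cache_file = os.path.join(tempfile.mkdtemp(), "doubled.pkl")
first = load_or_preprocess([[1, 2], [3, 4]], double, cache_file)
second = load_or_preprocess([[1, 2], [3, 4]], double, cache_file)  # read from disk
```

The second call never runs `double`; it only reads the file written by the first. A real version would probably also key the cache on the preprocessing parameters so that stale files are not reused.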

Don't eliminate the TableDataset idea, it's a good one. It indexes all elements of a data table to a table of batches of elements, where each batch of elements is a tensor.
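One possible reading of the TableDataset layout, sketched in Python with nested lists standing in for tensors. The function `to_batches` and the column names `input` and `target` are assumptions made for illustration; the actual TableDataset may organize its batches differently.

```python
def to_batches(table, batch_size):
    """Map each column of `table` to a list of batches of at most `batch_size` rows."""
    return {
        name: [values[i:i + batch_size] for i in range(0, len(values), batch_size)]
        for name, values in table.items()
    }


# Hypothetical data table: each key holds one element type for every example.
table = {
    "input": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "target": [0, 1, 0],
}
batches = to_batches(table, batch_size=2)
```

Here `batches["input"][0]` holds the first two input rows and `batches["target"][0]` the matching targets, so one batch index retrieves aligned elements from every column.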

Find a way to implement multinomial sampling, where each example has a probability of being sampled, and where this probability is roughly proportional to the cost (error) of the example. This would require linking the training criterion to the dataset.
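A rough sketch of cost-proportional sampling, again in Python for illustration. The class `CostProportionalSampler`, its `update` hook, and the `floor` parameter are hypothetical; the `update` call is where the training criterion would feed per-example costs back to the dataset. Sampling is with replacement.

```python
import random


class CostProportionalSampler:
    """Draw example indices with probability proportional to their last known cost."""

    def __init__(self, n_examples, initial_cost=1.0, floor=1e-6, seed=None):
        # Every example starts with the same cost, so sampling begins uniform.
        self.costs = [initial_cost] * n_examples
        self.floor = floor  # keeps zero-cost examples reachable
        self.rng = random.Random(seed)

    def update(self, indices, costs):
        """Record the costs the training criterion reported for these examples."""
        for i, c in zip(indices, costs):
            self.costs[i] = max(c, self.floor)

    def sample(self, batch_size):
        """Sample indices (with replacement) weighted by the stored costs."""
        return self.rng.choices(range(len(self.costs)), weights=self.costs, k=batch_size)


sampler = CostProportionalSampler(n_examples=3, seed=0)
sampler.update([0, 1, 2], [0.1, 0.1, 5.0])  # example 2 is currently the hardest
batch = sampler.sample(1000)
```

With these costs, example 2 should make up most of the batch. The floor is a design choice: without it, an example whose cost reaches zero would never be sampled again, even if the model later forgets it.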

Move this functionality from torch-datasets fork to equanimity.

nicholas-leonard commented 10 years ago

DataTensor, DataSet, DataSource, Preprocess, and Sampler. Still needs more testing and more preprocessors (currently only Standardize, Pipeline and Binarize are implemented).
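For reference, a minimal Python sketch of what the three preprocessors named above might do. The class names mirror the ones in this comment, but the `fit`/`apply` methods and the `threshold` parameter are assumptions and not the equanimity API. Statistics are fitted on training data and then reused on new data.

```python
import statistics


class Standardize:
    """Shift and scale values to zero mean and unit standard deviation."""

    def fit(self, values):
        self.mean = statistics.mean(values)
        self.std = statistics.pstdev(values) or 1.0  # avoid dividing by zero
        return self

    def apply(self, values):
        return [(v - self.mean) / self.std for v in values]


class Binarize:
    """Map values above `threshold` to 1 and all others to 0."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, values):
        return self  # nothing to learn

    def apply(self, values):
        return [1 if v > self.threshold else 0 for v in values]


class Pipeline:
    """Chain preprocessors, fitting each one on the output of the previous step."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).apply(values)
        return self

    def apply(self, values):
        for step in self.steps:
            values = step.apply(values)
        return values


train = [1.0, 2.0, 3.0, 4.0]
pipeline = Pipeline([Standardize(), Binarize(threshold=0.0)]).fit(train)
train_out = pipeline.apply(train)
test_out = pipeline.apply([0.0, 10.0])  # reuses the mean and std fitted on train
```

Keeping `fit` separate from `apply` means the test set is transformed with the training statistics rather than its own.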