Closed — byzhang closed this issue 10 years ago
As long as you can express your data as a vector, you should be able to use this library to train a model using your data. However, I will point out that there is no support for sparse matrices in this code at the moment, so you'll need to encode your data using dense vectors, even if only a few of the elements of each vector are nonzero.
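For concreteness, here is a minimal numpy sketch of that dense one-hot encoding. The data values (`x` and `dim`) are made up for illustration:

```python
import numpy as np

# Hypothetical data: each element of x is the integer index of the
# single "on" bit for that data point, out of dim possible bits.
x = np.array([2, 0, 3, 1])
dim = 4

# Dense encoding: one row per data point, one column per possible bit.
dense = np.zeros((len(x), dim), dtype='float32')
dense[np.arange(len(x)), x] = 1.0
# dense is now a (4, 4) one-hot matrix with exactly one 1 per row.
```

Every entry that is not the "on" bit is stored explicitly as a zero, which is the memory cost you pay for the lack of sparse-matrix support.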
Typically when I have a training problem that uses sparse data, I encode each minibatch on-the-fly using the function-passing interface provided by the Dataset
class. For example:
import numpy as np
import theano
import theanets
# assume x is a vector with one entry per training data point.
# each element of x gives the integer index of the single "on"
# bit for that data item, so x represents a one-hot code of our
# dataset, where there are "dim" possible bits per item.
x, dim = load_sparse_data()
e = theanets.Experiment(theanets.Classifier)
def batch():
    bs = e.args.batch_size
    mini = np.zeros((bs, dim), theano.config.floatX)
    # choose a random minibatch of indices from x
    idx = np.arange(len(x))
    np.random.shuffle(idx)
    idx = idx[:bs]
    mini[np.arange(bs), x[idx]] = 1.
    return mini
e.run(batch, batch)
This is just a sketch, but I hope that helps!
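As a side note, the shuffle-then-slice step in that sketch can be written more compactly with `np.random.choice`. Here is a self-contained version with made-up data (`x`, `dim`, and `bs` are stand-ins for the values in the example above):

```python
import numpy as np

x = np.arange(100) % 7   # hypothetical "on"-bit indices, one per data point
dim = 7                  # number of possible bits per data point
bs = 16                  # minibatch size

# Sampling without replacement is equivalent to shuffling all
# indices and keeping the first bs of them.
idx = np.random.choice(len(x), size=bs, replace=False)

# Densify just this minibatch: one one-hot row per sampled point.
mini = np.zeros((bs, dim), dtype='float32')
mini[np.arange(bs), x[idx]] = 1.0
```

Either way, only one minibatch is ever held in dense form at a time, so the full dataset can stay in its compact integer representation.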
Thanks Leif! That looks good; I'll give it a try.
Or, if sparse matrices aren't supported yet, could you point me to where in the code I would extend it to add that support?