lisa-lab / pylearn2

Warning: This project does not have any current developer. See below.
BSD 3-Clause "New" or "Revised" License

Cross-validation #49

Open goodfeli opened 12 years ago

goodfeli commented 12 years ago

Sprint assignees:

goodfeli commented 12 years ago

The first thing to do is to add support to monitor.py for monitoring both (a stochastic sample of) the training set and a validation set. This implies adding support for monitoring more than one dataset at the same time.
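A minimal sketch of what monitoring more than one dataset could look like (the `MultiDatasetMonitor` class and its `add_dataset`/`record` methods, as well as `loss_fn`, are illustrative assumptions, not pylearn2's actual monitor.py API):

```python
import numpy as np

# Hypothetical sketch: a monitor that tracks one metric over several datasets.
class MultiDatasetMonitor(object):
    def __init__(self):
        self.datasets = {}   # name -> (X, y)
        self.history = {}    # name -> list of recorded values

    def add_dataset(self, name, X, y, sample_size=None):
        # Optionally keep only a stochastic sample of a large training set.
        if sample_size is not None:
            idx = np.random.choice(len(X), size=sample_size, replace=False)
            X, y = X[idx], y[idx]
        self.datasets[name] = (X, y)
        self.history[name] = []

    def record(self, loss_fn):
        # Evaluate the same loss on every registered dataset.
        for name, (X, y) in self.datasets.items():
            self.history[name].append(loss_fn(X, y))
```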

goodfeli commented 12 years ago

The second thing would be an "outer loop" class that runs a TrainingAlgorithm multiple times with different values of the hyperparameters. Random search is probably easiest, followed by grid search (look at the itertools module to keep the looping simple).
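A minimal sketch of such an outer loop (`train_and_score` and `param_grid` are hypothetical placeholders standing in for a full training run; only `itertools.product` and `random.choice` are real library calls):

```python
import itertools
import random

# Hypothetical hyperparameter grid; each key maps to candidate values.
param_grid = {'learning_rate': [0.1, 0.01, 0.001],
              'batch_size': [32, 128]}

def grid_search(train_and_score, grid):
    keys = list(grid)
    best = None
    # itertools.product keeps the nested loop over all settings simple.
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_and_score(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

def random_search(train_and_score, grid, n_trials=10):
    best = None
    for _ in range(n_trials):
        # Sample one value per hyperparameter, independently per trial.
        params = {k: random.choice(v) for k, v in grid.items()}
        score = train_and_score(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best
```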

goodfeli commented 12 years ago

The third thing would be adding support for splitting a single dataset into training and validation folds. This may have to be fairly specific to individual dataset classes, but it could be made more general by a well-thought-out addition to the dataset API.
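For illustration, a self-contained sketch of k-fold index splitting over a design matrix with numpy (the `kfold_indices` helper is hypothetical, not part of the dataset API):

```python
import numpy as np

def kfold_indices(n_examples, n_folds, seed=0):
    """Yield (train_idx, valid_idx) index pairs for k-fold cross-validation."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_examples)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, valid_idx

# With a DenseDesignMatrix-style dataset, each fold would then index into
# the underlying design matrix, e.g. X[train_idx] and X[valid_idx].
```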

dwf commented 12 years ago

@caglar Can you update this ticket with the current status?

caglar commented 12 years ago

Well, my cross-validation class is almost finished: HoldoutCrossValidation seems to work, but KFoldCrossValidation has a few issues. I haven't updated the code on the remote with my latest local changes, but you can have a look at the current state of the cross-validation code here:

https://github.com/caglar/pylearn/blob/feature/crossval_support/pylearn2/crossval/crossval.py

I'm also trying to add a new reset_params() method to the model and its child classes, to avoid unnecessary object creation at each fold.
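A rough sketch of the idea behind such a reset_params() method (the `SimpleModel` class is illustrative only; the actual signature in Caglar's branch may differ):

```python
import numpy as np

class SimpleModel(object):
    """Illustrative model, not pylearn2's Model class."""
    def __init__(self, n_in, n_out, seed=0):
        self.n_in, self.n_out = n_in, n_out
        self.reset_params(seed)

    def reset_params(self, seed=0):
        # Re-initialize parameters in place so the same object can be
        # retrained on each fold instead of being rebuilt from scratch.
        rng = np.random.RandomState(seed)
        self.W = rng.uniform(-0.05, 0.05, (self.n_in, self.n_out))
        self.b = np.zeros(self.n_out)
```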

I've edited dense_design_matrix and added a merge_datasets method, but I've since decided to use the numpy.concatenate function instead. I still have to edit some other files that I don't remember right now, but I'll report on the status tomorrow.
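For reference, merging the design matrices of two folds with numpy.concatenate looks like this (the shapes are made up for the example):

```python
import numpy as np

# Two (n_examples, n_features) design matrices, as in DenseDesignMatrix.
X_a = np.random.rand(100, 10)
X_b = np.random.rand(50, 10)

# Stack the examples of both folds into a single dataset.
X_merged = np.concatenate([X_a, X_b], axis=0)  # shape (150, 10)
```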

I think Raul completed the hyperparameter search using my cross-validation class, but he is still working on the multi-monitor stuff; I'm not sure about the latest status of his monitor work. Raul talked with Ian, who told him that multi-monitor support is a separate job that is needed for early-stopping criteria.

I'll push my code here after refactoring it, commenting it, and fixing some points.

chandiar commented 12 years ago

Hi,

I finished the random and grid search implementation that makes use of Caglar's cross-validation class. You can take a look at my code here: http://tinyurl.com/89qs73p

I tested the search algorithms with the holdout cross-validation and had no problems with it. When the k-fold cross-validation is fixed, I will test it with the search algorithms.

As for the first part of this ticket (adding support to monitor.py for monitoring more than one dataset at the same time): Ian told me that this feature was not necessary for implementing cross-validation, but it would be good to have in case early stopping is implemented in the near future. So once the cross-validation ticket is finished, I will work on adding support for monitoring more than one dataset at the same time.

However, the first part of the ticket also asks for stopping criteria that support multi-channel monitors, and I have implemented this. I will push my code by Friday, after adding some comments and making sure everything works fine with the latest updates in pylearn2.
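A sketch of what a stopping criterion over a multi-channel monitor might look like (the `ChannelStoppingCriterion` class and the `monitor_history` dict layout are assumptions for illustration, not the actual pushed code):

```python
# Hypothetical early-stopping criterion that watches one named channel
# of a multi-channel monitor (e.g. a validation-set objective).
class ChannelStoppingCriterion(object):
    def __init__(self, channel_name, patience=5):
        self.channel_name = channel_name
        self.patience = patience
        self.best = float('inf')
        self.bad_epochs = 0

    def continue_learning(self, monitor_history):
        # monitor_history: dict mapping channel name -> list of values.
        value = monitor_history[self.channel_name][-1]
        if value < self.best:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs < self.patience
```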

lamblin commented 11 years ago

Apparently, @chandiar's changes never got merged into Pylearn2. Should we try to bring that back to life?