lisa-lab / pylearn2

Warning: This project does not have any current developer. See below.
BSD 3-Clause "New" or "Revised" License

Adding training cost as default monitor channel #1490

Open alexjc opened 9 years ago

alexjc commented 9 years ago

We've been trying to find a way to get the training cost at a high level in the code, but it seems it's being discarded and is not available. One way we considered would be to add it to the monitor by default, even when there's no validation dataset.

Would this be possible to add? It'd make integrating pylearn2 within Python code and getting useful metrics back much easier.

Thanks!

cc: @ssamot

dwf commented 9 years ago

For a classification task, either train_objective or train_nll is usually what you want.
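
If it helps, here's a minimal sketch of reading those channels back from a trained model. It assumes the training set was registered as a monitoring dataset under the name 'train' (so the channels are prefixed train_) and that the model was pickled to model.pkl; both names are illustrative, not fixed by the library.

```python
# Sketch: inspect monitoring channels on a trained pylearn2 model.
# 'model.pkl' and the 'train_objective' channel name are illustrative;
# channel names depend on how the monitoring datasets were registered.
from pylearn2.utils import serial

model = serial.load('model.pkl')                 # model saved by Train
channel = model.monitor.channels['train_objective']
# val_record holds one value per monitoring epoch
for epoch, value in zip(channel.epoch_record, channel.val_record):
    print('epoch %d: %f' % (epoch, value))
```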

alexjc commented 9 years ago

Thanks, going to look into it now!

I noticed there are no channels (not even training-cost related) if there's no validation set specified for monitoring. Is that correct?

dwf commented 9 years ago

Monitoring channels are computed with respect to monitoring datasets. If there are no datasets specified for monitoring (they don't necessarily need to be validation sets, though that's usually the useful thing to monitor), then typically very little will be monitored.
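
For example, you can register the training set itself as a monitoring dataset, which is what produces channels like train_objective even without a validation set. A rough sketch, assuming a train_set and model constructed elsewhere (elided here) and illustrative hyperparameters:

```python
# Sketch: register the training set as a monitoring dataset so per-epoch
# channels (e.g. 'train_objective') get computed. train_set and model are
# assumed to be built elsewhere; numbers are illustrative.
from pylearn2.train import Train
from pylearn2.training_algorithms.sgd import SGD

algorithm = SGD(
    learning_rate=0.01,
    batch_size=128,
    monitoring_dataset={'train': train_set},  # dict key becomes the channel prefix
)
trainer = Train(dataset=train_set, model=model, algorithm=algorithm)
trainer.main_loop()
```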

ssamot commented 9 years ago

Does that mean you need at least two passes to get a cost function value? Shouldn't you be able to get this from just one pass of the training set?

dwf commented 9 years ago

With SGD you could get something like a cost function value. But you're updating the parameters on each minibatch, so what you could actually get is an evaluation of the cost before each step. By the time you've finished a pass through the dataset you've got a potentially very different network from the one you used to compute the cost estimate on the first minibatch, so averaging those per-minibatch costs together gets you something that's potentially quite pessimistic and doesn't really correspond to the training cost of any particular version of the network.

Suffice it to say, that isn't implemented at the moment. I'm not saying it's not potentially a little bit useful, but you might get a more reliable estimate of the objective function with a monitoring dataset that consists of a larger-than-minibatch-but-still-smaller-than-whole-dataset sample from your training set: e.g. the equivalent of 10-20 minibatches, but all evaluated with the same model parameters.
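
As a rough sketch of that suggestion, assuming the training set is a DenseDesignMatrix and using an illustrative size of about 15 minibatches:

```python
# Sketch: carve out a fixed, small sample of the training data as its own
# monitoring dataset, so the objective is evaluated under a single set of
# parameters. Assumes train_set is a DenseDesignMatrix; sizes are illustrative.
import numpy as np
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix

batch_size = 128
n_monitor = 15 * batch_size                      # ~15 minibatches
idx = np.random.choice(train_set.X.shape[0], n_monitor, replace=False)
monitor_subset = DenseDesignMatrix(X=train_set.X[idx], y=train_set.y[idx])

# then pass it to the training algorithm, e.g.
# monitoring_dataset={'train_subset': monitor_subset}
```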

ssamot commented 9 years ago

> I'm not saying it's not potentially a little bit useful

I understand the argument about getting a pessimistic version of the cost value (since you average over mini-batches during learning), but you can use this across training epochs to get a somewhat "free" indication of whether you're going anywhere. Yes, it's not perfect, but computation-wise it should be close to nil, right?
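
For concreteness, a framework-agnostic toy sketch of that "free" signal (toy linear regression, not pylearn2 code): each minibatch cost is recorded just before its update and averaged over the epoch, which is exactly the cheap-but-pessimistic estimate discussed above, since every term is measured under different parameters.

```python
# Sketch: average per-minibatch costs (computed before each SGD step) over an
# epoch. Purely illustrative; the model and data are toys.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1024, 10)
y = X.dot(rng.randn(10)) + 0.1 * rng.randn(1024)
w = np.zeros(10)
lr, batch = 0.01, 64

for epoch in range(5):
    costs = []
    for i in range(0, len(X), batch):
        xb, yb = X[i:i + batch], y[i:i + batch]
        err = xb.dot(w) - yb
        costs.append(0.5 * np.mean(err ** 2))    # cost before this update
        w -= lr * xb.T.dot(err) / len(xb)        # SGD step
    print('epoch %d: mean per-minibatch cost %.4f' % (epoch, np.mean(costs)))
```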