mila-iqia / blocks

A Theano framework for building and training neural networks

Refactor `RemoveNotFinite` to Remember If There Were NaNs #162

Open rizar opened 9 years ago

rizar commented 9 years ago

There should be a possibility to avoid updating the parameter with a NaN gradient.

rizar commented 9 years ago

See discussion in #343.

bartvm commented 9 years ago

http://deeplearning.net/software/pylearn2/library/devtools.html#module-pylearn2.devtools.nan_guard

bartvm commented 9 years ago

And if you want to avoid updating instead of stopping, the new plugin by @memimo does that, no?

rizar commented 9 years ago

I want to avoid updating but raise an exception or warning. The new step rule just hides the NaNs.

dwf commented 9 years ago

Relevant: http://deeplearning.net/software/pylearn2/library/devtools.html#pylearn2.devtools.nan_guard.NanGuardMode

bartvm commented 9 years ago

The new step rule avoids updating if you set the scaling to 1, so that + NanGuardMode might be enough?

dmitriy-serdyuk commented 9 years ago

I don't like that by default Blocks overwrites the model file with parameters corrupted by NaNs, so that I have to retrain the model from scratch. So throwing an exception seems reasonable in some cases.

dmitriy-serdyuk commented 9 years ago

Is it better to throw an exception in the Dump class when trying to serialize NaN?

dwf commented 9 years ago

You will probably want to be very careful about that to make sure you avoid partially serializing something before you throw that exception.

dmitriy-serdyuk commented 9 years ago

@dwf Yes, sure. It can be solved with a temp file.
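The temp-file idea mentioned above could look something like this sketch (the function name and pickle-based serialization are illustrative assumptions, not Blocks' actual Dump implementation): write to a temporary file in the target directory, then atomically rename it over the checkpoint, so a failure mid-write never leaves a partially serialized file behind.

```python
import os
import pickle
import tempfile

def safe_dump(obj, path):
    """Serialize obj to path without ever exposing a partial file.

    Hypothetical sketch of the temp-file approach discussed above:
    the temporary file lives in the same directory as the target so
    that os.replace() is an atomic rename on POSIX filesystems.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'wb') as f:
            pickle.dump(obj, f)
        os.replace(tmp, path)  # atomic swap; old checkpoint survives any failure
    except BaseException:
        os.remove(tmp)  # clean up the partial temp file
        raise
```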

rizar commented 9 years ago

While it is true that a temp file would do the job, it feels much more natural to me to handle these things in the algorithm. That way, when you get NaNs, an exception will be raised and the main loop, in its state from just before the NaN appeared, will be saved in a separate file.

rizar commented 9 years ago

Is the NanGuardMode as fast as fast_run?

bartvm commented 9 years ago

Theoretically, no solution is ever going to be as fast as not checking, right? Even if you could optimize the hell out of it, at the very least you still need to make an element-wise comparison for each parameter which you otherwise wouldn't have to.

You won't want to handle this for each possible algorithm, but you can probably add it to the MainLoop. If the error occurs, catch it, print an informative warning, and go straight to after_training extensions (instead of on_error ones).

rizar commented 9 years ago

I did not mean "exactly as fast" as fast_run; I meant that I would not like to break optimizations or apply fewer optimizations compared to fast_run.

bartvm commented 9 years ago

I think we'll have to ask @nouiz about the details, but as far as I can tell from the code, Pylearn2's NaNGuard compiles with the same mode, so optimizations should still be applied. My best guess is that the NaN checks occur in between the ops of the final, optimized graph. That's just an informed guess though :)

rizar commented 9 years ago

Anyway, I think I know a super easy solution! We should simply add a shared variable to RemoveNotFinite which would be 1 if there were infinite elements in the step and 0 if there weren't. The user can do whatever he wants with it. This is very Block-ish, I think.
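The proposal above could be sketched as follows in NumPy (the function name, signature, and scaler default are illustrative assumptions, not Blocks' actual Theano implementation; the returned flag plays the role of the proposed shared variable):

```python
import numpy as np

def remove_not_finite(step, param, scaler=0.1):
    """NumPy stand-in for the proposed RemoveNotFinite refactoring.

    Returns the processed step together with a flag that is 1.0 when
    the raw step contained NaN or Inf, and 0.0 otherwise; the flag
    mimics the shared variable the user could inspect after each batch.
    """
    bad = not np.all(np.isfinite(step))
    if bad:
        # Substitute a scaled copy of the parameter for the broken step,
        # so the update param - step yields (1 - scaler) * param
        # instead of propagating NaNs into the parameter.
        step = scaler * param
    return step, (1.0 if bad else 0.0)

step, flag = remove_not_finite(np.array([0.1, np.nan]),
                               np.array([1.0, 2.0]))
```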

bartvm commented 9 years ago

Sounds good to me, but just to clarify: This does mean that the user won't be able to retrieve the state of the parameters from right before NaN occurred, right? First the rescaling is applied to those shared variables whose updates contained NaN, and only after that the user gets a chance in the after_batch extensions to check whether a scaled update occurred because of a NaN.

My guess is that this is fine. In practice, though, NaNs might occur in the same cases where other shared variables are updated with finite but extremely large steps, meaning that you would still end up with a model in a bad state. At least it won't contain NaNs, though :)

rizar commented 9 years ago

By choosing the right scaling constant, the parameters that would contain NaNs will not be corrupted. You make a good point that other parameters might still be damaged. I think we should also add an option to this step rule to operate globally on all parameters, that is, if any parameter has NaN, all the parameters are scaled instead of being updated. It seems to me that this is the solution, right?

bartvm commented 9 years ago

Yeah, that sounds easy enough, just do something like tensor.any([tensor.any(tensor.or_(tensor.isnan(step), tensor.isinf(step))) for step in steps]) -> scale all parameters (with 1.0 potentially)
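The global variant discussed here could be sketched in NumPy as follows (function and variable names are hypothetical; the real step rule would express this check symbolically in Theano as in the snippet above):

```python
import numpy as np

def global_remove_not_finite(steps, params, scaler=0.1):
    """Hypothetical "global" variant of RemoveNotFinite.

    If ANY step contains NaN or Inf, every step is replaced with
    scaler * param, so all parameters are scaled together instead
    of being updated with their computed steps.
    """
    if any(not np.all(np.isfinite(s)) for s in steps):
        return [scaler * p for p in params]
    return list(steps)

# One NaN step triggers scaling of every parameter's step.
steps = [np.array([0.5]), np.array([np.nan])]
params = [np.array([2.0]), np.array([4.0])]
new_steps = global_remove_not_finite(steps, params)
```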

rizar commented 9 years ago

Okay, then this ticket becomes a CCW level refactoring of RemoveNotFinite.

nouiz commented 9 years ago

If you need my input, come see me.


rizar commented 9 years ago

By the way, the documentation of RemoveNotFinite seems misleading to me. It claims that "parameters are replaced with a scaled version", which is in the end true, but the scaling coefficient is 1 - self.scaler, not self.scaler!
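A small worked example of this point (assuming, as discussed above, that the rule substitutes step = scaler * param when the computed step is not finite):

```python
import numpy as np

# If the substituted step is scaler * param, then the SGD-style update
# param - step leaves (1 - scaler) * param behind. So with
# scaler = 0.1 the parameter is scaled by 0.9, not by 0.1.
param = np.array([2.0, -4.0])
scaler = 0.1
step = scaler * param       # substituted step on a NaN/Inf update
new_param = param - step    # equals (1 - scaler) * param
```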