torch / nn


Extra `accUpdateGradParameters` after parameters sharing #771

Open · deltheil opened this issue 8 years ago

deltheil commented 8 years ago

Sharing parameters adds a function to the table of both instances. If you don't notice and save the model, that function ends up in the serialized output (which is not ideal when switching to another LuaJIT version, etc.), e.g.:

require 'nn'
x = nn.Linear(1, 1)
print(type(rawget(x, "accUpdateGradParameters"))) -- nil: the method only lives on the class
y = nn.Linear(1, 1)
y:share(x, 'weight')
-- :share() installs a per-instance accUpdateGradParameters on both modules
print(type(rawget(x, "accUpdateGradParameters"))) -- not nil!

Of course it's possible to clean it up manually before saving. But wouldn't it be better if this were handled by clearState? Any other ideas?

apaszke commented 8 years ago

@deltheil It could probably be solved by wrapping accUpdateGradParameters in an if that decides what to do based on some kind of flag (module.shared or something like that).

deltheil commented 8 years ago

clearState is meant for tensors / buffers, so it is clearly not the right place for this (plus we do not want to break the sharing after calling it). Skipping the function at Module:write(file) time would be better instead: is that what you had in mind?

apaszke commented 8 years ago

No, not really. clearState is definitely not a good place to fix this.

The current code changes the behaviour of a function by overwriting it in :share(), but the same thing could be achieved by adding a boolean flag (say module.sharesParameters). It would be set by :share(), and every accUpdateGradParameters call would check it and do what's necessary. It's essentially a merge of accUpdateGradParameters and sharedAccUpdateGradParameters into one function, with an if that checks the flag as the first thing.
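A minimal sketch of that idea, written as a monkey patch and using the sharesParameters name suggested above (this is illustrative, not the actual nn implementation):

require 'nn'
local Module = nn.Module

-- :share() would only set a boolean instead of overwriting the method
function Module:share(mlp, ...)
   for _, v in ipairs({...}) do
      if self[v] ~= nil then
         self[v]:set(mlp[v])
         self.sharesParameters = true
         mlp.sharesParameters = true
      end
   end
   return self
end

-- single entry point: check the flag first, then dispatch to the old
-- default / shared code paths
local defaultAccUpdate = Module.accUpdateGradParameters
function Module:accUpdateGradParameters(input, gradOutput, lr)
   if self.sharesParameters then
      return self:sharedAccUpdateGradParameters(input, gradOutput, lr)
   end
   return defaultAccUpdate(self, input, gradOutput, lr)
end

With this, nothing but a boolean ends up in the instance table, so serialization stays clean.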

deltheil commented 8 years ago

I see. But such a flag would be captured at serialization time. When you load the model back, you don't want it to be set, right?

apaszke commented 8 years ago

But as far as I remember, parameter sharing is preserved when you serialize your model, so you actually want to have it set. I quickly checked and it seems to work for me.

deltheil commented 8 years ago

> parameter sharing is preserved when you serialize your model

True: thanks to the referenced serialization mode (enabled by default), and provided you archive both the reference and the cloned models. But in some situations (e.g. siamese nets) you might only save the main model: that is what I had in mind.
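Using x and y from the snippet at the top (file names made up), the difference boils down to what goes into a single torch.save call, since sharing is only preserved between objects written into the same archive:

-- sharing between x and y survives: both live in the same archive
torch.save('pair.t7', {reference = x, clone = y})

-- only one model is archived: on load there is nothing left to share with
torch.save('main_only.t7', y)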

apaszke commented 8 years ago

I see your point, but I'm not sure there's any way to predict what the user wants to do in such situations, and it's not handled by the current implementation anyway. I thought the main problem in this issue was the function override that broke serialization, so I must have misunderstood you 😕

deltheil commented 8 years ago

> there's any way to predict what the user is willing to do in such situations, and it's not handled by current implementation anyway

Agreed.

The main problem I pointed out is definitely the presence of functions in the model serialization. The above topic just came up in this discussion.

Your proposal is definitely a way to tackle this main problem. The only alternative I can think of is filtering the function out / restoring it at Module:write / Module:read time.
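Roughly, that alternative would follow the usual torch write/read serialization hooks; a sketch, with illustrative field names (__sharedAccUpdate is made up):

-- strip the per-instance override before writing the table ...
function Module:write(file)
   local hadOverride = rawget(self, 'accUpdateGradParameters') ~= nil
   rawset(self, 'accUpdateGradParameters', nil)
   self.__sharedAccUpdate = hadOverride       -- remember that it was there
   local state = {}
   for k, v in pairs(self) do state[k] = v end
   file:writeObject(state)
   if hadOverride then                        -- leave the live object untouched
      rawset(self, 'accUpdateGradParameters', self.sharedAccUpdateGradParameters)
   end
end

-- ... and restore it when reading back
function Module:read(file)
   local state = file:readObject()
   for k, v in pairs(state) do self[k] = v end
   if self.__sharedAccUpdate then
      self.accUpdateGradParameters = self.sharedAccUpdateGradParameters
   end
end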

apaszke commented 8 years ago

Yes, that would be another option. However, I'm not sure this kind of hot patching is a good idea in this case, as the problem can easily be solved without it. It seems to me that, as long as it's possible, it's better to save objects just as they appear in the program, with no modifications.

soumith commented 8 years ago

I would go with Adam's solution; it is annoying that function serialization is such a mess.

deltheil commented 8 years ago

True... Yes, the other alternative (patching write/read) is too much of a hack.

@apaszke do you have something ready on this?

apaszke commented 8 years ago

@deltheil not yet, but I can make it today

apaszke commented 8 years ago

Hm, I'm having some trouble with this. Can anyone please remind me why sharedAccUpdateGradParameters was introduced?

deltheil commented 8 years ago

Good question indeed. Apart from the fact that accUpdateGradParameters restores the original gradient references after the descent step, accUpdateGradParameters and sharedAccUpdateGradParameters seem to achieve exactly the same thing. So what's the rationale? Also: why is it important to force some layers, like Linear, to use the default method?
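For reference, the two default implementations in nn's Module.lua look roughly like this (paraphrased from memory, so the exact source may differ slightly):

-- default: alias the grad fields to the parameters so accGradParameters
-- with a negative learning rate writes the update straight into the weights,
-- then restore the original gradWeight / gradBias references
function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight, gradBias = self.gradWeight, self.gradBias
   self.gradWeight, self.gradBias = self.weight, self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight, self.gradBias = gradWeight, gradBias
end

-- shared variant: what :share() installs as the per-instance
-- accUpdateGradParameters; accumulate real gradients, then take a step
function Module:sharedAccUpdateGradParameters(input, gradOutput, lr)
   if self:parameters() then
      self:zeroGradParameters()
      self:accGradParameters(input, gradOutput, 1)
      self:updateParameters(lr)
   end
end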

apaszke commented 8 years ago

Yes, this is exactly why I asked. I've been looking for an edge case that would require updating the parameters via grad*, but I couldn't think of one. Linear and those few other modules (3 of them, as far as I remember) make it even weirder 😕