deltheil opened this issue 8 years ago
@deltheil It could probably be solved by wrapping `accUpdateGradParameters` in an `if` that decides what to do based on some kind of flag (`module.shared` or something like this).
`clearState` is meant for tensors / buffers, so this is obviously not the right thing to do (plus we do not want to break the sharing after calling it). Instead, skipping it at `Module:write(file)` time would be better: this is what you had in mind, right?
No, not really. `clearState` is definitely not a good place to fix this.
It changes the behaviour of a function by overwriting it in `:share()`, but the same thing could be achieved by adding a boolean flag (say `module.sharesParameters`). It could be set by `:share()`, and at every `accUpdateGradParameters` call you would check it and do what's necessary. It's essentially a merge of `accUpdateGradParameters` and `sharedAccUpdateGradParameters` into one function, with an `if` that checks the flag as the first thing.
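In code, the proposed merge could look roughly like this (a sketch, not an actual implementation: the `sharesParameters` name and the method bodies are assumptions based on nn's current `Module.lua`, where `:share()` installs `sharedAccUpdateGradParameters` on the instance):

```lua
-- Hypothetical sketch: :share() sets a plain boolean flag instead of
-- overwriting the method, and a single accUpdateGradParameters
-- dispatches on that flag.
function Module:share(mlp, ...)
   -- ... existing tensor-sharing logic ...
   self.sharesParameters = true   -- a boolean serializes cleanly
   mlp.sharesParameters = true
   return self
end

function Module:accUpdateGradParameters(input, gradOutput, lr)
   if self.sharesParameters then
      -- former sharedAccUpdateGradParameters: accumulate, then update
      self:zeroGradParameters()
      self:accGradParameters(input, gradOutput, 1)
      self:updateParameters(lr)
   else
      -- former default: temporarily alias the gradient buffers to the
      -- parameters so accGradParameters(-lr) applies the step in place,
      -- then restore the original gradient buffers
      local gradWeight, gradBias = self.gradWeight, self.gradBias
      self.gradWeight, self.gradBias = self.weight, self.bias
      self:accGradParameters(input, gradOutput, -lr)
      self.gradWeight, self.gradBias = gradWeight, gradBias
   end
end
```

With this, nothing but tensors and booleans ends up on the instance table, so default serialization just works.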
I see. But such a flag would be captured at serialization time. When you load back the model you don't want it to be set, right?
But as far as I remember parameter sharing is preserved when you serialize your model, so you actually want to have it set. I have quickly checked it and it seems to work for me.
> parameter sharing is preserved when you serialize your model
True: thanks to the `referenced` mode (enabled by default) and provided you archive both the reference and cloned models. But in some situations (e.g. siamese nets) it could happen that you only save the main model: this is what I had in mind.
I see your point, but I'm not sure if there's any way to predict what the user is willing to do in such situations, and it's not handled by current implementation anyway. I thought that the main problem in this issue was this function override that broke serialization, so I must have misunderstood you 😕
> there's any way to predict what the user is willing to do in such situations, and it's not handled by current implementation anyway
Agreed.
The main problem I pointed out is definitely the presence of functions in model serialization. The above topic just came out of this discussion.
Your proposal is definitely a way to tackle this main problem. The only alternative I can think of is filtering the function out / restoring it at `Module:write` / `Module:read` time.
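For completeness, a rough sketch of what that write/read filtering could look like (entirely hypothetical: the `__wasShared` field and the state-table idiom are assumptions, not nn code; `torch.File` calls `:write(file)` / `:read(file)` on an object when those methods are defined):

```lua
function Module:write(file)
   -- the override installed by :share() lives on the instance table,
   -- shadowing the class method
   local override = rawget(self, 'accUpdateGradParameters')
   if override then
      rawset(self, 'accUpdateGradParameters', nil) -- keep it out of the archive
      self.__wasShared = true                      -- remember it for :read()
   end
   local state = {}
   for k, v in pairs(self) do state[k] = v end
   file:writeObject(state)
   if override then
      rawset(self, 'accUpdateGradParameters', override) -- restore in memory
      self.__wasShared = nil
   end
end

function Module:read(file)
   local state = file:readObject()
   for k, v in pairs(state) do self[k] = v end
   if self.__wasShared then
      -- re-install the override that was filtered out at write time
      self.accUpdateGradParameters = self.sharedAccUpdateGradParameters
      self.__wasShared = nil
   end
end
```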
Yes, that would be another option. However, I'm not sure if this hot patching is a good idea in this case, as it can be easily solved without it. It seems to me that, as long as it's possible, it's better to save objects just as they appear in the program, with no modifications.
I would go with Adam's solution; it is annoying that function serialization is a mess.
True... Yes, the other alternative (patching write/read) is too much of a hack.
@apaszke do you have something ready on this?
@deltheil not yet, but I can make it today
Hm, I'm having some trouble with this thing. Can anyone please remind me why `sharedAccUpdateGradParameters` was introduced?
Good question indeed. Except that `accUpdateGradParameters` restores the initial gradients after the descent step, it sounds like `accUpdateGradParameters` / `sharedAccUpdateGradParameters` achieve exactly the same thing. So what's the rationale? Also: why is it important for some layers like `Linear` to force them to use the default method?
Yes, this is exactly why I asked. I've been looking for an edge case that would require updating parameters via grad*, but I couldn't think of one. Linear and these other (3 as far as I remember) modules make it even weirder :confused:
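For reference, the two methods being compared look roughly like this in nn's `Module.lua` (quoted from memory; check the source for the exact code):

```lua
-- Default: alias gradWeight/gradBias to weight/bias so that
-- accGradParameters(-lr) writes the scaled step straight into the
-- parameters, then put the real gradient buffers back untouched.
function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight, gradBias = self.gradWeight, self.gradBias
   self.gradWeight, self.gradBias = self.weight, self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight, self.gradBias = gradWeight, gradBias
end

-- Variant installed by :share(): compute gradients into the (shared)
-- buffers normally, then take an update step.
function Module:sharedAccUpdateGradParameters(input, gradOutput, lr)
   if self:parameters() then
      self:zeroGradParameters()
      self:accGradParameters(input, gradOutput, 1)
      self:updateParameters(lr)
   end
end
```

The end result on the parameters is the same step of `-lr * grad`; the difference is only whether the gradient buffers get clobbered along the way.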
Sharing parameters adds a function on the table, for both instances. If one does not care and saves the model, such a function will go into the output serialization (which is not ideal when switching to another LuaJIT version, etc.), e.g.:
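A minimal reconstruction of the missing example (the original snippet did not survive; the layer sizes and file name are illustrative):

```lua
require 'nn'

local a = nn.Linear(10, 5)
-- cloning with share arguments calls :share(), which overwrites the
-- method on BOTH instances with sharedAccUpdateGradParameters
local b = a:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- the override lives on the instance table, not on the class:
print(rawget(a, 'accUpdateGradParameters')) -- not nil: a per-instance function

torch.save('model.t7', a) -- the function ends up in the archive
```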
Of course it's possible to clean it up manually before saving. But wouldn't it be better if it was handled by `clearState`? Any other ideas?