Closed navta closed 7 years ago
This should be achievable with a Graph model: http://keras.io/models/#graph
I tried the Graph model with the following code, but it wasn't successful:
shareddense = Dense(748,10)
graph = Graph()
graph.add_input(name='input1', ndim=2)
graph.add_input(name='input2', ndim=2)
graph.add_node(shareddense, name='dense1', input='input1')
graph.add_node(shareddense, name='dense2', input='input2')
graph.add_output(name='output1', input='dense1')
graph.add_output(name='output2', input='dense2')
graph.compile('rmsprop', {'output1':'mse','output2':'mse'})
The error said:
ValueError: ('this shared variable already has an update expression', (dense2_W, GpuFromHost.0))
How can I use shared node on Graph?
I'll look into it.
Having thought about it, I think doing something like that is conceptually problematic. If two nodes are processing different data streams, then they are not the same node. So what you want is weight mirroring, not node duplication. Right?
Yes, that is correct. Is there a way to do the weight mirroring?
There is no built-in way to do it (at this point). One hack to achieve it would be to do batch by batch training and manually set the weights of the second "shared" layer after each batch.
graph = Graph()
graph.add_input(name='input1', ndim=2)
graph.add_input(name='input2', ndim=2)
graph.add_node(Dense(748, 10), name='dense1', input='input1')
graph.add_node(Dense(748, 10), name='dense2', input='input2')
graph.add_output(name='output', inputs=['dense1', 'dense2'])
graph.compile('rmsprop', {'output':'mse'})
for X1_batch, X2_batch, y_batch in generator():
    loss = graph.train_on_batch({'input1': X1_batch, 'input2': X2_batch, 'output': y_batch})
    graph.nodes['dense2'].set_weights(graph.nodes['dense1'].get_weights())
(untested)
Note that maybe a more elegant way to do it would be to replace the weights in both layers with the average of both weight matrices (again, after each batch).
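As a minimal numpy sketch of that averaging idea (plain arrays stand in for the layers' weight matrices; the `set_weights` calls in the comment are where they would go in the Graph example above):

```python
import numpy as np

def average_weights(w1, w2):
    """Element-wise mean of two weight matrices. Writing this mean back
    into both layers after each batch keeps them approximately tied."""
    return (w1 + w2) / 2.0

# Two "shared" layers whose weights have drifted apart after a batch
w_dense1 = np.array([[1.0, 2.0], [3.0, 4.0]])
w_dense2 = np.array([[3.0, 2.0], [1.0, 0.0]])

w_avg = average_weights(w_dense1, w_dense2)
# Both layers would now be reset to w_avg, e.g.
# graph.nodes['dense1'].set_weights([w_avg])
# graph.nodes['dense2'].set_weights([w_avg])
```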
If this solved your problem, I'll close the issue.
Sorry for the delay. The problem is that I need to keep the outputs separated, so I can jointly train the network when I add more nodes on top of it. For example, a network like the following:
shareddense = Dense(748,10)
graph = Graph()
graph.add_input(name='input1', ndim=2)
graph.add_input(name='input2', ndim=2)
graph.add_node(shareddense, name='dense1', input='input1')
graph.add_node(shareddense, name='dense2', input='input2')
graph.add_node(Dense(20, 20), name='dense3', inputs=['dense1', 'dense2'], merge_mode='concat')
graph.add_output(name='output', input='dense3')
graph.compile('rmsprop', {'output': 'mse'})
Is there no way to just use a convolution layer and then flatten the output? If it's weight sharing you're after, convolution should do it. I think there was 1D convolution somewhere.
Can you elaborate? I think I need something similar and can't figure out how to do it.
Ideally I want to take a kN-dimensional input vector, split it into k pieces of length N, apply an N×m matrix A to each piece individually, and then concatenate to get a vector of length km, where the caveat is that I want the matrix A to be learned by backpropagation, rather than some larger (kN)×(km) matrix. Perhaps this is achievable with convolution layers, but I'm very new to this game so any advice would be great.
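For reference, the desired split-apply-concatenate operation is easy to state in plain numpy (the name `block_apply` is just illustrative, not a Keras API; the learning of A is the part that needs the framework):

```python
import numpy as np

def block_apply(x, A, k):
    """Split a length-k*N vector x into k pieces of length N,
    apply the same N x m matrix A to each piece, and concatenate
    the k results into a single length-k*m vector."""
    N, m = A.shape
    pieces = x.reshape(k, N)   # shape (k, N)
    out = pieces @ A           # the same A multiplies every piece
    return out.reshape(k * m)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # N=4, m=3: the shared weights
x = rng.standard_normal(8)        # k=2 pieces of length 4
y = block_apply(x, A, k=2)        # length k*m = 6
```

This is exactly what a 1D convolution with stride N, filter length N, and m output channels computes, which is why the convolution suggestion above works.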
I too need this kind of parallel/weight sharing model.
I've managed to get something working based on the suggestion above of simply averaging weights after each batch. I couldn't get it working in batch by batch training but have had some success implementing a callback:
import numpy
from keras.callbacks import Callback

class WeightSharing(Callback):
    def __init__(self, shared):
        self.shared = shared
        super(WeightSharing, self).__init__()

    def on_batch_end(self, batch, logs={}):
        # Average the weights of all shared nodes, then write the mean back
        weights = numpy.mean([self.model.nodes[n].get_weights() for n in self.shared], axis=0)
        for n in self.shared:
            self.model.nodes[n].set_weights(weights)
Then you just duplicate the models/layers within a graph you'd like to share. E.g. adapting the example above:
graph = Graph()
graph.add_input(name='input1', ndim=2)
graph.add_input(name='input2', ndim=2)
graph.add_node(Dense(748, 10), name='dense1', input='input1')
graph.add_node(Dense(748, 10), name='dense2', input='input2')
graph.add_node(Dense(20, 20), name='dense3', inputs=['dense1', 'dense2'], merge_mode='concat')
graph.add_output(name='output', input='dense3')
graph.compile('rmsprop', {'output': 'mse'})
graph.fit(..., callbacks=[WeightSharing(['dense1', 'dense2'])])
This works but doesn't feel like a great solution. It's also quite slow - around 5x slower per epoch in my experiments (w/ GPU). I'd be interested to know if there's a better way to support this.
The params of any practical net will add up to be a large chunk of data. And moving that around will obviously slow down overall training.
A faster solution could be to change the Layer API itself, but that again is not a great solution.
A more efficient approach which I think is roughly equivalent is to average the gradients.
I.e. I've subclassed an optimizer and overridden get_gradients(), setting the dimensions corresponding to expressions for the shared models to the mean over all the shared models' gradient expressions. Assuming the weights are initialized identically, the updates will be identical and the weights stay tied. I just haven't thought too deeply about how this might impact optimization.
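The reason this keeps the weights tied can be seen in a small numpy sketch of one SGD step (plain arrays stand in for the optimizer machinery; `averaged_sgd_step` is an illustrative name, not a Keras function):

```python
import numpy as np

def averaged_sgd_step(weights, grads, lr=0.1):
    """One SGD step in which the gradients of all 'shared' copies are
    replaced by their mean. Identically initialized copies then receive
    identical updates, so their weights remain equal after every step."""
    g_mean = np.mean(grads, axis=0)
    return [w - lr * g_mean for w in weights]

w1 = np.ones((2, 2))
w2 = np.ones((2, 2))                      # same initialization as w1
g1 = np.array([[1.0, 0.0], [0.0, 1.0]])  # gradients from stream 1
g2 = np.array([[0.0, 1.0], [1.0, 0.0]])  # gradients from stream 2

w1_new, w2_new = averaged_sgd_step([w1, w2], [g1, g2])
# Both copies moved by the same averaged gradient, so they stay identical.
```

Compared with copying or averaging the weights after each batch, only the (usually smaller or equal-sized) gradient expressions are combined, and it happens inside the compiled update, which is why it avoids the per-batch weight transfer overhead.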
In my opinion, this is something that needs deeper thought and more attention.
Having the ability to share weight pointers across layers (and across other Keras models) will unlock, among other things:
and opens up flexibility without ugliness, which I presume we both like (Torch and Keras).
@soumith, as I have stated above, I believe this would need significant changes throughout the library. It will be a major undertaking.
@pranv cool, did not know that. I am just a spectator offering opinions (because they are free, ha!); the development, decisions, and priorities are better handled by members like you and @fchollet ...
I agree that it would be very useful (in fact necessary) for certain types of models. It should definitely be supported in the future.
The reason why that's not already the case is that there are fundamental (i.e. induced by Theano) issues with reusing layers with different inputs. It's not clear if we can achieve it without explicit copying/continuous synchronizing of the weights. But we will be looking into it.
@soumith but, this is a really important thing as you have stated. Thanks for bringing it up here!
@fchollet makes the best and final decisions. Eager to see his view.
Can this be solved by having different batch sizes for different layers, with merge and split working on the batch dimension as well? Then we could feed a layer 2n samples, use split to create two batches (each with n items), and have each one processed by other layers...
I have a rather quick-and-dirty solution that at least SEEMS to work... Defining a new Dense layer with:
def get_output(self, train=False):
    X = self.get_input(train)
    # Split the concatenated input at self.dim ...
    X0 = X[:, 0:self.dim]
    X1 = X[:, self.dim:]
    # ... and apply the same weight matrix W to both halves
    output0 = self.activation(T.dot(X0, self.W))
    output1 = self.activation(T.dot(X1, self.W))
    return T.concatenate([output0, output1], axis=1)
I concatenate both input vectors into one matrix, split it in two here, and multiply both halves with the shared weight matrix; self.dim defines the splitting point. Is this correct / rational? I'm rather new to Keras and Theano, but I really love it and see great potential here.
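Ignoring the activation, a quick numpy check of what this layer computes (array sizes are illustrative) confirms the shape bookkeeping of the trick:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4                                 # the splitting point (self.dim)
W = rng.standard_normal((dim, 3))       # the shared weight matrix
X = rng.standard_normal((5, 2 * dim))   # batch of 5 concatenated input pairs

# The custom get_output: split each row at `dim`, multiply both halves
# by the same W, and re-concatenate along the feature axis
X0, X1 = X[:, :dim], X[:, dim:]
out = np.concatenate([X0 @ W, X1 @ W], axis=1)
```

Both halves of each output row are genuinely produced by the same W, so gradients flowing back through either half update the single shared matrix, which is the desired tying.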
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Hi,
Related to #277, I'd like to create an overlapping model where the inputs come from different places. Is this supported? How can I do it? The following code gives me an error because of the unexpected input.
TIA