The typical use case of TimeDistributedDense is for processing the output of an Embedding layer or a recurrent layer with return_sequences=True. Then you can transform the hidden representation at each timestep before applying further processing (like pooling or another recurrent layer).
I got an example from here. I am wondering what the difference would be if I changed the following TimeDistributedDense into Dense:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Activation
from keras.layers import TimeDistributedDense  # old Keras 1.x API; later replaced by TimeDistributed(Dense(...))

model = Sequential()
model.add(LSTM(hidden_neurons, input_dim=in_out_neurons, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributedDense(in_out_neurons))
model.add(Activation("linear"))
It won't compile, because the dimensions don't match up: Dense expects a 2-dimensional input (batch_size, features), whereas the output of an LSTM with return_sequences=True is 3-dimensional (batch_size, timesteps, features).
I did not mean purely changing the term from TimeDistributedDense to Dense. Can we change the layer from TimeDistributedDense to Dense while changing the dimensions as well? What would be the real difference, in structure and in results?
That is:
A Dense layer deals with 2D tensors and outputs a 2D tensor.
A TimeDistributedDense layer deals with 3D tensors and outputs a 3D tensor.
The inner operation is the same, y = f(Wx + b), where f is the activation function and W and b are the weight and bias. In TimeDistributedDense, the operation is applied at every timestep.
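To see the shape difference concretely, here is a minimal sketch written against the later Keras functional API, where TimeDistributedDense was replaced by the TimeDistributed(Dense(...)) wrapper; the sizes (30 features, 10 timesteps, 20 units) are made up for illustration:

from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

# 2D case: Dense maps (batch, features) -> (batch, units)
flat_in = Input(shape=(30,))                   # (None, 30)
flat_out = Dense(20)(flat_in)                  # (None, 20)

# 3D case: TimeDistributed(Dense) applies the SAME W and b at every
# timestep: (batch, timesteps, features) -> (batch, timesteps, units)
seq_in = Input(shape=(10, 30))                 # (None, 10, 30)
seq_out = TimeDistributed(Dense(20))(seq_in)   # (None, 10, 20)

print(Model(flat_in, flat_out).summary())
print(Model(seq_in, seq_out).summary())

Both layers hold exactly one (30 x 20) weight matrix and one 20-dimensional bias, so the parameter count is 620 in both cases.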
By every timestep, do you mean every unfolded unit of the recurrent layer? Or does each timestep actually mean each char in one sentence? Then, what are the pros and cons of TimeDistributedDense? Does it increase computation time a lot?
Yes, a timestep means every unfolded unit of the RNN. For example, with
sequence = [A, B, C, D, E]
A is at time=1, B is at time=2, ..., E is at time=5.
If you are going to apply y = f(Wx + b) at each timestep of your input (i.e., your input is a 3D tensor), TimeDistributedDense is your only choice, so there are no pros and cons to weigh.
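To make "applied to every timestep" concrete, here is a plain-NumPy sketch (all sizes and the tanh activation are made up for illustration): the same W and b are reused at every timestep, which is exactly what TimeDistributedDense does.

import numpy as np

timesteps, in_dim, out_dim = 5, 8, 4     # e.g. the 5 timesteps of [A, B, C, D, E]
x = np.random.randn(timesteps, in_dim)   # one sample: 5 timesteps of 8 features
W = np.random.randn(in_dim, out_dim)     # ONE weight matrix, shared by all timesteps
b = np.random.randn(out_dim)             # ONE bias vector, also shared

y = np.tanh(x.dot(W) + b)                # y[t] = f(W x[t] + b) for every t
print(y.shape)                           # (5, 4): still one output per timestep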
I thought y = f(Wx + b) was applied at each timestep even when using Dense, since the layers are fully connected. So does Dense actually only apply the activation function to the last timestep? Can I say that Dense is used in many-to-one or one-to-one cases, and TimeDistributedDense is used in many-to-many and one-to-many cases?
I think you should take a careful look at the Keras documentation, and perhaps also the Theano documentation, because there is a big difference between Dense and TimeDistributedDense.
Dense only receives 2D tensors, which means there is NO time dimension, i.e., a 2D -> 2D transformation.
TimeDistributedDense only receives 3D tensors, which include a time dimension, i.e., a 3D -> 3D transformation.
Q: So Dense actually only applies the activation function to the last timestep?
A: No, there is no time dimension in the Dense layer.
Q: Is Dense used in many-to-one or one-to-one cases?
A: It is one-to-one.
Q: And is TimeDistributedDense used in many-to-many and one-to-many cases?
A: It is many-to-many.
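The one-to-one vs. many-to-many distinction shows up directly in code. A minimal sketch (the sizes are invented, and TimeDistributed(Dense(...)) is used as the modern spelling of TimeDistributedDense):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

timesteps, features, vocab = 10, 30, 50  # assumed sizes

# Many-to-many: keep every timestep, apply the same Dense to each one.
m2m = Sequential()
m2m.add(LSTM(64, return_sequences=True, input_shape=(timesteps, features)))   # (None, 10, 64)
m2m.add(TimeDistributed(Dense(vocab, activation='softmax')))                  # (None, 10, 50)

# Many-to-one: keep only the final state, then apply one ordinary Dense.
m2o = Sequential()
m2o.add(LSTM(64, return_sequences=False, input_shape=(timesteps, features)))  # (None, 64)
m2o.add(Dense(vocab, activation='softmax')))                                  # (None, 50)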
So, the lstm_text_generation example is actually a one-to-one case.
print('Build model...')
model = Sequential()
model.add(LSTM(512, return_sequences=True, input_shape=(maxlen, len(chars))))  # (None, maxlen, 512)
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))  # (None, 512): only the final state is kept
model.add(Dropout(0.2))
model.add(Dense(len(chars)))                  # (None, len(chars))
model.add(Activation('softmax'))
If you look at the Dense layer, that is the one-to-one case: the previous LSTM layer returns a 2D tensor, which is the final state of the LSTM, and the Dense layer outputs a 2D tensor, a probability distribution (softmax) over the whole vocabulary.
Thanks a lot, @ymcui. I am wondering whether you could take a look at my other post here: "Some interesting results of using this lstm_text_generation example. Need reasonable explanations." That would be very helpful.
Hi, I want to train a simple neural network with data of shape (11, 501, 40). I set the input_shape of the Dense layer to (11, 501, 40) as well, but it is not working. Kindly guide me. The code and error are given below.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

path = "D:/DECASE2017/CNN/all_data_partial.npy"
data = np.load(open(path, 'rb'))

X = np.array(data[:11, :501, :40])  # all channels, all rows, and 40 columns
Y = np.array(data[:11, :501, 40])   # all channels, all rows, and only column no. 40 (the class label)

model = Sequential()
model.add(Dense(12, input_shape=(11, 501, 40), init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, nb_epoch=1, batch_size=10)

score = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], score[1] * 100))

ValueError: Error when checking input: expected dense_80_input to have 4 dimensions, but got array with shape (11, 501, 40)

Thanking you
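For what it's worth, the error above is most likely because input_shape must describe a single sample, without the batch dimension. A hedged sketch of the fix, assuming the 11 is the number of samples, each sample is a (501, 40) matrix, and a Keras version whose Dense acts on the last axis of 3D input:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Each sample is (501, 40); the 11 samples form the batch.
model.add(Dense(12, input_shape=(501, 40), activation='relu'))  # (None, 501, 12)
model.add(Dense(8, activation='relu'))                          # (None, 501, 8)
model.add(Dense(1, activation='sigmoid'))                       # (None, 501, 1)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Y would then need shape (11, 501, 1) to match the output,
# e.g. Y = Y.reshape(11, 501, 1).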
@fluency03 In your model, why do you add an activation layer with a linear activation at the end, i.e. model.add(Activation("linear"))? Does it have any effect?
For a regression-type problem, what dimensions should I use to run my code?

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras.wrappers.scikit_learn import KerasRegressor
import numpy as np
import tensorflow as tf
from matplotlib import pyplot
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

seed = 7
np.random.seed(seed)

from scipy.io import loadmat
dataset = loadmat('matlab2.mat')

Bx = basantix[:, 50001:99999]
Bx = np.transpose(Bx)
Fx = fx[:, 50001:99999]
Fx = np.transpose(Fx)

from sklearn.cross_validation import train_test_split
Bx_train, Bx_test, Fx_train, Fx_test = train_test_split(Bx, Fx, test_size=0.2, random_state=0)

scaler = StandardScaler()   # the scaler object is created
scaler.fit(Bx_train)        # and then fitted to the training data
Bx_train = scaler.transform(Bx_train)
Bx_test = scaler.transform(Bx_test)

model = Sequential()

def base_model():
    keras.layers.Dense(Dense(49999, input_shape=(20,), activation='relu'))
    model.add(Dense(20))
    model.add(Dense(49998, init='normal', activation='relu'))
    model.add(Dense(49998, init='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

scale = StandardScaler()
Bx = scale.fit_transform(Bx)
Bx = scale.fit_transform(Bx)

clf = KerasRegressor(build_fn=base_model, nb_epoch=100, batch_size=5, verbose=0)
clf.fit(Bx, Fx)
res = clf.predict(Bx)
clf.score(Fx, res)

Kindly provide an exact solution.
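As an aside on the snippet above: with KerasRegressor, build_fn should construct and return a fresh model rather than adding layers to a global one, and the first layer's input_shape must match the number of input features. A minimal sketch with placeholder sizes (not a tuned solution for this particular data):

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor

def base_model(input_dim=20):
    # Build and return a new model on each call, as KerasRegressor expects.
    model = Sequential()
    model.add(Dense(64, input_shape=(input_dim,), activation='relu'))
    model.add(Dense(1))  # a single continuous output for regression
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# nb_epoch follows the Keras 1.x spelling used in this thread (later renamed epochs).
clf = KerasRegressor(build_fn=base_model, nb_epoch=100, batch_size=5, verbose=0)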
@around1991 "It won't compile... Dense expects a 2-dimensional input..." This snippet compiles fine:
from keras.layers import TimeDistributed, Dense, Input, Conv1D, MaxPooling1D, Flatten
from keras.models import Model
inputs = Input(shape=(10, 30))
x = Dense(20)(inputs)
x = Conv1D(40, 5)(x)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
x = Dense(3)(x)
model = Model(inputs, x)
print(model.summary())
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['acc'])
and here is the model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 10, 30) 0
_________________________________________________________________
dense_1 (Dense) (None, 10, 20) 620
_________________________________________________________________
conv1d_1 (Conv1D) (None, 6, 40) 4040
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1, 40) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 40) 0
_________________________________________________________________
dense_2 (Dense) (None, 3) 123
=================================================================
Total params: 4,783
Trainable params: 4,783
Non-trainable params: 0
_________________________________________________________________
This version uses TimeDistributed, and it has the same model summary:
from keras.layers import TimeDistributed, Dense, Input, Conv1D, MaxPooling1D, Flatten
from keras.models import Model
inputs = Input(shape=(10, 30))
x = TimeDistributed(Dense(20))(inputs)
x = Conv1D(40, 5)(x)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
x = Dense(3)(x)
model = Model(inputs, x)
print(model.summary())
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['acc'])
Here is the summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 10, 30) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 20) 620
_________________________________________________________________
conv1d_1 (Conv1D) (None, 6, 40) 4040
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1, 40) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 40) 0
_________________________________________________________________
dense_2 (Dense) (None, 3) 123
=================================================================
Total params: 4,783
Trainable params: 4,783
Non-trainable params: 0
_________________________________________________________________
@fluency03 Did you figure out the answer to your question? I am still confused by the above example! Dense does accept 3D input; it is simply a matrix multiplication (plus a bias term), and there is nothing wrong with (?, 10, 30) x (30, 20) -> (?, 10, 20) (the matrix is 30x20 = 600 params). This matrix multiplication is nothing but applying a fully connected (30x20) layer to each of the 10 30-dimensional vectors of the input, which seems to be the same as what TimeDistributed does!
@rmanak I think that although the Dense layer accepts 3D input, it flattens the first two dimensions, whereas TimeDistributed(Dense) won't flatten the first two dimensions (batch_size, time_steps), so the temporal information is preserved and not mixed.
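One way to settle this empirically, assuming Keras 2.x (where Dense accepts 3D input): give a plain Dense and a TimeDistributed(Dense) the same weights and compare their outputs.

import numpy as np
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

x = np.random.randn(4, 10, 30).astype('float32')   # (batch, timesteps, features)

inp1 = Input(shape=(10, 30))
dense_model = Model(inp1, Dense(20)(inp1))

inp2 = Input(shape=(10, 30))
td_model = Model(inp2, TimeDistributed(Dense(20))(inp2))

# Copy the Dense weights into the TimeDistributed(Dense) layer so both
# models compute with identical W and b.
td_model.layers[1].set_weights(dense_model.layers[1].get_weights())

# If Dense really flattened (batch, timesteps), the outputs would differ;
# in Keras 2.x this prints True, i.e. Dense acts on the last axis only.
print(np.allclose(dense_model.predict(x), td_model.predict(x)))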
I am still confused about the difference between Dense and TimeDistributedDense, even though there are already some similar questions asked here and here. People are discussing it a lot, but there are no commonly agreed conclusions. And even though @fchollet made a statement about it here, I still need a detailed illustration of exactly what the difference between them is.