keras-team / keras

Deep Learning for humans
http://keras.io/

I am still confused about the difference between `Dense` and `TimeDistributedDense` #2038

Closed: fluency03 closed this issue 8 years ago

fluency03 commented 8 years ago

I am still confused about the difference between Dense and TimeDistributedDense, even though there are already some similar questions asked here and here. People have discussed it a lot, but there is no commonly agreed conclusion.

And even though @fchollet stated here that:

TimeDistributedDense applies a same Dense (fully-connected) operation to every timestep of a 3D tensor.

I still need a detailed illustration of what exactly the difference between them is.

around1991 commented 8 years ago

The typical use case of TimeDistributedDense is for processing the output of an Embedding layer or a recurrent layer with return_sequences=True. Then you can transform the hidden representation at each timestep before applying further processing (like pooling or another recurrent layer).
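
For instance, a minimal sketch of that pattern (hypothetical layer sizes, using the old TimeDistributedDense API from this era of Keras):

from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.layers.core import TimeDistributedDense

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))   # hypothetical vocab of 1000, sequences of length 10 -> (batch, 10, 64)
model.add(LSTM(32, return_sequences=True))        # a hidden vector at every timestep -> (batch, 10, 32)
model.add(TimeDistributedDense(16))               # the same Dense(16) applied at each of the 10 timesteps -> (batch, 10, 16)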

fluency03 commented 8 years ago

I got an example from here. I am wondering what the difference would be if I changed the following TimeDistributedDense into Dense:

from keras.models import Sequential
from keras.layers.core import Dropout, Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM

model = Sequential()
model.add(LSTM(hidden_neurons, input_dim=in_out_neurons, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributedDense(in_out_neurons))
model.add(Activation("linear"))
around1991 commented 8 years ago

It won't compile, because the dimensions don't match up: Dense expects a 2-dimensional input (batch_size, features), whereas the output of an LSTM with return_sequences=True is 3-dimensional (batch_size, timesteps, features).

fluency03 commented 8 years ago

I did not mean purely changing the name from TimeDistributedDense to Dense. Can we change the layer from TimeDistributedDense to Dense while also adjusting the dimensions? What would be the real difference in structure and in results?

ymcui commented 8 years ago

The Dense layer deals with a 2D tensor and outputs a 2D tensor, while the TimeDistributedDense layer deals with a 3D tensor and outputs a 3D tensor. The inner operation is the same, y = f(Wx + b), where f is the activation function and W and b are the weight and bias. In TimeDistributedDense, this operation is applied at every timestep.
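
A minimal NumPy sketch of that difference (made-up shapes, with tanh standing in for f):

import numpy as np

batch, timesteps, features, units = 2, 5, 3, 4
x = np.random.rand(batch, timesteps, features)
W = np.random.rand(features, units)
b = np.random.rand(units)

# Dense on a 2D tensor: y = f(Wx + b) once per sample
y_2d = np.tanh(x[:, -1, :].dot(W) + b)   # shape (batch, units)

# TimeDistributedDense on a 3D tensor: the same W and b reused at every timestep
y_3d = np.tanh(x.dot(W) + b)             # shape (batch, timesteps, units)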

fluency03 commented 8 years ago

By every timestep, do you mean every unfolded unit of the recurrent layer?

fluency03 commented 8 years ago

Or does each timestep actually mean each character in a sentence? Also, what are the pros and cons of TimeDistributedDense? Does it increase computation time a lot?

ymcui commented 8 years ago

Yes, a timestep means every unfolded unit of the RNN. For example, for the sequence [A, B, C, D, E], A is at time=1, B is at time=2, ..., E is at time=5. If you are going to apply y = f(Wx + b) at each timestep of your input (i.e. your input is a 3D tensor), TimeDistributedDense is your only choice, so there are no pros and cons to weigh.

fluency03 commented 8 years ago

I thought y = f(Wx + b) was applied at each timestep even when using Dense, since the layers are fully connected. So does Dense actually only apply the activation function to the last timestep?

Can I say that Dense is used in many-to-one or one-to-one cases, and TimeDistributedDense is used in many-to-many and one-to-many cases?

ymcui commented 8 years ago

I think you should take a careful look at the Keras documentation, and perhaps also the Theano documentation, because there is a big difference between Dense and TimeDistributedDense. Dense only receives a 2D tensor, which means there is NO time dimension, i.e. a 2D -> 2D conversion. TimeDistributedDense only receives a 3D tensor, which includes a time dimension, i.e. a 3D -> 3D conversion.

Q: So does Dense actually only apply the activation function to the last timestep?
A: No, there is no time dimension in the Dense layer.

Q: Is Dense used in many-to-one or one-to-one cases?
A: It is one-to-one.

Q: And is TimeDistributedDense used in many-to-many and one-to-many cases?
A: It is many-to-many (see the sketch below).
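
To make the shapes concrete, here is a hedged sketch of the two patterns with hypothetical sizes (10 timesteps, 8 input features, 5 output classes):

from keras.models import Sequential
from keras.layers.core import Dense, TimeDistributedDense
from keras.layers.recurrent import LSTM

# many-to-many: keep the whole sequence and predict at every timestep
seq_model = Sequential()
seq_model.add(LSTM(32, input_shape=(10, 8), return_sequences=True))    # (batch, 10, 32)
seq_model.add(TimeDistributedDense(5))                                 # (batch, 10, 5)

# many-to-one: the LSTM collapses the sequence; Dense itself is a 2D -> 2D (one-to-one) map
last_model = Sequential()
last_model.add(LSTM(32, input_shape=(10, 8), return_sequences=False))  # (batch, 32)
last_model.add(Dense(5))                                               # (batch, 5)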

fluency03 commented 8 years ago

So, the lstm_text_generation example is actually a one-to-one case.

ymcui commented 8 years ago

print('Build model...')
model = Sequential()
model.add(LSTM(512, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

Regarding the Dense layer, that is a one-to-one case: the previous LSTM layer (with return_sequences=False) returns a 2D tensor, the final state of the LSTM. The Dense layer then outputs a 2D tensor, which after the softmax is a probability distribution over the whole vocabulary.

fluency03 commented 8 years ago

Thanks a lot, @ymcui. I am wondering whether you could take a look at my other post here: Some interesting results of using this lstm_text_generation example. Need reasonable explanations. That would be very helpful.

Khalidhussain1134 commented 7 years ago

Hi, I want to train a simple neural network with data of shape (11, 501, 40). I set the input_shape of the Dense layer to (11, 501, 40) as well, but it is not working. Kindly guide me. The code and error are given below.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

path = "D:/DECASE2017/CNN/all_data_partial.npy"
data = np.load(open(path, 'rb'))

X = np.array(data[:11, :501, :40])  # all channels, all rows, and 40 columns
Y = np.array(data[:11, :501, 40])   # all channels, all rows, and only column 40 (the class label)

model = Sequential()
model.add(Dense(12, input_shape=(11, 501, 40), init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

# compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit the model
model.fit(X, Y, nb_epoch=1, batch_size=10)

# evaluate
score = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], score[1]*100))

ValueError: Error when checking input: expected dense_80_input to have 4 dimensions, but got array with shape (11, 501, 40)

Thank you.

krikru commented 7 years ago

@fluency03 In your model, why do you add an activation layer with linear activations at the end, i.e. model.add(Activation("linear"))? Does it have any effect?

prateekbhadauria commented 6 years ago

For a regression-type problem, what dimensions should I use to run my code?

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras.wrappers.scikit_learn import KerasRegressor
import numpy as np
import tensorflow as tf
from matplotlib import pyplot
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import mean_squared_error

seed = 7
np.random.seed(seed)

from scipy.io import loadmat
dataset = loadmat('matlab2.mat')
Bx = basantix[:, 50001:99999]
Bx = np.transpose(Bx)
Fx = fx[:, 50001:99999]
Fx = np.transpose(Fx)

from sklearn.cross_validation import train_test_split
Bx_train, Bx_test, Fx_train, Fx_test = train_test_split(Bx, Fx, test_size=0.2, random_state=0)

scaler = StandardScaler()   # create the scaler
scaler.fit(Bx_train)        # fit it on the training data
Bx_train = scaler.transform(Bx_train)
Bx_test = scaler.transform(Bx_test)

model = Sequential()

def base_model():
    model.add(Dense(49999, input_shape=(20,), activation='relu'))
    model.add(Dense(20))
    model.add(Dense(49998, init='normal', activation='relu'))
    model.add(Dense(49998, init='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

scale = StandardScaler()
Bx = scale.fit_transform(Bx)
Bx = scale.fit_transform(Bx)

clf = KerasRegressor(build_fn=base_model, nb_epoch=100, batch_size=5, verbose=0)

clf.fit(Bx, Fx)
res = clf.predict(Bx)

# the line below throws an error
clf.score(Fx, res)

Kindly provide an exact solution.

rmanak commented 6 years ago

@around1991 "It won't compile... Dense expects a 2-dimensional input..." This snippet compiles fine:

from keras.layers import TimeDistributed, Dense, Input, Conv1D, MaxPooling1D, Flatten
from keras.models import Model

inputs = Input(shape=(10, 30))
x = Dense(20)(inputs)
x = Conv1D(40, 5)(x)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
x = Dense(3)(x)
model = Model(inputs, x)

print(model.summary())
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

and here is the model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 10, 30)            0         
_________________________________________________________________
dense_1 (Dense)              (None, 10, 20)            620       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 6, 40)             4040      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1, 40)             0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 40)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 123       
=================================================================
Total params: 4,783
Trainable params: 4,783
Non-trainable params: 0
_________________________________________________________________

This version uses TimeDistributed and has the same model summary:

from keras.layers import TimeDistributed, Dense, Input, Conv1D, MaxPooling1D, Flatten
from keras.models import Model

inputs = Input(shape=(10, 30))
x = TimeDistributed(Dense(20))(inputs)
x = Conv1D(40, 5)(x)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
x = Dense(3)(x)
model = Model(inputs, x)

print(model.summary())
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

Here is the summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 10, 30)            0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 20)            620       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 6, 40)             4040      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1, 40)             0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 40)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 123       
=================================================================
Total params: 4,783
Trainable params: 4,783
Non-trainable params: 0
_________________________________________________________________

@fluency03 Did you figure out the answer to your question? I am still confused by the above example! Dense does accept 3D input; it is simply a matrix multiplication (plus a bias term). Nothing is wrong with (?, 10, 30) x (30, 20) -> (?, 10, 20) (the weight matrix has 30x20 = 600 parameters, plus 20 biases, matching the 620 shown above). This matrix multiplication is nothing but applying a fully connected (30x20) layer to each of the 10 30-dimensional vectors of the input, which seems to be the same as what TimeDistributed does!
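
A quick way to check that, sketched under the assumption that this Keras version lets Dense consume 3D input as in the summaries above (the batch size and layer sizes are made up):

import numpy as np
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

inputs = Input(shape=(10, 30))
m_dense = Model(inputs, Dense(20)(inputs))                 # Dense applied directly to the 3D tensor
m_td = Model(inputs, TimeDistributed(Dense(20))(inputs))   # explicit per-timestep Dense

m_td.set_weights(m_dense.get_weights())                    # give both models the same W and b

x = np.random.rand(4, 10, 30).astype('float32')
# If Dense really applies the same W and b at each timestep, the outputs should match
print(np.allclose(m_dense.predict(x), m_td.predict(x)))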

chopwoodwater commented 4 years ago

@rmanak I think that although the Dense layer accepts 3D input, it flattens the first two dimensions, whereas TimeDistributed(Dense) won't flatten the first two dimensions (batch_size, time_steps), so the temporal information is preserved and not mixed.