Thank you for the suggestion!
According to the results of the `summary()` function Keras provides, these two implementations are different. The suggested implementation has more trainable parameters than the current one:
- Suggested impl.: (F*D + 1) * F * C parameters on the mask inference head
- Current impl.: (D + 1) * C * F parameters on the mask inference head
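As a quick sanity check of these formulas with the toy dimensions used in the summaries below (T, F, C, D = 5, 4, 2, 3), the results match the `Param #` column of `mask_linear` (104) and of the four `mask_linear_n` layers combined (4 × 8 = 32):

```python
# Sanity check of the two mask-head parameter counts for T, F, C, D = 5, 4, 2, 3.
T, F, C, D = 5, 4, 2, 3

# Suggested impl.: one Dense(F*C) applied to the flattened (T, F*D) embedding.
suggested = (F * D + 1) * F * C
# Current impl.: F small Dense(C) layers, one per frequency bin of the (T, F, D) embedding.
current = (D + 1) * C * F

print(suggested, current)  # 104 32
```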
Although I have not measured the model size either in memory or on disk, more trainable parameters bring more expressive power but also more memory consumption. So I am skeptical that the suggested implementation requires less memory.
I can't say anything about the "correctness" of the model, as I think there are only "good" models.
If you want to implement the Chimera++ network you refer to, `mask_linear` could instead be attached to `body_blstm_n`, since `body_linear` is effectively an embedding layer and the paper notes that the motivation for putting a mask inference layer on top of an embedding layer is unclear.
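For illustration only, here is a minimal sketch of such a Chimera++-style wiring, where the mask head branches off the last BLSTM output rather than the embedding; the hidden size N, the single BLSTM layer, and the absence of activations are placeholders, not the repository's actual configuration.

```python
# Hypothetical Chimera++-style heads: both the embedding head and the mask head
# branch off the last BLSTM output. Sizes and layer names are illustrative.
from keras.layers import Input, Bidirectional, LSTM, Dense, Reshape
from keras.models import Model

T, F, C, D, N = 5, 4, 2, 3, 10

inp = Input(shape=(T, F), name='input')
blstm = Bidirectional(LSTM(N, return_sequences=True), name='body_blstm_2')(inp)

body_linear = Dense(F * D, name='body_linear')(blstm)   # embedding head
embedding = Reshape((T, F, D), name='body')(body_linear)

mask_linear = Dense(F * C, name='mask_linear')(blstm)   # mask head on the BLSTM output
mask = Reshape((T, F, C), name='mask')(mask_linear)

Model(inp, [embedding, mask]).summary()
```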
The following are the `summary()` outputs of the two implementations with T, F, C, D = 5, 4, 2, 3 and fewer, smaller BLSTM layers; the first corresponds to the suggested Dense + Reshape head, the second to the current slice/merge head.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) (None, 5, 4) 0
__________________________________________________________________________________________________
body_blstm_1 (Bidirectional) (None, 5, 20) 1200 input[0][0]
__________________________________________________________________________________________________
body_blstm_2 (Bidirectional) (None, 5, 20) 2480 body_blstm_1[0][0]
__________________________________________________________________________________________________
body_linear (Dense) (None, 5, 12) 252 body_blstm_2[0][0]
__________________________________________________________________________________________________
body (Reshape) (None, 5, 4, 3) 0 body_linear[0][0]
__________________________________________________________________________________________________
embedding_activation (Activatio (None, 5, 4, 3) 0 body[0][0]
__________________________________________________________________________________________________
mask_linear (Dense) (None, 5, 8) 104 body_linear[0][0]
__________________________________________________________________________________________________
embedding (Lambda) (None, 5, 4, 3) 0 embedding_activation[0][0]
__________________________________________________________________________________________________
mask (Reshape) (None, 5, 4, 2) 0 mask_linear[0][0]
==================================================================================================
Total params: 4,036
Trainable params: 4,036
Non-trainable params: 0
__________________________________________________________________________________________________
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input (InputLayer) (None, 5, 4) 0
__________________________________________________________________________________________________
body_blstm_1 (Bidirectional) (None, 5, 20) 1200 input[0][0]
__________________________________________________________________________________________________
body_blstm_2 (Bidirectional) (None, 5, 20) 2480 body_blstm_1[0][0]
__________________________________________________________________________________________________
body_linear (Dense) (None, 5, 12) 252 body_blstm_2[0][0]
__________________________________________________________________________________________________
body (Reshape) (None, 5, 4, 3) 0 body_linear[0][0]
__________________________________________________________________________________________________
mask_slice_1 (Lambda) (None, 5, 3) 0 body[0][0]
__________________________________________________________________________________________________
mask_slice_2 (Lambda) (None, 5, 3) 0 body[0][0]
__________________________________________________________________________________________________
mask_slice_3 (Lambda) (None, 5, 3) 0 body[0][0]
__________________________________________________________________________________________________
mask_slice_4 (Lambda) (None, 5, 3) 0 body[0][0]
__________________________________________________________________________________________________
embedding_activation (Activatio (None, 5, 4, 3) 0 body[0][0]
__________________________________________________________________________________________________
mask_linear_1 (Dense) (None, 5, 2) 8 mask_slice_1[0][0]
__________________________________________________________________________________________________
mask_linear_2 (Dense) (None, 5, 2) 8 mask_slice_2[0][0]
__________________________________________________________________________________________________
mask_linear_3 (Dense) (None, 5, 2) 8 mask_slice_3[0][0]
__________________________________________________________________________________________________
mask_linear_4 (Dense) (None, 5, 2) 8 mask_slice_4[0][0]
__________________________________________________________________________________________________
embedding (Lambda) (None, 5, 4, 3) 0 embedding_activation[0][0]
__________________________________________________________________________________________________
mask (Lambda) (None, 5, 4, 2) 0 mask_linear_1[0][0]
mask_linear_2[0][0]
mask_linear_3[0][0]
mask_linear_4[0][0]
==================================================================================================
Total params: 3,964
Trainable params: 3,964
Non-trainable params: 0
__________________________________________________________________________________________________
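For reference, here is a minimal sketch (not the repository's actual code) of a slice/merge head that reproduces the mask branch of the second summary; it starts from a placeholder `body` input of shape (T, F, D), and the softmax activation is an assumption.

```python
# Hypothetical sketch of the current slice/merge mask head, starting from a
# placeholder `body` tensor of shape (batch, T, F, D). Softmax is assumed here.
import keras.backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

T, F, C, D = 5, 4, 2, 3
body = Input(shape=(T, F, D), name='body')   # stands in for the reshaped embedding

# One slice per frequency bin: F tensors of shape (batch, T, D).
slices = [Lambda(lambda t, i=i: t[:, :, i, :], name='mask_slice_%d' % (i + 1))(body)
          for i in range(F)]
# One small Dense(C) per bin: F tensors of shape (batch, T, C), (D+1)*C params each.
per_bin = [Dense(C, activation='softmax', name='mask_linear_%d' % (i + 1))(s)
           for i, s in enumerate(slices)]
# Stack the per-bin masks back along the frequency axis: (batch, T, F, C).
mask = Lambda(lambda ts: K.stack(ts, axis=2), name='mask')(per_bin)

Model(body, mask).summary()   # 4 * (3 + 1) * 2 = 32 trainable parameters
```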
Thank you for the detailed answer. You are right about the number of trainable parameters reported by `model.summary()`.
The memory problem I mentioned comes, I think, from the list of `mask_slice_n` layers. I already tried running both architectures on an NVIDIA GeForce GTX 1080 Ti with 12 GB memory. With an input shape of B, T, F = 32, 300, 129, I get a memory error for the current implementation but no memory error for the suggested one. Hence, I wonder whether a plain `Reshape` is a "correct" implementation of the original Chimera model.
I confirmed that the suggested implementation does not hit a memory error while the current one does, although my environment differs.
Regarding the original Chimera implementation: from the third paragraph of Section 2.2 of the Chimera paper, I understood that F `Dense(C, ...)` layers constitute the mask inference layer, so I implemented it in the split/merge way.
As for the order of activation and normalization, I chose this order following the second paragraph of Section 2.2. I also assumed that each row of V (the TF x D embedding matrix) is a unit vector.
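As an illustration of that last point, here is a hypothetical sketch of the unit-normalization step (the `embedding` Lambda in the summaries), assuming an `embedding_activation` tensor of shape (T, F, D); it is not the repository's exact code.

```python
# Hypothetical sketch of forcing each (t, f) embedding vector to unit length,
# i.e. each row of the (TF x D) matrix V becomes a unit vector.
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model

T, F, D = 5, 4, 3
emb_act = Input(shape=(T, F, D), name='embedding_activation')

# L2-normalize along the last (D) axis.
embedding = Lambda(lambda t: K.l2_normalize(t, axis=-1), name='embedding')(emb_act)

Model(emb_act, embedding).summary()
```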
I'd like to suggest the following:
Thanks.
https://github.com/arity-r/ChimeraNet/blob/6341383c61f238a83a0be8c7d4972aac4e7d958a/chimeranet/model.py#L57-L69
Correct me if I am wrong, but one can simply replace this block of code with `Dense` and `Reshape` layers (see the sketch below). I think the API can handle the gradient updates accordingly because of the `Reshape` layer. It also does not require much memory, since the masks are not extracted into a list. However, I wonder whether this would be a correct definition for the mask-inference head of the model. The reference I use is the Chimera++ network from the paper "Alternative Objective Functions for Deep Clustering", which redefines the architecture for speaker separation.
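A hypothetical minimal version of such a Dense + Reshape head, starting from a placeholder `body_linear` input of shape (T, F*D); the layer names and the missing activation are illustrative rather than taken from the actual suggestion.

```python
# Hypothetical sketch of the suggested Dense + Reshape mask head, starting from a
# placeholder `body_linear` tensor of shape (batch, T, F*D).
from keras.layers import Input, Dense, Reshape
from keras.models import Model

T, F, C, D = 5, 4, 2, 3
body_linear = Input(shape=(T, F * D), name='body_linear')

# One Dense over the flattened embedding: (F*D + 1) * F * C parameters.
mask_linear = Dense(F * C, name='mask_linear')(body_linear)
# Fold the F*C outputs back into a (T, F, C) mask tensor.
mask = Reshape((T, F, C), name='mask')(mask_linear)

Model(body_linear, mask).summary()   # (4*3 + 1) * 4 * 2 = 104 trainable parameters
```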