Closed · YipingNUS closed this issue 4 years ago
Hello @YipingNUS,
Indeed, your implementation is more straightforward, but as far as I can tell it essentially performs the same computation. I recall that when I wrote this there was some issue with using the Dense layer directly, and the TimeDistributed wrapper was a good alternative at the time.
Hope that answers your question. In any case, please feel free to re-open the ticket if you have any more questions or require further clarification.
Best, Nikos
I guess the reason it previously didn't work without TimeDistributed is that you set the input dimension to V_doc._keras_shape[1]. I'm not sure about the Keras syntax, but tf.shape includes the batch_size as the first dimension, so the input size you specified might have been num_labels instead of the interaction-layer dimension. I tried my code snippet above without specifying the input dimension, and it works.
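To make the shape point concrete, here is a minimal NumPy sketch (the variable names and sizes are made up for illustration, not taken from the repo): when a tensor's shape includes the batch axis, index 1 is the label axis, not the feature dimension a Dense layer acts on.

```python
import numpy as np

# Hypothetical interaction-layer output: (batch_size, num_labels, hidden_dim).
batch_size, num_labels, hidden_dim = 4, 6, 32
V_doc = np.zeros((batch_size, num_labels, hidden_dim))

# Index 0 is the batch axis, so index 1 is num_labels,
# not the interaction-layer (feature) dimension:
print(V_doc.shape[0])   # batch size
print(V_doc.shape[1])   # num_labels, NOT hidden_dim
print(V_doc.shape[-1])  # the feature dimension a Dense layer would act on
```

So an input dimension read from shape index 1 would silently pick up num_labels instead of the feature size.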
The reason adding TimeDistributed works is that it treats each label as a time step. The computation may be the same, but conceptually it's a bit hard to follow. Anyway, thanks for your prompt reply!
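The claimed equivalence can be checked numerically. A minimal NumPy sketch (weights and sizes invented for illustration): applying the same linear unit plus sigmoid to each label "time step" in a loop, which is what TimeDistributed(Dense(1)) conceptually does, matches a single matrix product over the last axis, which is what a plain Dense(1) does.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
batch_size, num_labels, hidden_dim = 4, 6, 32

V_doc = rng.standard_normal((batch_size, num_labels, hidden_dim))
W = rng.standard_normal((hidden_dim, 1))  # shared linear unit
b = rng.standard_normal(1)

# "TimeDistributed" view: apply the same linear unit to each label step.
per_step = np.stack(
    [sigmoid(V_doc[:, t, :] @ W + b) for t in range(num_labels)], axis=1
)

# Plain "Dense" view: one matmul over the last axis.
direct = sigmoid(V_doc @ W + b)

print(np.allclose(per_step, direct))  # True
```

Both produce a (batch_size, num_labels, 1) tensor with identical values, which is why the two formulations give the same result.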
You are right, it is counter-intuitive to use TimeDistributed for this operation; I will update the code with your suggestion.
Thanks again for your feedback!
Hi @nik0spapp, I was studying the code and the paper and I have a question about the output layer. In the paper it corresponds to equations 11-13. In the code, it's the following:
My question is: why is there a TimeDistributed wrapper in the output layer? The paper says the output layer is a linear unit (eq. 11) followed by a sigmoid transformation (eq. 13). My direct translation of the equations would be: