Questions about Multi-Label and output activation function...

JessicaKuo commented 6 years ago

Thanks a lot for offering such a good tool for multi-label text classification. It's been very helpful for my research. Because I am new to neural networks and multi-label classification, there are a few things I don't fully understand about the model (I used the CNN model with all default parameter settings). I know that the CNN model behind magpie is based on Kim, Yoon. "Convolutional neural networks for sentence classification."

1. How does the CNN model deal with multi-label classification? I couldn't find any description of multi-label classification in Kim's paper, so I am not sure whether I missed something.

2. What is the output activation function used in the CNN model in magpie? Originally I thought it was softmax, because the output is probability scores and a softmax output is used in Kim, Yoon. "Convolutional neural networks for sentence classification." But I saw this code in models.py in magpie:

    outputs = Dense(output_length, activation='sigmoid')(flattened)
    model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['top_k_categorical_accuracy'],
    )

I also read some related articles saying that softmax with cross-entropy is appropriate for multi-class classification (although adding a threshold can also make it multi-label), and that sigmoid with binary_crossentropy is suitable for multi-label classification.

So I am confused: is the output activation function used in magpie softmax or sigmoid?

Thanks for your patience!

jstypka commented 6 years ago

The network described in the paper works fine for multi-label classification once its final softmax layer is left out. In magpie there is no softmax layer at the end of the network, so we can treat the labels independently.
There is no softmax function at the end; it is simply a sigmoid activation function, as you noticed. Softmax guarantees that all label probabilities sum to one, which does not make sense for multi-label classification, e.g. two labels should be allowed to have probabilities >0.5 at the same time.
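
To make the difference concrete, here is a minimal sketch in plain Python (not magpie's code). The logits, the 0.5 threshold, and the helper names `softmax` and `sigmoid` are assumptions chosen for illustration; magpie itself relies on the Keras `Dense(..., activation='sigmoid')` layer quoted above.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    # Each logit is squashed on its own, independently of the others.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical raw scores for three labels; the first two should both apply.
logits = [3.0, 2.5, -4.0]

softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

# Softmax probabilities compete: they sum to one, so at most one
# label can have a probability above 0.5.
softmax_over_half = [i for i, p in enumerate(softmax_probs) if p > 0.5]

# Sigmoid probabilities are independent, so several labels can pass
# the (assumed) 0.5 threshold at once.
threshold = 0.5
predicted = [i for i, p in enumerate(sigmoid_probs) if p > threshold]

print("softmax:", [round(p, 3) for p in softmax_probs])
print("sigmoid:", [round(p, 3) for p in sigmoid_probs])
print("labels above 0.5 with softmax:", softmax_over_half)
print("labels above 0.5 with sigmoid:", predicted)
```

With these example logits, softmax lets only the first label exceed 0.5, while sigmoid lets both of the first two labels through. That is why sigmoid pairs naturally with binary_crossentropy, which scores each label as its own yes/no decision.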

Hope that helps @JessicaKuo !

JessicaKuo commented 6 years ago

Ok, now I understand. Thanks for your explanation!

prateekjoshi565 commented 6 years ago

Hi @JessicaKuo

May I know which dataset you are using for multi-label classification?

JessicaKuo commented 6 years ago

@prateekjoshi565
The dataset I used is SOAP, prescription, and disease information for outpatients from one hospital.

inspirehep / magpie

Questions about Multi-Label and output activation function... #141