inspirehep / magpie

Deep neural network framework for multi-label text classification
MIT License

Magpie used in binary classification..... #157

Closed JessicaKuo closed 6 years ago

JessicaKuo commented 6 years ago

Hello,

I have used magpie for multi-label text classification before and found that it's a powerful tool.

Recently, I tried to use Magpie for binary text classification. I have 217 cases in total, split 4:1 into training and test sets for this research. But I got output like this:

answerICD= tumor positive,, 2
Predict: 0 tumor positive
tumor positive 0.51141936
tumor negative 0.5093596

answerICD= tumor positive,, 2
Predict: 0 tumor positive
tumor positive 0.51141936
tumor negative 0.5093596

answerICD= tumor negative, 1
tumor positive 0.51141936
tumor negative 0.5093596

answerICD= tumor negative, 1
tumor positive 0.51141936
tumor negative 0.5093596

As you can see, it outputs the same probabilities for the two labels in every test case. This result is pretty strange, so I would like to ask whether there is any suggestion or explanation for it. Thank you for taking the time to look at this!

dorg-ekrolewicz commented 6 years ago

Can you describe a bit more about how you generated your test/label cases?

JessicaKuo commented 6 years ago

Thanks for your reply. I found that I had not preprocessed the line breaks in my texts, so only the first line of each text was read. That first line had the same content in every case, which is why every case got the same probabilities. It has been solved now. Sorry for the inconvenience, and thanks for your kind reply.
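For anyone hitting the same symptom, here is a minimal sketch of the kind of preprocessing described above. It is not the code used in this issue and does not call any Magpie API. The helper names (`flatten_line_breaks`, `all_identical`) and the sample reports are hypothetical; the only assumption is that each document is available as a plain string before it is written to the training directory.

```python
import re


def flatten_line_breaks(text):
    """Collapse every run of line breaks (\\n, \\r\\n, \\r) into a single space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", text).strip()


def all_identical(texts):
    """Return True if every string in the list is the same."""
    return len(set(texts)) <= 1


# Hypothetical reports that share the same first line (a common header).
raw_docs = [
    "Pathology report\nTumor cells present in margin.",
    "Pathology report\r\nNo tumor cells seen.",
]

# What a reader that stops at the first line break would see.
first_lines = [doc.splitlines()[0] for doc in raw_docs]

# The same documents with line breaks flattened into spaces.
flattened = [flatten_line_breaks(doc) for doc in raw_docs]

print(all_identical(first_lines))  # True: every input looks the same
print(all_identical(flattened))    # False: the full content differs
```

If the first check prints `True` for your own data, the model is effectively seeing one identical document for every case, which would explain identical probabilities.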

kaundinya5 commented 6 years ago

This is happening to me as well. I'm trying to classify policy numbers and account numbers, both of which are alphanumeric. I trained the model, and I always get the same probabilities! Since each .txt file contains just one word, I changed the minimum number of words in word2vec to 1. Am I doing something wrong?

jstypka commented 6 years ago

As mentioned in #158, Magpie is not of much help if your document contains only one word.
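A quick way to see whether a dataset falls into this case is to count the words in each document before training. The sketch below is only an illustration: `short_documents`, the file names, and the threshold of 2 words are all hypothetical and are not a limit defined by Magpie.

```python
def short_documents(docs, min_words=2):
    """Return the names of documents with fewer than min_words whitespace-separated words."""
    return [name for name, text in docs.items() if len(text.split()) < min_words]


# Hypothetical corpus: file name -> file content.
docs = {
    "policy_001.txt": "PX12345A",
    "account_001.txt": "AC99871Z",
    "report_001.txt": "Tumor cells present in margin.",
}

# Lists the single-word documents, which give a text classifier almost no context.
print(short_documents(docs))
```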