sosata / CS120DataAnalysis

Mobile phone data analysis for detecting symptoms of depression and anxiety.

Not an issue #3

Open manojkumardas7 opened 4 years ago

manojkumardas7 commented 4 years ago

Hi sosata,

I was unable to find a contact email address for you. I did not know the best way to reach you, so I created an issue. A very bad way to do things, I guess.

Very good discussion below: https://github.com/dmlc/xgboost/issues/1746?fbclid=IwAR1V11ELF83aTWGWZ_u-O10VAEqaf-ToeLlQFn1smR1zXFV-0ANk7-2iaz8

I could not post my comments there. You are right about how softmax is used. Also, to answer why the binary case uses 1/(1+exp(value)): in general, for classification, the probability of a specific label is 1/(1+exp(-value)). In the binary scenario you therefore have two probability values, one for each label. One is p = 1/(1+exp(-value)), so the other is 1-p = exp(-value)/(1+exp(-value)). You can choose either of the two terms for your label of interest. If you take the second one and multiply its numerator and denominator by exp(value), it can be rewritten as 1/(1+exp(value)). Maybe you have already figured this out. Sorry, my English is poor. Greetings from India. Best wishes. Please close the issue and, if possible, add this as a comment to the thread, since I am unable to add it myself.
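To make the algebra above easy to check, here is a minimal numerical sketch. It only verifies the identity 1-p = 1/(1+exp(value)) for a few sample values; the function names `sigmoid` and `complement_identity_holds` are made up for illustration and are not taken from XGBoost.

```python
import math


def sigmoid(value):
    """Logistic function: 1 / (1 + exp(-value))."""
    return 1.0 / (1.0 + math.exp(-value))


def complement_identity_holds(value, tol=1e-12):
    """Check that 1 - p equals 1 / (1 + exp(value)) for p = sigmoid(value)."""
    p = sigmoid(value)
    other = 1.0 / (1.0 + math.exp(value))
    return abs((1.0 - p) - other) < tol


# Sample values chosen arbitrarily for the check.
checks = [complement_identity_holds(v) for v in (-3.0, -0.5, 0.0, 0.5, 3.0)]
print(checks)
```

Put differently, 1 - sigmoid(value) is the same number as sigmoid(-value), so switching which label counts as the "label of interest" amounts to flipping the sign of the value.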

sosata commented 4 years ago

Hi manojkumardas7, thanks for your comment - it seems that "lock bot" has closed the issue and it is not possible to add a comment anymore. I think you're right that 1-p = 1/(1+exp(value)). However, this is the probability of not being in the class, while in the multi-class case those are the probabilities of being in each class. So the math is fine, but there seems to be an inconsistency in how the probabilities are defined. Cheers.
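One way to see how the binary and multi-class definitions relate is a small sketch of the standard two-class softmax, assuming the second class is given a fixed score of 0. This is textbook math rather than a description of XGBoost's internals, and the example value 1.7 is arbitrary.

```python
import math


def softmax(scores):
    """Softmax over a list of scores, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    """Logistic function: 1 / (1 + exp(-value))."""
    return 1.0 / (1.0 + math.exp(-value))


value = 1.7  # arbitrary example score for the first class
probs = softmax([value, 0.0])

# probs[0] matches 1/(1+exp(-value)); probs[1] matches 1/(1+exp(value)).
print(probs[0], sigmoid(value))
print(probs[1], 1.0 / (1.0 + math.exp(value)))
```

Under that assumption, each of the two terms is the probability of being in one of the two classes, which is the multi-class reading; the binary formula just reports one of them.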

manojkumardas7 commented 4 years ago

Theoretically, in binary classification you can choose p to be either of the two terms for your class of interest. There is no hard and fast rule that the class of interest has to be tied strictly to the first or the second term (p could be either). The definition is purely your choice, and the model adjusts everything accordingly. What you cannot do is change the definition once you have begun training the model with it. In the multi-class case, the form is 1/(1+exp(-value)). It is hard for me to articulate, but I hope you got the point I am trying to make. Thanks for responding. Cheers to you :)
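The claim that the model adjusts to whichever definition you pick can be illustrated with a toy sketch. It assumes an intercept-only logistic model fitted by plain gradient descent on the log loss, which is far simpler than a boosted model; `fit_intercept` and the four example labels are hypothetical and used only for illustration.

```python
import math


def sigmoid(value):
    """Logistic function: 1 / (1 + exp(-value))."""
    return 1.0 / (1.0 + math.exp(-value))


def fit_intercept(labels, steps=500, lr=1.0):
    """Fit a single intercept so that sigmoid(intercept) matches the label rate.

    The gradient of the mean log loss with respect to the intercept is
    sigmoid(intercept) - mean(labels).
    """
    target_rate = sum(labels) / len(labels)
    value = 0.0
    for _ in range(steps):
        value -= lr * (sigmoid(value) - target_rate)
    return value


labels = [1, 1, 1, 0]               # class of interest encoded as 1
flipped = [1 - y for y in labels]   # same data, opposite convention

value_a = fit_intercept(labels)
value_b = fit_intercept(flipped)
print(value_a, value_b)
```

In this toy setting the two learned values are negatives of each other, so both conventions describe the same probabilities as long as one convention is used consistently from training through prediction.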