zhengruifeng / spark-libFM

An implementation of Factorization Machines (LibFM)
Apache License 2.0

Use 1/0 labels for binary classification instead of 1/-1 #9

Open benmccann opened 8 years ago

benmccann commented 8 years ago

The loss function used in this library for binary classification is a hinge-loss function assuming labels +1 or -1:

case 1 =>
  1 - Math.signum(pred * label)

However, the predictions being made are in the range 0-1:

case 1 =>
  1.0 / (1.0 + Math.exp(-pred))

The 0/1 labels implied by the predictions should be preferred over the +1/-1 labels expected by the loss function: spark.mllib represents the negative label as 0 rather than -1, to be consistent with multiclass labeling.

The loss function should be changed to follow the way Spark does it, with 0/1 labels.
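
For illustration, here is a minimal sketch of a logistic loss that takes 0/1 labels directly, loosely in the spirit of what is described above. The object and method names (`ZeroOneLogLoss`, `loss`, `multiplier`) are hypothetical and are not taken from this library or from Spark; `pred` is assumed to be the raw score before the sigmoid.

```scala
object ZeroOneLogLoss {
  // Numerically stable log(1 + exp(x)).
  def log1pExp(x: Double): Double =
    if (x > 0) x + math.log1p(math.exp(-x)) else math.log1p(math.exp(x))

  // Negative log-likelihood for a label in {0, 1}.
  // pred is assumed to be the raw model score, not a probability.
  def loss(pred: Double, label: Double): Double =
    if (label > 0) log1pExp(-pred) else log1pExp(pred)

  // Derivative of loss with respect to pred: sigmoid(pred) - label.
  def multiplier(pred: Double, label: Double): Double =
    1.0 / (1.0 + math.exp(-pred)) - label
}
```

With this formulation the training labels, the loss and the sigmoid prediction all use the same 0/1 convention, so no label transform is needed.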

benmccann commented 8 years ago

Ahh, it looks like the library transforms the labels before training. But I think this is a very non-standard way of doing things, since the goal is to upstream this and have it merged into Spark's mllib. I believe they use the 0/1 representation internally, and we shouldn't change that.

val data = task match {
  case 0 =>
    input.map(l => (l.label, l.features)).persist()
  case 1 =>
    input.map(l => (if (l.label > 0) 1.0 else -1.0, l.features)).persist()
}
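
To make the effect of that transform concrete, here is a small standalone sketch. `LabelTransform`, `toSigned` and `signLoss` are hypothetical names introduced only for this example; the expressions inside them are copied from the snippets quoted in this thread.

```scala
object LabelTransform {
  // Same mapping as the quoted transform: positive labels become +1.0,
  // everything else (including 0.0) becomes -1.0.
  def toSigned(label: Double): Double = if (label > 0) 1.0 else -1.0

  // The classification loss expression quoted earlier in this thread.
  def signLoss(pred: Double, label: Double): Double =
    1 - math.signum(pred * label)
}
```

Without the transform, a raw 0 label makes `pred * label` zero, so `signLoss` evaluates to 1.0 whatever the prediction is. After `toSigned`, the same expression yields 0.0 for a correctly signed prediction and 2.0 for an incorrectly signed one.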

zdx commented 7 years ago

conclusion?

willysys commented 5 years ago

For the classification task, why is the gradient computed with the logistic loss while the reported loss is computed with a hinge loss? The gradient is computed in the code as follows:

val mult = task match {
  case 0 =>
    pred - label
  case 1 =>
    -label * (1.0 - 1.0 / (1.0 + Math.exp(-label * pred)))
}

The loss is computed in the code as follows:

task match {
  case 0 =>
    (pred - label) * (pred - label)
  case 1 =>
    1 - Math.signum(pred * label) // hinge loss
}
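
As a rough way to check the mismatch, the sketch below compares the quoted gradient multiplier against the derivative of the logistic loss log(1 + exp(-label * pred)) for labels in {-1, +1}, and puts the quoted loss expression next to a textbook hinge loss, max(0, 1 - label * pred). All names here (`LossVsGradient`, `logisticLoss`, `hingeLoss`, `signLoss`, `numericDerivative`) are hypothetical helpers for this example, not part of the library.

```scala
object LossVsGradient {
  // Multiplier copied from the quoted gradient code (labels in {-1, +1}).
  def mult(pred: Double, label: Double): Double =
    -label * (1.0 - 1.0 / (1.0 + math.exp(-label * pred)))

  // Logistic loss; its derivative with respect to pred equals mult.
  def logisticLoss(pred: Double, label: Double): Double =
    math.log1p(math.exp(-label * pred))

  // Textbook hinge loss, for comparison only.
  def hingeLoss(pred: Double, label: Double): Double =
    math.max(0.0, 1.0 - label * pred)

  // Expression copied from the quoted loss code.
  def signLoss(pred: Double, label: Double): Double =
    1 - math.signum(pred * label)

  // Central finite-difference approximation of f'(pred).
  def numericDerivative(f: Double => Double, pred: Double, h: Double = 1e-6): Double =
    (f(pred + h) - f(pred - h)) / (2 * h)
}
```

Under these assumptions, `numericDerivative(p => logisticLoss(p, 1.0), 0.7)` agrees with `mult(0.7, 1.0)` up to finite-difference error, which suggests the gradient does correspond to the logistic loss. The quoted loss expression only takes the values 0, 1 or 2 depending on the sign of `pred * label`, so it behaves like a scaled misclassification indicator rather than the textbook hinge loss.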