zhengruifeng / spark-libFM

An implementation of Factorization Machines (LibFM)
Apache License 2.0

`FMWithSGD` default constructor parameters are inconsistent/too small #11

Open Hydrotoast opened 8 years ago

Hydrotoast commented 8 years ago

From the FMWithSGD file:

  /**
    * Construct an object with default parameters: {task: 0, stepSize: 1.0, numIterations: 100,
    * dim: (true, true, 8), regParam: (0, 0.01, 0.01), miniBatchFraction: 1.0}.
    */
  def this() = this(0, 1.0, 100, (true, true, 8), (0, 1e-3, 1e-4), 1e-5)

The comment is inconsistent with the actual values passed.

It is also worth noting that a miniBatchFraction of 1e-5 may be too small to train all parameters. The GradientDescent implementation performs numIterations iterations of mini-batch SGD, each on a sample of roughly miniBatchFraction of the data, so the total number of labeled points drawn is approximately numIterations * miniBatchFraction times the dataset size. For numIterations = 100 and miniBatchFraction = 1e-5, this means that at most about 1e-3 (0.1%) of the labeled points are actually used during training!
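The arithmetic above can be checked with a short sketch. It assumes that each iteration includes every labeled point independently with probability miniBatchFraction; the object name, helper names, and the dataset size of one million are made up for illustration and are not part of spark-libFM.

```scala
// Back-of-the-envelope check of how much data mini-batch SGD touches.
// Assumption: every iteration samples each example independently with
// probability `miniBatchFraction`. Names here are hypothetical.
object MiniBatchCoverage {
  /** Expected total draws over all iterations, as a fraction of the dataset. */
  def expectedFractionDrawn(numIterations: Int, miniBatchFraction: Double): Double =
    numIterations * miniBatchFraction

  /** Probability that one particular example is never sampled in any iteration. */
  def probNeverSampled(numIterations: Int, miniBatchFraction: Double): Double =
    math.pow(1.0 - miniBatchFraction, numIterations)

  def main(args: Array[String]): Unit = {
    val numIterations = 100        // default from the constructor quoted above
    val miniBatchFraction = 1e-5   // default from the constructor quoted above
    val numExamples = 1000000L     // hypothetical dataset size

    val fraction = expectedFractionDrawn(numIterations, miniBatchFraction)
    println(f"expected fraction drawn:  $fraction%.4f")
    println(f"expected examples drawn:  ${fraction * numExamples}%.0f of $numExamples")
    println(f"P(example never sampled): ${probNeverSampled(numIterations, miniBatchFraction)}%.4f")
  }
}
```

Under these assumptions, a dataset of one million labeled points would contribute only about a thousand draws in total, and any single point has roughly a 99.9% chance of never being sampled.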

Further implications: since the model has a set of parameters per feature, any feature that never appears in a sampled mini-batch is never updated, so its parameters simply keep their initial values: latent vectors drawn from a Normal distribution and weights set to 0.0.
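A toy sketch of that effect follows. It is not spark-libFM code: it swaps in plain SGD on the linear weights only (no latent vectors, no regularization), and the `Example` type and `sgd` helper are hypothetical. It only shows that a weight changes when its feature occurs in a sampled example and stays at its initial value otherwise.

```scala
// Toy illustration (not spark-libFM code): SGD on linear weights only,
// without regularization. All names here are hypothetical.
object UnseenFeatureSketch {
  // A sparse example: indices of active features (each with value 1.0) and a label.
  final case class Example(features: Seq[Int], label: Double)

  def sgd(weights: Array[Double], batch: Seq[Example], stepSize: Double): Array[Double] = {
    val w = weights.clone()
    for (ex <- batch) {
      val prediction = ex.features.map(w(_)).sum
      val error = prediction - ex.label
      // The gradient of the halved squared error is nonzero only for
      // the features that are active in this example.
      ex.features.foreach(i => w(i) -= stepSize * error)
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val initial = Array.fill(4)(0.0)            // weights start at 0.0
    val sampled = Seq(Example(Seq(0, 1), 1.0))  // features 2 and 3 are never sampled
    val trained = sgd(initial, sampled, stepSize = 0.1)
    println(trained.mkString(", "))             // weights 2 and 3 remain at 0.0
  }
}
```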