From the `FMWithSGD` file: the comment is inconsistent with the actual values passed.
It is also worth noting that `1e-5` may be too small a fraction to train over all parameters. Since the `GradientDescent` implementation in Scala performs `numIterations` iterations of mini-batch SGD, each sampling a fraction `miniBatchFraction` of the data, approximately a `numIterations * miniBatchFraction` fraction of the labeled points is updated. For `numIterations = 100` and `miniBatchFraction = 1e-5`, this means at most a `1e-3` fraction of the labeled points is actually used during training!
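The arithmetic above can be sketched as follows. This is a back-of-the-envelope estimate only: it assumes each iteration independently samples a `miniBatchFraction` of the data (as `GradientDescent` does), so the union of sampled points over all iterations is at most `numIterations * miniBatchFraction` of the dataset; the object and method names here are illustrative, not part of the actual API.

```scala
object CoverageEstimate {
  // Upper bound on the fraction of labeled points ever touched by training:
  // each of the numIterations mini-batches samples at most a
  // miniBatchFraction of the data, so their union covers at most
  // numIterations * miniBatchFraction of the points.
  def maxFractionSeen(numIterations: Int, miniBatchFraction: Double): Double =
    numIterations * miniBatchFraction

  def main(args: Array[String]): Unit = {
    // The values discussed above: 100 iterations at fraction 1e-5.
    val frac = maxFractionSeen(100, 1e-5)
    println(f"At most $frac%.1e of the labeled points are sampled")
  }
}
```

With the default arguments this comes out to `1e-3`, i.e. 99.9% of the labeled points are never seen by the optimizer.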
Further implication: since the model keeps a set of parameters per feature, any feature unseen during training simply retains its default values: latent vectors initialized from a Normal distribution and weights initialized to `0.0`.
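To make the implication concrete, here is a hypothetical sketch of the per-feature initialization described above. The names (`numFeatures`, `numFactors`, `initStd`) are illustrative, not the actual `FMWithSGD` fields; the point is that a feature never sampled during training carries exactly these initial values into the final model.

```scala
import scala.util.Random

object FMInit {
  // Illustrative per-feature parameter initialization: one latent factor
  // vector drawn from Normal(0, initStd^2) per feature, and one linear
  // weight per feature starting at 0.0. Features never touched by an SGD
  // update keep these values unchanged.
  def init(numFeatures: Int, numFactors: Int, initStd: Double, rng: Random)
      : (Array[Array[Double]], Array[Double]) = {
    val latent  = Array.fill(numFeatures, numFactors)(rng.nextGaussian() * initStd)
    val weights = Array.fill(numFeatures)(0.0)
    (latent, weights)
  }
}
```

So with the `1e-5` fraction above, the vast majority of features would end up contributing only their random initial latent vectors and a zero linear weight.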