Brief description:

Step 1: Train a model on some training data and note the average loss.

Result:
number of examples = 814484
weighted example sum = 814484
weighted label sum = 0
average loss = 0.0286955
best constant = 0
total feature number = 261449344
Step 2: Now use the model to predict on the exact same training data. I would expect the average loss here to be more or less similar to the average loss obtained during training (since the low training loss suggests the model more or less memorized the training examples):
vw -t eng_train_2.vw -i test_4.model
Result:
number of examples = 203621
weighted example sum = 203621
weighted label sum = 0
average loss = 0.144631  << this is not close to the average loss reported during training
best constant = -4.91111e-06
total feature number = 65362336
Could this please be clarified? Is this expected behavior or a bug?
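For what it's worth, one reason the two numbers need not match at all: the "average loss" VW prints during online training is a progressive validation loss, i.e. each example is scored *before* the model updates on it, whereas `-t` mode scores every example with the single, final model. A minimal sketch in plain Python (not VW, and not this exact dataset) of the same effect with an online SGD learner on squared loss:

```python
# Sketch only: illustrates progressive (online) average loss vs. the loss of
# the final frozen model re-run over the same data. Dataset and learning rate
# are made up for illustration; this is not VW's implementation.
import random

random.seed(0)
# Hypothetical 1-feature dataset: y = 2*x plus small noise.
xs = [random.random() for _ in range(500)]
data = [(x, 2.0 * x + random.gauss(0.0, 0.1)) for x in xs]

w, lr = 0.0, 0.1
progressive_losses = []
for x, y in data:
    pred = w * x
    progressive_losses.append((pred - y) ** 2)  # scored BEFORE the update
    w += lr * (y - pred) * x                    # SGD step on squared loss

progressive_loss = sum(progressive_losses) / len(data)
final_model_loss = sum((w * x - y) ** 2 for x, y in data) / len(data)

print(f"progressive (training) average loss: {progressive_loss:.6f}")
print(f"final model loss on the same data:   {final_model_loss:.6f}")
```

The two averages differ even though both are computed over the identical examples, because the early progressive scores come from a barely-trained model.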