Extremely low probability on seemingly accurate predictions.

I have a small set of tagged data (financial securities offerings) with a very straightforward structure and pattern.

I have been experimenting with the 'algorithm' parameter of the Trainer class.

If I use 'ap' (averaged perceptron), train a small model for the use case above, and predict on a new set of inputs, I get a model probability above 0.5 in most cases. However, on closer inspection, the predictions clearly contain a few mistakes.
But when I train the model with the 'arow' algorithm and predict on the same inputs, I generally get a very low probability, on the order of 10^-6, even for a 100% accurate prediction.
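One possible factor, offered as an assumption and not confirmed anywhere in this issue: the reported number is the probability of the whole label sequence, exp(score) / Z, where Z sums over every possible label sequence. If one training algorithm leaves the weights at a smaller overall scale than another, that distribution is flatter, so the best path can be exactly right and still receive a tiny probability. The brute-force toy below uses hypothetical labels and weights (not the models from this issue) to show that multiplying all weights by a positive factor leaves the best sequence unchanged while its probability drops toward uniform.

```python
import itertools
import math

# Hypothetical toy linear-chain model: 3 labels, 4 positions.
LABELS = ["A", "B", "C"]
# Emission scores: EMISSION[position][label].
EMISSION = [
    {"A": 2.0, "B": 0.5, "C": 0.0},
    {"A": 0.0, "B": 2.0, "C": 0.5},
    {"A": 0.5, "B": 0.0, "C": 2.0},
    {"A": 2.0, "B": 0.5, "C": 0.0},
]
# Transition scores: small bonus for changing label, zero for repeating it.
TRANSITION = {(p, c): (0.5 if p != c else 0.0) for p in LABELS for c in LABELS}


def score(seq, scale):
    """Unnormalized log-score of a label sequence, all weights multiplied by `scale`."""
    s = sum(EMISSION[i][y] for i, y in enumerate(seq))
    s += sum(TRANSITION[(a, b)] for a, b in zip(seq, seq[1:]))
    return scale * s


def best_and_probability(scale):
    """Enumerate every sequence; return the best one and its probability exp(score) / Z."""
    seqs = list(itertools.product(LABELS, repeat=len(EMISSION)))
    scores = [score(s, scale) for s in seqs]
    log_z = math.log(sum(math.exp(v) for v in scores))
    best = max(range(len(seqs)), key=lambda i: scores[i])
    return seqs[best], math.exp(scores[best] - log_z)


if __name__ == "__main__":
    for scale in (1.0, 0.05):
        best, p = best_and_probability(scale)
        print(f"scale={scale}: best={best}, probability={p:.4f}")
```

With 3 labels and 4 positions there are 81 sequences, so a completely flat model would give each one 1/81; in the scaled-down case the best sequence stays the same but its probability ends up only slightly above that floor. Real sequences with more labels and tokens have vastly more alternatives, which pushes the same effect to much smaller numbers.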

Below are the actual labels for one input, followed by the labels predicted by the AROW and AP models.

Actual labels for the input (I can't share the input itself for security reasons):

['NN', 'coupon', 'maturity_date', 'maturity_date', 'par_amount', 'CD', 'rate', 'oas', 'NN', '.']

Labels predicted by the model trained with the AROW algorithm (extremely low probability):

[['NN', 'coupon', 'maturity_date', 'maturity_date', 'par_amount', 'CD', 'rate', 'oas', 'NN', '.']] Model Probability: 2.89272422392235e-08

By contrast, these are the labels predicted by the model trained with the AP algorithm:

[['NN', 'coupon', 'maturity_date', 'par_amount', 'par_amount', 'par_amount', 'rate', 'oas', 'NN', '.']] Model Probability: 0.8457047663567325

I trained both models with the same parameters: the same max_iterations, the same data, and identical values for all other hyperparameters.

Am I missing something here? Why is the probability so low for the AROW model when its prediction is 100% accurate? I use the model probability to filter the processed records further before downstream consumption.
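If the absolute probability depends on how a particular algorithm scales its weights, a single fixed cutoff may not transfer between the AP and AROW models. One hedged alternative, an assumption and not something proposed in this issue, is to derive the cutoff from each model's own scores, for example keeping only the top fraction of records as ranked by that model. The helper name keep_top_fraction, the record IDs, and the probabilities below are all hypothetical.

```python
import math


def keep_top_fraction(records, keep_fraction):
    """Hypothetical helper: keep the highest-probability `keep_fraction` of records.

    `records` is a list of (record_id, probability) pairs produced by ONE model.
    The cutoff comes from that model's own ranking, so it does not depend on the
    absolute scale of its probabilities.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Made-up scores standing in for one model's per-record sequence probabilities.
    scores = [("r1", 2.9e-08), ("r2", 1.1e-06), ("r3", 4.0e-09), ("r4", 7.5e-07)]
    print(keep_top_fraction(scores, 0.5))
```

This ranks records only relative to the same model, so it says nothing about whether one model's probabilities are better calibrated than the other's.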

Thanks, R

scrapinghub / python-crfsuite, issue #122