How to interpret the output of testing with mode `probs`?

zhiguowang / BiMPM

BiMPM: Bilateral Multi-Perspective Matching for Natural Language Sentences

How to interpret the output of testing with mode `probs`? #20

Closed by uduse 5 years ago

uduse commented 7 years ago

1       1:0.38222232461 0:0.617777705193

0       1:0.669489085674 0:0.330510884523

0       1:0.894043326378 0:0.105956666172

1       1:0.770928144455 0:0.229071870446

0       1:0.691392481327 0:0.30860760808

My guess is that the first column is the predicted label, the second column is label 1 with its probability, and the third column is label 0 with its probability. If that were the case, though, the first column would always hold whichever label has the larger probability in the second and third columns. It does not. For example, the last line, `0 1:0.691392481327 0:0.30860760808`, contradicts my guess.

So how should I interpret this?

uduse commented 6 years ago

OK, I figured this out by reading the source code. The first column is the original (gold) label, and the remaining columns are the predicted probabilities for every label. The test set is shuffled before it is tested, so the order of the results doesn't match the order of the input.
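To make that reading of the columns concrete, here is a minimal sketch that parses lines in the format shown above and derives the predicted label as the one with the highest probability. The helper names `parse_probs_line` and `predicted_label` are made up for this illustration and are not part of BiMPM; the sketch assumes whitespace-separated fields and `label:probability` pairs, as in the sample output.

```python
def parse_probs_line(line):
    """Split one line of `probs` output into (gold_label, {label: probability})."""
    fields = line.split()
    gold_label = fields[0]  # first column: the original label from the test file
    probs = {}
    for field in fields[1:]:
        # Each remaining column looks like "label:probability".
        label, prob = field.rsplit(":", 1)
        probs[label] = float(prob)
    return gold_label, probs


def predicted_label(probs):
    """Return the label with the largest predicted probability."""
    return max(probs, key=probs.get)


sample_lines = [
    "1       1:0.38222232461 0:0.617777705193",
    "0       1:0.669489085674 0:0.330510884523",
    "0       1:0.894043326378 0:0.105956666172",
    "1       1:0.770928144455 0:0.229071870446",
    "0       1:0.691392481327 0:0.30860760808",
]

for line in sample_lines:
    gold, probs = parse_probs_line(line)
    pred = predicted_label(probs)
    print(gold, pred, "match" if gold == pred else "mismatch")
```

On the five sample lines, only the fourth has a gold label that agrees with the highest-probability label, which is why the first column cannot be the prediction.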

wanghm92 commented 6 years ago

Hi! I think that, according to https://github.com/zhiguowang/BiMPM/blob/7052c19acb83452ad077da14512bcac19a00c3d0/src/SentenceMatchTrainer.py#L175-L183 , the test and dev data streams are not shuffled. Could you point out where the shuffling takes place? Thanks!

uduse commented 6 years ago

@wanghm92 SentenceMatchDataStream sorts its input if the isSort flag is set, and as far as I can tell isSort is set everywhere SentenceMatchDataStream is used.

https://github.com/zhiguowang/BiMPM/blob/7052c19acb83452ad077da14512bcac19a00c3d0/src/SentenceMatchDataStream.py#L85-L87

This sorts the inputs by length, for a purpose that I am not aware of. Maybe it is meant to make better use of the cache (inputs with the same length are alike).

wanghm92 commented 6 years ago

Oh, I overlooked the isSort flag. Thanks! Yes, sorting groups sentences of similar length into the same bucket, which avoids padding short sentences out to the length of a very long one. I guess sorting as a preprocessing step and then turning off the isSort flag for dev and test would be a nice way to preserve the order of the test instances.
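A small sketch of why length sorting reduces padding: it counts the pad tokens needed when every batch is padded to its longest sentence, with and without sorting. The function name `padding_needed`, the sentence lengths, and the batch size are all invented for illustration; this is not BiMPM's batching code.

```python
def padding_needed(lengths, batch_size):
    """Count pad tokens when each batch is padded to its longest sentence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


# Hypothetical sentence lengths: short and long sentences interleaved.
lengths = [3, 40, 5, 38, 4, 41, 6, 39]

unsorted_padding = padding_needed(lengths, batch_size=4)
sorted_padding = padding_needed(sorted(lengths), batch_size=4)

print("pad tokens without sorting:", unsorted_padding)
print("pad tokens with sorting:   ", sorted_padding)
```

With these made-up lengths, the unsorted batches each mix short and long sentences, so the short ones are padded heavily; after sorting, each batch holds sentences of nearly equal length.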

uduse commented 6 years ago

@wanghm92 grouping similar sentences into the same bucket/batch sounds more reasonable than my cache thing.

The sort reorders all attributes together, so fields such as the original sentences end up in the same order as the probabilities. What I did was print out everything (in addition to just the probs) after the evaluation is done, and then recover the original order using the other fields.
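One way to recover the input order, sketched under assumptions: instead of matching rows back through their other fields, carry each instance's original position through the sort and use it to put the results back. The functions `sort_by_length_with_index` and `restore_order` are hypothetical helpers, not BiMPM code, and the "results" here are placeholder strings standing in for the real model output.

```python
def sort_by_length_with_index(instances):
    """Sort sentences by token count, keeping each one's original position."""
    indexed = list(enumerate(instances))
    return sorted(indexed, key=lambda pair: len(pair[1].split()))


def restore_order(sorted_indices, results):
    """Place each result back at the position its instance had in the input."""
    restored = [None] * len(results)
    for original_index, result in zip(sorted_indices, results):
        restored[original_index] = result
    return restored


sentences = ["a b c d", "a", "a b c", "a b"]

indexed = sort_by_length_with_index(sentences)
sorted_indices = [i for i, _ in indexed]

# Stand-in for model output, produced in length-sorted order.
results_in_sorted_order = ["probs for: " + s for _, s in indexed]

restored = restore_order(sorted_indices, results_in_sorted_order)
for sentence, result in zip(sentences, restored):
    print(sentence, "->", result)
```

Matching on other fields, as described above, also works without touching the data stream, provided those fields identify each instance uniquely.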

uduse commented 5 years ago

Closed due to inactivity.