zhiguowang / BiMPM

BiMPM: Bilateral Multi-Perspective Matching for Natural Language Sentences
Apache License 2.0

Prob. values are not consistent if the padding amount changes #32

Open agupta74 opened 6 years ago

agupta74 commented 6 years ago

The final softmax prob. values are not the same if the padding amount changes. It looks like the padding is not masked out for some functions such as reduce_mean and reduce_max. Also, the word/char embeddings for zero-padded tokens are not zero.

zhiguowang commented 6 years ago

Not sure which part you are talking about, but I always multiply by a mask to filter out the influence of padding values.

agupta74 commented 6 years ago

In match_utils.py, tf.reduce_max and tf.reduce_mean (e.g. lines 150 and 151; these functions are used in other places in match_utils.py as well) are computed without masking. So if you vary the zero padding for the same sentence/passage or query, the final prob. values differ. The amount of zero padding should not impact the final softmax prob. values for a given query/passage.
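
To make the padding sensitivity concrete, here is a minimal repro sketch (not code from match_utils.py; it assumes TF 2.x eager mode): the same two real values produce a different mean once zero padding is appended, even though the padded positions themselves are zero.

```python
import tensorflow as tf

# Same two "real" values, different amounts of zero padding.
short = tf.constant([3.0, 5.0])
padded = tf.constant([3.0, 5.0, 0.0, 0.0])

print(tf.reduce_mean(short).numpy())   # 4.0
print(tf.reduce_mean(padded).numpy())  # 2.0 -- depends on padding amount
print(tf.reduce_max(padded).numpy())   # 5.0 here, but would be 0.0 if all
                                       # real values were negative
```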

agupta74 commented 6 years ago

Regarding the word/char embeddings for "zero" word/char tokens in the zero-padded query/passage: shouldn't these embeddings be a zero vector? Any non-zero values at zero-padded tokens seem to affect the intermediate cosine similarity scores (and therefore the final softmax scores) when I change the zero padding amount.
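
One way to guarantee zero vectors at padded positions is to multiply the embedding lookup by the sequence mask. A minimal sketch (the helper name lookup_and_mask is mine, not from the repo; assumes TF 2.x):

```python
import tensorflow as tf

def lookup_and_mask(embedding_matrix, token_ids, lengths):
    # token_ids: [batch, max_len] int ids, with 0 as the pad id
    # lengths:   [batch] true (unpadded) sequence lengths
    emb = tf.nn.embedding_lookup(embedding_matrix, token_ids)  # [batch, max_len, dim]
    mask = tf.sequence_mask(lengths, maxlen=tf.shape(token_ids)[1],
                            dtype=emb.dtype)                   # [batch, max_len]
    # Zero out the embedding vector at every padded position.
    return emb * tf.expand_dims(mask, -1)
```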

zhiguowang commented 6 years ago

I multiply by a mask at this line https://github.com/zhiguowang/BiMPM/blob/master/src/match_utils.py#L145 before lines 150 and 151.

Before any matching layer, I also multiply by a mask to filter out padding values: https://github.com/zhiguowang/BiMPM/blob/master/src/match_utils.py#L139

agupta74 commented 6 years ago

I agree that you mask out the values. However, for the mean calculation the number of padded zeros still affects the result even though the sum does not change. So if my sentence length changes due to a different amount of padding (based on the max sentence length in a batch), the mean value changes even though the padded values are masked out to zero.
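
A padding-invariant mean divides the masked sum by the number of real tokens rather than by the padded length. A minimal sketch (the name masked_mean is a hypothetical helper, not from the repo):

```python
import tensorflow as tf

def masked_mean(values, mask):
    # values: [batch, max_len, dim]; mask: [batch, max_len],
    # 1.0 for real tokens and 0.0 for padding.
    mask = tf.expand_dims(tf.cast(mask, values.dtype), -1)  # [batch, max_len, 1]
    total = tf.reduce_sum(values * mask, axis=1)            # sum over real tokens only
    count = tf.maximum(tf.reduce_sum(mask, axis=1), 1.0)    # guard against empty rows
    return total / count                                    # independent of padding amount
```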

Similarly, for the max calculation: since the padded values are masked out to zero, the max will be zero whenever the actual values in the matrix are all negative. The padded values need to be excluded from the max calculation, e.g. by setting them to negative infinity, so that the masked zeros do not affect the result.
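
For the max, the padded positions can be pushed to a very large negative value before tf.reduce_max, so they never win even when all real values are negative. A minimal sketch (masked_max is a hypothetical helper, not from the repo):

```python
import tensorflow as tf

def masked_max(values, mask):
    # values: [batch, max_len, dim]; mask: [batch, max_len],
    # 1.0 for real tokens and 0.0 for padding.
    mask = tf.expand_dims(tf.cast(mask, values.dtype), -1)  # [batch, max_len, 1]
    very_negative = (1.0 - mask) * -1e30                    # 0 at real positions
    # Padded slots become ~-1e30, so the max is taken over real tokens only.
    return tf.reduce_max(values + very_negative, axis=1)
```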

I can see that in the latest codebase you have masked out the question/passage representations for the padded word tokens. I guess that did not happen in the older codebase. Thanks for fixing this issue!

zhiguowang commented 6 years ago

Oh, I see what you mean. You are right, the padding zeros will affect the mean calculation. I will think about it. Thanks!

agupta74 commented 6 years ago

Thanks! Also, please look into the max calculation issue mentioned in my comments above.

agupta74 commented 6 years ago

@zhiguowang Any update on this? I can submit a PR for the fix if that helps, so that you can verify it, run any other testing that is required, and then merge. Thanks!