samish-dev opened 2 years ago
While training my model on Arabic, I was logging some of the values processed and generated by the model. The following is a sample log from training:
top_vec: tensor([[[-0.2439, 0.2242, 1.3744, ..., 1.2180, -1.4410, -1.3635], [-0.2523, 0.1137, 1.3378, ..., 1.2184, -0.1754, -1.2815], [-0.4105, 0.0702, 1.4091, ..., 1.2221, -1.5671, -1.3778], ..., [ 0.0288, -0.6760, 1.5258, ..., 1.3763, -1.4011, -1.3328], [-0.0218, -0.3249, 1.1765, ..., 1.4232, -1.2773, -1.1683], [ 0.0678, 0.2823, 1.2759, ..., 1.2741, 0.0080, -1.0290]]], device='cuda:0', grad_fn=<NativeLayerNormBackward>) torch.Size([1, 432, 768])
clss: tensor([[ 0, 31, 73, 90, 104, 142, 169, 187, 199, 213, 236, 273, 297, 315, 337, 351, 364, 382, 415]], device='cuda:0') torch.Size([1, 19])
sents_vec: tensor([[[-0.2439, 0.2242, 1.3744, ..., 1.2180, -1.4410, -1.3635], [-0.2009, -0.0098, 0.3056, ..., 1.2681, -1.3180, -1.2614], [-0.2254, -0.0302, 0.2825, ..., 1.3459, -0.9250, -1.1691], ..., [-0.2042, -0.1110, 1.3395, ..., 1.2766, -1.2633, -1.1890], [-0.1571, -0.6477, 1.2429, ..., 0.6955, -0.8612, -1.1577], [-0.2982, -0.9736, 1.2249, ..., 1.3346, -1.3179, -1.0534]]], device='cuda:0', grad_fn=<MulBackward0>) torch.Size([1, 19, 768])
sent_scores: tensor([[0.2587, 0.1031, 0.2036, 0.0026, 0.2685, 0.0003, 0.0006, 0.0015, 0.0039, 0.0027, 0.0164, 0.0015, 0.0077, 0.0006, 0.0005, 0.0009, 0.0770, 0.0069, 0.0009]], device='cuda:0', grad_fn=<SqueezeBackward1>) torch.Size([1, 19])
[2022-02-25 00:37:03,025 INFO] Step 2155/50000; xent: 0.39; lr: 0.0000500; 12 docs/s; 280 sec
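For context, this is roughly how those tensors relate in a BertSum-style extractive model: sents_vec is gathered from top_vec at the [CLS] positions listed in clss, and sent_scores comes from a small classifier on top of sents_vec. A minimal sketch of the gather step, assuming that pipeline (the top_vec values and the mask handling below are illustrative placeholders, not the repo's exact code):

```python
import torch

# Minimal sketch (not the repo's exact code) of the gather step, using the
# shapes from the log above: top_vec [1, 432, 768] and clss [1, 19]
# should produce sents_vec [1, 19, 768].
torch.manual_seed(0)
top_vec = torch.randn(1, 432, 768)                   # encoder output (placeholder values)
clss = torch.tensor([[  0,  31,  73,  90, 104, 142, 169, 187, 199, 213,
                      236, 273, 297, 315, 337, 351, 364, 382, 415]])
mask_cls = torch.ones_like(clss, dtype=torch.float)  # 1 for real sentences, 0 for padding

# Pick out the hidden state at each [CLS] index for every document in the batch.
sents_vec = top_vec[torch.arange(top_vec.size(0)).unsqueeze(1), clss]
sents_vec = sents_vec * mask_cls.unsqueeze(-1)       # zero out padded sentence slots

print(sents_vec.shape)  # torch.Size([1, 19, 768])
```

With the training log above, each [CLS] position picks out a different hidden state, which is why the sents_vec rows and sent_scores differ from sentence to sentence.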
Everything seemed to be going well until I ran train.py in testing mode; all the [CLS] tokens were producing exactly the same value:
top_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], ..., [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]], device='cuda:0') torch.Size([1, 512, 768])
clss: tensor([[ 0, 38, 51, 79, 130, 150, 171, 213, 258, 271, 304, 326, 345, 362, 378, 395, 413, 449, 471, 492]], device='cuda:0') torch.Size([1, 20])
sents_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], ..., [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]], device='cuda:0') torch.Size([1, 20, 768])
sent_scores: tensor([[0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567]], device='cuda:0') torch.Size([1, 20])
top_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], ..., [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]], device='cuda:0') torch.Size([1, 512, 768])
clss: tensor([[ 0, 43, 92, 127, 151, 172, 191, 226, 242, 256, 269, 290, 312, 330, 365, 410, 433, 461, 482, 508]], device='cuda:0') torch.Size([1, 20])
sents_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], ..., [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151], [ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]], device='cuda:0') torch.Size([1, 20, 768])
sent_scores: tensor([[0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567]], device='cuda:0') torch.Size([1, 20])
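Since every position of top_vec is identical and all twenty sent_scores collapse to 0.0567, the encoder output itself has already degenerated before the scoring layer. This is the sanity check I'm planning to run to narrow it down, a rough sketch assuming a BertSum-style setup; model, src, segs, mask_src, the 'model' checkpoint key, and model.bert are hypothetical names for illustration, not confirmed against the repo:

```python
import torch

def check_extractive_test_run(model, checkpoint_path, src, segs, mask_src):
    """Rough sanity check with hypothetical names: `model` is the extractive
    summarizer, `src`/`segs`/`mask_src` are one test batch's token ids,
    segment ids and attention mask."""
    # (a) Confirm the trained weights actually made it into the model.
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    # Assumption: the checkpoint stores the weights under the 'model' key.
    model.load_state_dict(checkpoint['model'], strict=True)  # strict=True surfaces key mismatches
    model.eval()

    # (b) Confirm the batch is not degenerate before it reaches the encoder.
    print('distinct token ids  :', src.unique().numel())
    print('non-masked positions:', int(mask_src.sum()))

    # (c) Check whether the encoder output has collapsed to one repeated vector.
    with torch.no_grad():
        # Assumption: the BERT encoder is exposed as model.bert(src, segs, mask_src).
        top_vec = model.bert(src, segs, mask_src)
    spread = (top_vec - top_vec[:, :1]).abs().max()
    print('max deviation from first position:', spread.item())  # ~0 means every row is identical
```

If the checkpoint loads cleanly with strict=True and the inputs look sane but the deviation is still near zero, the next place I would look is how the test preprocessing builds the token ids and attention mask for the Arabic vocabulary, since an all-padding or all-identical input would produce exactly this kind of collapsed output.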
Can anyone please help and explain why this problem is occurring?