lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch
MIT License

Why do I get almost the same codes after the 1st batch? #131

Open tanyz0208 opened 6 months ago

tanyz0208 commented 6 months ago

Hi there, I am trying to quantize my input feature sparse_feat with the following code in my network:

import torch
from torch import nn
from tqdm import tqdm
from vector_quantize_pytorch import ResidualVQ

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.residual_vq = ResidualVQ(
            dim = 256,
            codebook_size = 128,
            num_quantizers = 3,
            threshold_ema_dead_code = 2
        )

    def forward(self, sparse_feat):
        quantized_sparse, codes_sparse, commit_loss_sparse = self.residual_vq(sparse_feat)
        return quantized_sparse, codes_sparse, commit_loss_sparse

model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)

loop_inner = tqdm(enumerate(dataloader, 0), total=len(dataloader), leave=True)
for idx, (x, y) in loop_inner:
    quantized_sparse, codes_sparse, commit_loss_sparse = model(x)
    # minimal loss: sum the per-quantizer commitment losses
    # (a reconstruction/task loss would normally be added here)
    loss = commit_loss_sparse.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

However, I've observed that in the first batch the codes I got were uniformly distributed, while in the second and following batches the codes and features within codes_sparse and quantized_sparse were almost all the same. The following is what I got for the second batch:

tensor([[ 44,  13,  82],
        [ 44,  13,  82],
        [ 44,  13,  82],
        [ 44,  13,  82],
        [ 44,  13,  82],
        [ 44,  13,  82],
        [ 44, 111,  82],
        [ 44,  13,  82],
        X 16 times,
        [ 44, 111,  82],
        [ 44,  13,  82],
        X 11 times,
        [ 44, 111,  82],
        [ 44,  13,  82],
        X 5 times,
        [ 44, 111,  82],
        [ 44,  13,  82],
        X 11 times,
        [ 44, 111,  82],
        [ 44, 111,  82],
        [ 44,  13,  82],
        X 8 times])

Note that the input features within sparse_feat are not alike, so I am wondering what could be going wrong here. Am I not configuring the quantization procedure correctly? Looking forward to your suggestions. Thanks in advance.
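
A minimal diagnostic sketch for quantifying the collapse, assuming codes_sparse carries the quantizer index in its last dimension as in the printout above (the code_usage_stats helper name is just illustrative):

import torch

def code_usage_stats(codes, codebook_size = 128):
    # codes: LongTensor of shape (..., num_quantizers); flatten all leading dims
    codes = codes.reshape(-1, codes.shape[-1])
    for q in range(codes.shape[-1]):
        counts = torch.bincount(codes[:, q], minlength = codebook_size).float()
        probs = counts / counts.sum()
        # perplexity is the effective number of codes in use; 1.0 means full collapse
        perplexity = torch.exp(-(probs * torch.log(probs + 1e-10)).sum())
        print(f'quantizer {q}: {int((counts > 0).sum())}/{codebook_size} codes used, '
              f'perplexity {perplexity.item():.2f}')

On the batch above this would report a perplexity of 1.0 for quantizers 0 and 2 (a single code each), which is what codebook collapse looks like.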

rteehas commented 4 months ago

I am seeing something similar.