Fix test_clip_grads_with_tp

Fixes #92

After finding that the tests in test_clip_grads.py were failing, I investigated the issue and fixed test_clip_grads_with_tp.

At first the behavior seemed quite strange, but after checking bit by bit that the inputs, weights, and biases were identical, the only remaining source of the discrepancy was the matrix multiplication operation.
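For illustration, here is a minimal sketch of what a bit-by-bit comparison means, using plain Python floats as a stand-in for the tensors in the test. The helper name `bitwise_equal` is hypothetical and is not part of nanotron or of the test file.

```python
import struct


def bitwise_equal(a, b):
    """Return True only if both float sequences have identical bit patterns.

    Hypothetical helper for illustration: it compares the raw IEEE 754
    encoding of each value instead of using a numeric tolerance.
    """
    if len(a) != len(b):
        return False
    # Pack each value as a little-endian 64-bit double and compare the bytes.
    return all(struct.pack("<d", x) == struct.pack("<d", y) for x, y in zip(a, b))


# Values that print almost the same can still differ in their last bit.
same = bitwise_equal([1.0, 2.5], [1.0, 2.5])
rounding = bitwise_equal([0.1 + 0.2], [0.3])
```

A check like this is stricter than a tolerance-based comparison: it reports a mismatch even when the two values differ only in the last bit, which is the level at which the discrepancy described here shows up.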

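When performing a matrix multiplication (MM), the backend uses one algorithm or another depending on the shapes of the matrices. Different algorithms may accumulate the partial products in a different order, and because floating-point addition is not associative, the results can differ in the last bits even when the inputs are identical. Additionally, the test used quite "exotic" values, with in_features = 2 and out_features_per_tp_rank = 3. Simply changing these values to 4 and 8 respectively fixes the problem, as the operations are now performed with the same matmul algorithm.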
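The following sketch shows why the accumulation order matters. It is a pure-Python toy, not the GPU matmul kernels themselves: the two functions (hypothetical names) compute the same dot product but sum the partial products in opposite orders, which is the kind of difference that two matmul algorithms can introduce.

```python
import math


def dot_forward(xs, ws):
    """Dot product accumulated from the first element to the last."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc += x * w
    return acc


def dot_backward(xs, ws):
    """Same dot product, accumulated from the last element to the first."""
    acc = 0.0
    for x, w in zip(reversed(xs), reversed(ws)):
        acc += x * w
    return acc


xs = [0.1, 0.2, 0.3]
ws = [1.0, 1.0, 1.0]

forward = dot_forward(xs, ws)
backward = dot_backward(xs, ws)

# The two results are numerically close but not bitwise identical.
results_differ = forward != backward
results_close = math.isclose(forward, backward, rel_tol=1e-12)
```

Both results are valid roundings of the same mathematical value, so a test that demands exact equality between two code paths can only pass reliably when both paths use the same accumulation order.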

I have run the test multiple times and it no longer fails, but both test_clip_grads_with_pp and test_clip_grads_tied_weights still fail very occasionally. I have tried applying the same reasoning to these tests, but I have not been able to fix them. Their failures are most likely related to the same cause.

It is worth mentioning that I ran the tests on 2 A100 80GB GPUs, and that the choice of MM algorithm also depends on the hardware.

[1], [2], [3], [4], [5]

huggingface / nanotron

Fix test_clip_grads_with_tp #122