Open justheuristic opened 2 years ago
Curiously, the first two iterations of LeanTransformer on CPU may differ by a small amount (~1e-5) even with torch.use_deterministic_algorithms(True).
To reproduce, go to this test and remove the "for i in range(2)" loop: https://github.com/learning-at-home/lean_transformer/blob/e737a8ff9274e0ff1492dc76a62ecf36506f3e67/tests/test_modifications.py#L63-L68
Known facts:
Hypothesis: the issue may be caused by the JIT running non-optimized code on the first pass, which may have different RNG behavior and/or use different dtypes.
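One way to check whether only the first pass is the odd one out is to repeat the same call several times under identical seeding and compare every run against the first. The sketch below is not taken from the linked test: `max_diff_vs_first_run` and `FakeModel` are hypothetical names, and the stand-in model simply adds 1e-5 on its first call to mimic the reported symptom. With the real model, one would presumably pass something like `reseed=lambda: torch.manual_seed(0)` and a `run_once` that returns the flattened model output.

```python
import random
from typing import Callable, List, Sequence


def max_diff_vs_first_run(
    run_once: Callable[[], Sequence[float]],
    reseed: Callable[[], None],
    n_runs: int = 3,
) -> List[float]:
    """Run `run_once` n_runs times with identical seeding and return, for each
    run, the largest absolute difference from the first run's output."""
    outputs = []
    for _ in range(n_runs):
        reseed()  # put the RNG into the same state before every run
        outputs.append([float(v) for v in run_once()])
    first = outputs[0]
    return [max(abs(a - b) for a, b in zip(first, out)) for out in outputs]


class FakeModel:
    """Hypothetical stand-in for the real model: the first call takes a
    slightly different code path, as the hypothesis above suggests."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> List[float]:
        self.calls += 1
        values = [random.random() for _ in range(4)]
        if self.calls == 1:
            # Assumed perturbation, chosen to match the ~1e-5 scale reported.
            values = [v + 1e-5 for v in values]
        return values


model = FakeModel()
diffs = max_diff_vs_first_run(model, reseed=lambda: random.seed(0))
print(diffs)
```

If runs 2 and 3 agree with each other but both differ from run 1 by the same amount, that points at a first-pass effect rather than general nondeterminism.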