Open kedlaya opened 8 years ago
The relevant code fragment from sage.matrix.matrix_mod2_dense.Matrix_mod2_dense.randomize
appears to be:
assert(sizeof(m4ri_word) == 8)
mask = __M4RI_LEFT_BITMASK(self._entries.ncols % m4ri_radix)
for i from 0 <= i < self._nrows:
for j from 0 <= j < self._entries.width:
# for portability we get 32-bit twice rather than 64-bit once
low = gmp_urandomb_ui(rstate.gmp_state, 32)
high = gmp_urandomb_ui(rstate.gmp_state, 32)
self._entries.rows[i][j] = m4ri_swap_bits( ((<unsigned long long>high)<<32) | (<unsigned long long>low) )
self._entries.rows[i][self._entries.width - 1] &= mask
sage: [random_matrix(GF(2), 19937, 19937).rank() for i in range(5)]
[19906, 19906, 19906, 19906, 19906]
sage: [random_matrix(GF(2), 19936, 19936).rank() for i in range(5)]
[19905, 19905, 19905, 19905, 19905]
and only 19905 appears for all the tries I did for 19937>n>19905. So here the rank is off by 25-1. It seems to get sane for n<19906.
sage: [random_matrix(GF(2), 19905, 19905).rank() for i in range(5)]
[19904, 19903, 19904, 19904, 19902]
sage: [random_matrix(GF(2), 19904, 19904).rank() for i in range(5)]
[19903, 19904, 19903, 19903, 19902]
Definitely something funny with the Mersenne twister...
If I understand correctly how linear pseudorandom number generators work, the Mersenne twister is computing iterates of a linear operator on F_2^19937
which acts as a permutation of this space, and the values we see coming out of gmp are the result of applying some fixed linear projection from F_2^19937
to F_2^32
. Since we're simply filling those bits directly into the matrix, that limits the rank to 19937.
In order to break out of this, we need to do something nonlinear with the gmp-provided bits...
Just tacking on some nonlinear operation will just be a new prng; At that point its probably better to look into wholly different prngs. MT isn't really state of the art any more. E.g. see http://xorshift.di.unimi.it/
I don't know enough about modern PRNGs to express a preference among them, but anything with a nonlinear projection would suffice to address this issue.
This is reproducible:
A functionally equivalent computation:
The docstring of the
randomize
method provides a clue:So it seems that we simply kicked the can down the road, and now that we can compute ranks at this size easily using m4ri, we have found the can again. Volker Braun observed on sage-devel (see https://groups.google.com/forum/#!topic/sage-devel/gDGM6bRle0A) that the 19937 is probably somehow coming from the Mersenne Twister having a period of
2^19937 - 1
.Note that the linear dependence caused by insufficiently pseudorandom number generation is quite fragile:
Component: linear algebra
Keywords: random matrices, rank, pseudorandom numbers
Stopgaps: todo
Issue created by migration from https://trac.sagemath.org/ticket/20493