Does the proposed API allow the caller to pass in a single seed that governs the RNG for multiple operations? For example, does this work as the user probably expects?

cast_a = tf.stochastic_cast(input_a, dtype=tf.bf8, random_seed=my_seed)
cast_b = tf.stochastic_cast(input_b, dtype=tf.bf8)

Will my_seed be used for all random number generation across both calls to tf.stochastic_cast?
Does the proposed API allow the caller to pass in a single seed that governs the RNG for multiple operations?

I recommend making random_seed (and renaming it to seed) required rather than optional, similar to e.g. tf.random.stateless_uniform. Then we won't have this ambiguity.
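To illustrate why a required, explicit seed removes the ambiguity, here is a minimal NumPy sketch (stochastic_round is an illustrative stand-in, not the TF implementation): as with tf.random.stateless_uniform, the seed fully determines the output of each call, so no call's seed can silently govern another.

```python
import numpy as np

# Illustrative stand-in for the proposed op (not the TF implementation):
# round each element down, then up with probability equal to its fractional part.
def stochastic_round(x, seed):
    rng = np.random.default_rng(seed)  # explicit seed, no hidden global state
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

x = np.array([0.25, 0.5, 1.75])
a = stochastic_round(x, seed=42)
b = stochastic_round(x, seed=42)
# Same required seed -> identical results; a different call must pass its own seed.
```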
As Peng suggested, making the random seed required 1) disambiguates the usage, and 2) keeps the usage consistent with tf.random.stateless_uniform.
Approved by the leads. Below are the notes from the design review:
From Reed: We can consider adding a stochastic_cast method to tf.random.Generator, in addition to the top-level tf.stochastic_cast function. For example, in the longer term, we can add a Generator.stochastic_cast method. That way the user can call generator.stochastic_cast(x, tf.float8_e5m2) instead of tf.stochastic_cast(x, tf.float8_e5m2, seed=generator.uniform_full_int([2])). For now we can just stick with tf.stochastic_cast.
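A hedged sketch of Reed's idea, using NumPy stand-ins since neither tf.stochastic_cast nor Generator.stochastic_cast exists yet: the method form draws a fresh seed from the generator on every call, mirroring seed=generator.uniform_full_int([2]), so the user never manages seeds by hand.

```python
import numpy as np

# Hypothetical stand-in for the proposed top-level op (not in TF today).
def stochastic_cast(x, seed):
    rng = np.random.default_rng(seed)
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

class Generator:
    """Toy analogue of tf.random.Generator with the suggested method."""
    def __init__(self, seed):
        self._rng = np.random.default_rng(seed)

    def stochastic_cast(self, x):
        # Draw a fresh seed per call, like seed=generator.uniform_full_int([2]),
        # so repeated casts never reuse the same seed.
        return stochastic_cast(x, seed=int(self._rng.integers(0, 2**63)))

gen = Generator(seed=0)
y = gen.stochastic_cast(np.array([0.25, 1.5, 2.75]))
```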
From Yu: A fixed seed might still introduce rounding bias, even though the whole point of stochastic rounding is to reduce bias. This can be eliminated by varying the seed on every call to the API.
From Antonio: Given a fixed seed and a fixed location within the tensor, there will be a bias at that location dependent on the value of pRNG(seed, location). This is easy to see if pRNG(s1, i1) = 1, for example, which will always cause the value at tensor location i1 to be rounded up.
Without prior knowledge of the generated values, the expected bias is still zero. And in operations that mix the values (reductions, multiplications), as long as the tensor values themselves are somewhat independent and identically distributed (or at least have similar means), the mixing of terms will again cause an expected bias of zero.
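The per-location bias from a fixed seed can be demonstrated with a short NumPy sketch (stochastic_round is an illustrative stand-in, not the TF op): with a fixed seed every call rounds each location the same way, so averaging over calls never reduces the per-location error, whereas varying the seed washes the bias out.

```python
import numpy as np

def stochastic_round(x, seed):
    rng = np.random.default_rng(seed)
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

x = np.full(1000, 0.3)  # every true value is 0.3

# Fixed seed: pRNG(seed, location) is identical on every call, so each location
# is always rounded the same way -- the per-location average is exactly 0 or 1.
fixed = np.mean([stochastic_round(x, seed=0) for _ in range(100)], axis=0)

# Varying seed: each location rounds up on ~30% of calls, so the per-location
# average approaches the true value 0.3.
varied = np.mean([stochastic_round(x, seed=s) for s in range(100)], axis=0)
```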
From Chiachen: A fixed seed might not be a big problem: in sequential matmuls, rounding errors across operands may cancel out the bias, so the accumulated error might be insignificant.
From Peng: The random number algorithm should default to auto_select, because Philox is slow on TPU and Threefry is slow on CPU and GPU. Also, Peng has work in progress to add uint16 support to the random number generator API.
From Antonio: In the future, if users request a custom random number generator for stochastic rounding, they can experiment with their algorithms via a simplified implementation in primitive ops. If those prove to work better than the current algorithms, it will not be hard to support additional algorithms on the current infrastructure.
From Reed: We should emphasize in the docstring that users are encouraged to use different seeds (potentially generated from random generators) instead of fixed ones.
[RFC] Enable Stochastic Rounding in Tensorflow