tensorflow / community

Stores documents used by the TensorFlow developer community

Stochastic rounding #436

Closed. elfieguo closed this issue 1 year ago.

elfieguo commented 1 year ago

[RFC] Enable Stochastic Rounding in TensorFlow

poulsbo commented 1 year ago

Does the proposed API allow the caller to pass in a single seed that governs the RNG for multiple operations?

For example, does this work as the user probably expects?

cast_a = tf.stochastic_cast(input_a, dtype=tf.bf8, random_seed=my_seed)
cast_b = tf.stochastic_cast(input_b, dtype=tf.bf8)

Will my_seed be used for all random number generation across both calls to tf.stochastic_cast?

wangpengmit commented 1 year ago

Does the proposed API allow the caller to pass in a single seed that governs the RNG for multiple operations?

I recommend making random_seed required (and renaming it to seed), similar to e.g. tf.random.stateless_uniform. Then we won't have this ambiguity.
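
For reference, a minimal sketch of the stateless convention being suggested: tf.random.stateless_uniform takes a required shape-[2] integer seed, and each call is a pure function of its arguments, so there is no hidden state for a seed to govern across calls.

import tensorflow as tf

# Stateless RNG: every call names its own seed; same seed, same output.
u1 = tf.random.stateless_uniform([4], seed=[1, 2])
u2 = tf.random.stateless_uniform([4], seed=[1, 2])  # identical to u1
u3 = tf.random.stateless_uniform([4], seed=[3, 4])  # independent draw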

elfieguo commented 1 year ago

Does the proposed API allow the caller to pass in a single seed that governs the RNG for multiple operations?

For example, does this work as the user probably expects?

cast_a = tf.stochastic_cast(input_a, dtype=tf.bf8, random_seed=my_seed)
cast_b = tf.stochastic_cast(input_b, dtype=tf.bf8)

Will my_seed be used for all random number generation across both calls to tf.stochastic_cast?

As Peng suggested, making the random seed a required argument 1) disambiguates the usage and 2) keeps the usage consistent with tf.random.stateless_uniform.
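
Under that convention, the example above would be written with an explicit seed on each call. A sketch of the proposed usage, assuming the final signature takes a required seed like tf.random.stateless_uniform does (tf.stochastic_cast is the API proposed in this RFC, and seed_a/seed_b stand in for shape-[2] seed tensors):

# Each call names its own seed; neither call inherits randomness
# from the other, so there is nothing for a seed to implicitly govern.
cast_a = tf.stochastic_cast(input_a, dtype=tf.float8_e5m2, seed=seed_a)
cast_b = tf.stochastic_cast(input_b, dtype=tf.float8_e5m2, seed=seed_b)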

elfieguo commented 1 year ago

Approved by the leads. Below are the notes from the design review:

  1. From Reed: We can consider adding a stochastic_cast method to tf.random.Generator, in addition to the top-level tf.stochastic_cast function. That is, in the longer term, we can add a Generator.stochastic_cast method so the user can call generator.stochastic_cast(x, tf.float8_e5m2) instead of tf.stochastic_cast(x, tf.float8_e5m2, seed=generator.uniform_full_int([2])). For now we can just stick with tf.stochastic_cast (the seed-derivation pattern is sketched after this list).

  2. From Yu: A fixed seed might still introduce rounding bias, even though the whole point of stochastic rounding is to reduce bias. This can be eliminated by varying the seed on every call to the API.

  3. From Antonio: Given a fixed seed and a fixed location within the tensor, there will be a bias at that location dependent on the value of pRNG(seed, location). This is easy to see if, for example, pRNG(s1, l1) = 1, which will always cause the value at tensor location l1 to be rounded up. Without prior knowledge of the generated values, the expected bias is still zero. And in operations that mix values (reductions, multiplications), as long as the tensor values themselves are somewhat independent and identically distributed (or at least have similar means), the mixing of terms will again yield an expected bias of zero.

  4. From Chiachen: A fixed seed might not be a big problem: in sequential matmuls, rounding errors across operands may cancel each other out, so the accumulated error might be insignificant.

  5. From Peng: The random number algorithm should default to auto_select, because Philox is slow on TPU and ThreeFry is slow on CPU and GPU. Also, Peng has work in progress to add uint16 support to the random number generator API.

  6. From Antonio: In the future, if users request a custom random number generator for stochastic rounding, they can experiment with their algorithms via a simplified implementation in primitive ops. If those prove to work better than the current algorithms, it would not be hard to support additional algorithms on the current infrastructure.

  7. From Reed: We should emphasize in the docstring that users are encouraged to use different seeds (potentially generated from random generators) instead of fixed ones; see the sketch below.
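
Putting notes 1, 2, and 7 together, here is a minimal sketch of the seed-derivation pattern, assuming the proposed tf.stochastic_cast signature with a required seed. stochastic_cast_fresh is a hypothetical helper; tf.random.Generator and uniform_full_int are existing TensorFlow APIs.

import tensorflow as tf

gen = tf.random.Generator.from_seed(1234)

def stochastic_cast_fresh(x, dtype):
    # Draw a fresh shape-[2] seed per call; this advances the
    # generator's state, so no two casts reuse the same randomness.
    seed = gen.uniform_full_int([2], dtype=tf.int32)
    return tf.stochastic_cast(x, dtype, seed=seed)

x = tf.random.normal([8])
y1 = stochastic_cast_fresh(x, tf.float8_e5m2)
y2 = stochastic_cast_fresh(x, tf.float8_e5m2)  # different rounding draws

The longer-term Generator.stochastic_cast method from note 1 would fold this seed derivation into the generator itself.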