keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

When using exponential as the activation function, the outputs of the CPU and GPU have large differences #20310

Open PhyllisJi opened 2 months ago

PhyllisJi commented 2 months ago

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

tf 2.12.0

Custom code

Yes

OS platform and distribution

Ubuntu 20.04

Mobile device

No response

Python version

3.10

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Input: [[25.05214, 6.4932823, 5.5203633, 12.618748, 27.186777, 3.7995481]]
CPU output: [[7.5858780e+10 6.6068842e+02 2.4972575e+02 3.0217081e+05 6.4130895e+11 4.4680992e+01]]
GPU output: [[7.5858780e+10 6.6068842e+02 2.4972574e+02 3.0217081e+05 6.4130888e+11 4.4680988e+01]]
Max distance: 65536.0
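
For context, 65536.0 is exactly one unit in the last place (ULP) of a float32 near 6.4e11, so the largest mismatch is a single-bit rounding difference. A quick NumPy check of this:

import numpy as np

# float32 values in [2**39, 2**40) have a ULP of 2**(39 - 23) = 65536,
# which matches the max distance reported above.
print(np.spacing(np.float32(6.4130895e+11)))  # 65536.0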

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
tf.config.experimental.enable_op_determinism()

def chebyshev_distance(A: np.ndarray, B: np.ndarray):
    # Maximum absolute element-wise difference between two arrays.
    if A is None or B is None:
        return 0.0
    if A.shape != B.shape:
        return 9999999
    return float(np.max(np.abs(A - B)))

act_layer = tf.keras.layers.Activation(activation='exponential')
inp = np.array([[25.05214, 6.4932823, 5.5203633, 12.618748, 27.186777, 3.7995481]])
act_layer.build(inp.shape[1:])

with tf.device('/CPU:0'):
    x_cpu = tf.constant(inp, dtype=tf.float32)
    output_cpu = act_layer(x_cpu)
    print("CPU output:", output_cpu.numpy())

if tf.config.list_physical_devices('GPU'):
    with tf.device('/GPU:0'):
        x_gpu = tf.constant(inp, dtype=tf.float32)
        output_gpu = act_layer(x_gpu)
        print("GPU output:", output_gpu.numpy())
    # Compare only when a GPU result exists; otherwise output_gpu is undefined.
    output_diff = chebyshev_distance(output_cpu.numpy(), output_gpu.numpy())
    print(output_diff)
else:
    print("GPU not available.")
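
Because the outputs span many orders of magnitude, a relative distance is more informative than the absolute Chebyshev distance. A sketch of such a check (a suggested variant, not used in the repro above):

import numpy as np

def relative_chebyshev(A: np.ndarray, B: np.ndarray) -> float:
    # Largest element-wise |A - B|, scaled by the magnitude of the values
    # so that the huge outputs do not dominate the comparison.
    denom = np.maximum(np.abs(A), np.abs(B))
    denom = np.where(denom == 0.0, 1.0, denom)
    return float(np.max(np.abs(A - B) / denom))

# For the CPU/GPU outputs above this is about 1.1e-7, i.e. roughly
# float32 machine epsilon (~1.19e-7).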

Relevant log output

No response

mehtamansi29 commented 2 months ago

Hi @PhyllisJi -

Thanks for reporting the issue. From your code it looks like you are using an older TensorFlow version (tf 2.12.0). Please try the latest Keras 3.5.0 and TensorFlow 2.17.0, running the GPU computation inside a tf.distribute.MirroredStrategy() scope and the CPU computation under keras.device('/device:CPU:0'). With that setup the CPU and GPU outputs show only a minimal difference.

import tensorflow as tf
import keras
print(tf.__version__)         #2.17.0
print(keras.__version__)      #3.5.0
import numpy as np

def chebyshev_distance(A: np.ndarray, B: np.ndarray):
    if A is None or B is None:
        return 0.0
    if A.shape != B.shape:
        return 9999999
    else:
        return float(np.max(np.abs(A - B)))

strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

with strategy.scope():
    act_layer = tf.keras.layers.Activation(activation='exponential')
    inp = np.array([[25.05214, 6.4932823, 5.5203633, 12.618748, 27.186777, 3.7995481]])
    input_shape = inp.shape[1:]
    act_layer.build(input_shape)

    x_gpu = tf.constant(inp, dtype=tf.float32)
    output_gpu = act_layer(x_gpu)
    print("GPU output:", output_gpu.numpy())

with keras.device('/device:CPU:0'):
    act_layer = tf.keras.layers.Activation(activation='exponential')
    inp = np.array([[25.05214, 6.4932823, 5.5203633, 12.618748, 27.186777, 3.7995481]])
    input_shape = inp.shape[1:]
    act_layer.build(input_shape)
    x_cpu = tf.constant(inp, dtype=tf.float32)
    output_cpu = act_layer(x_cpu)
    print("CPU output:", output_gpu.numpy())
output_diff = chebyshev_distance(output_cpu.numpy(), output_gpu.numpy())
print(output_diff)
github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

PhyllisJi commented 1 month ago


Can you please tell me why tf 2.12.0 doesn't behave quite the same as the new version? Is there some kind of flaw?

mehtamansi29 commented 1 month ago

Hi @PhyllisJi -

Here tf 2.12.0 works with Keras 2, which lacks the multi-GPU training functionality (data parallelism and model parallelism) provided through the tf.distribute.MirroredStrategy API. So it is better to use the GPU with the tf.distribute.MirroredStrategy API on Keras 3.
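
As background, small last-bit CPU/GPU mismatches for exponential are expected regardless of version: exp is typically computed by different implementations on each device (e.g. Eigen's vectorized exp on the CPU vs. CUDA's expf on the GPU), each accurate to within about 1 ULP but not guaranteed to round identically. An illustrative CPU-only sketch, using float64 as a stand-in for a second implementation:

import numpy as np

x = np.float32(27.186777)

# Native float32 exp vs. exp computed in float64 then rounded to float32:
# both are accurate, yet they may disagree by one ULP in the last bit.
exp_f32 = np.exp(x)
exp_f64 = np.float32(np.exp(np.float64(x)))
print(exp_f32, exp_f64, float(np.abs(exp_f32 - exp_f64)))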

PhyllisJi commented 1 month ago

But I'm only using a single GPU for training; will the results still be affected?