keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Why does the sequence length of vectors affect the calculation results of dense under bf16? #19878

Open pass-lin opened 3 months ago

pass-lin commented 3 months ago

import os
os.environ['KERAS_BACKEND'] = 'torch'
os.environ['OPS_KERNAL'] = '1'
import keras
keras.config.set_floatx('bfloat16')
from keras import ops
import numpy as np
initial_dim = 2048
finally_dim = 64
z = ops.convert_to_tensor(np.random.random([1,36,initial_dim]))
dense = keras.layers.Dense(finally_dim)
z1 = dense(z)
z2 = dense(z[:,:8])
print(ops.isclose(z1[:,:8],z2).all())

Example code is as above. In some cases z1 and z2 do not pass isclose, although theoretically (and under fp32) they should pass in every situation. What is the problem, and how can it be solved? The bug also appears with the TF and JAX backends, but not with the NumPy backend.
Passing cases: initial_dim = 2048, finally_dim = 2048; initial_dim = 2048, finally_dim = 4096; initial_dim = 1024, finally_dim = 2048.
Failing cases: initial_dim = 2048, finally_dim = 64; initial_dim = 2048, finally_dim = 1024; initial_dim = 1024, finally_dim = 2047.
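
A minimal sketch (not part of the original report) of how one might quantify the discrepancy: cast both outputs up to float32 and print the maximum absolute difference, which for bfloat16 activations is expected to be small but nonzero rather than exactly zero.

import os
os.environ['KERAS_BACKEND'] = 'torch'
import numpy as np
import keras
from keras import ops

keras.config.set_floatx('bfloat16')

initial_dim, finally_dim = 2048, 64
z = ops.convert_to_tensor(np.random.random([1, 36, initial_dim]))
dense = keras.layers.Dense(finally_dim)

# Run the same layer on the full sequence and on a truncated slice.
z1 = dense(z)
z2 = dense(z[:, :8])

# Compare in float32: the interesting quantity is how large the mismatch is.
diff = ops.max(ops.abs(ops.cast(z1[:, :8], 'float32') - ops.cast(z2, 'float32')))
print(float(ops.convert_to_numpy(diff)))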

However, we could not reproduce a similar issue in pure torch:

import torch
import numpy as np
initial_dim = 4096
finally_dim = 32
z = torch.tensor(np.random.random([1,36,initial_dim]),dtype=torch.bfloat16)
linear = torch.nn.Linear(initial_dim,finally_dim).bfloat16()
z1 = linear(z)
z2 = linear(z[:,:8])
print(torch.isclose(z1[:,:8],z2).all())
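
One knob worth checking in the pure-torch comparison (an assumption on my part, not something confirmed in this thread) is PyTorch's reduced-precision reduction for bf16 matmuls, whose behaviour can depend on the problem shape:

import torch
import numpy as np

# Disable reduced-precision accumulation for bf16 matmuls (a cuBLAS-level setting).
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False

initial_dim = 4096
finally_dim = 32
z = torch.tensor(np.random.random([1, 36, initial_dim]), dtype=torch.bfloat16, device='cuda')
linear = torch.nn.Linear(initial_dim, finally_dim).bfloat16().cuda()
z1 = linear(z)
z2 = linear(z[:, :8])
print(torch.isclose(z1[:, :8], z2).all())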

mehtamansi29 commented 3 months ago

Hi @pass-lin -

Thanks for reporting the issue. I have tested the code snippet and it doesn't reproduce the reported behaviour with Keras 3.3.3 and the torch backend. Attached gist file for reference. Could you let us know which Keras version you are using here?

pass-lin commented 3 months ago

> Hi @pass-lin -
>
> Thanks for reporting the issue. I have tested the code snippet and it doesn't reproduce the reported behaviour with Keras 3.3.3 and the torch backend. Attached gist file for reference. Could you let us know which Keras version you are using here?

My test environment is as follows:
device: 4060 Ti
OS: WSL Ubuntu 22.04
keras: 3.3.3
torch: 2.2.2+cu121
jax: 0.4.23+cuda12
tensorflow-cpu: 1.15

This bug also appears on Windows 10 + torch 2.2.1 + Keras 3.3.3, but not on another machine (pure Linux + A800 + torch 2.2.0 (cu118) + Keras 3.3.3) with the torch backend. On pure Linux + A800 + Keras 3.3.3 with jax 0.4.28+cuda12.cudnn89 or 0.4.23+cuda11.cudnn86, the bug also exists. The bug does not appear on CPU or on a V100.

szxysdt commented 3 months ago

I reproduced this bug in this environment:

& pip list
Package           Version
----------------- ------------
absl-py           2.1.0       
filelock          3.13.1      
fsspec            2024.2.0    
h5py              3.11.0      
intel-openmp      2021.4.0    
Jinja2            3.1.3       
keras             3.4.1       
markdown-it-py    3.0.0       
MarkupSafe        2.1.5       
mdurl             0.1.2       
mkl               2021.4.0    
ml-dtypes         0.4.0       
mpmath            1.3.0       
namex             0.0.8       
networkx          3.3
numpy             1.26.4      
optree            0.11.0      
packaging         24.1        
pillow            10.3.0      
pip               24.1.1      
Pygments          2.18.0      
rich              13.7.1      
setuptools        58.1.0      
sympy             1.12.1      
tbb               2021.13.0   
torch             2.3.1+cu121 
torchaudio        2.3.1+cu121 
torchvision       0.18.1+cu121
typing_extensions 4.12.2      

CUDA device: 3060-12G, platform: Windows 10

code:

import os
os.environ['KERAS_BACKEND'] = 'torch'
os.environ['OPS_KERNAL'] = '1'
import keras
keras.config.set_floatx('bfloat16')
from keras import ops
import numpy as np
initial_dim = 2048
finally_dim = 64
z = ops.convert_to_tensor(np.random.random([1,36,initial_dim]))
dense = keras.layers.Dense(finally_dim)
z1 = dense(z)
z2 = dense(z[:,:8])
print(ops.isclose(z1[:,:8],z2).all())

output:

tensor(False, device='cuda:0')
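
A possible workaround sketch (my assumption, not verified anywhere in this thread), continuing from the snippet above: pin the affected layer to a float32 dtype policy so its matmul runs in full precision, while the rest of the model keeps the global bfloat16 default.

# Hypothetical workaround: the per-layer dtype overrides the global bfloat16 policy,
# and the bf16 input is autocast to the layer's float32 compute dtype.
dense_fp32 = keras.layers.Dense(finally_dim, dtype='float32')
z1 = dense_fp32(z)
z2 = dense_fp32(z[:, :8])
print(ops.isclose(z1[:, :8], z2).all())
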
123mbcz123 commented 3 months ago

I cannot reproduce this error. One possible reason is that my graphics card is a 2080 Ti, and the tensor cores of the 2080 Ti do not support bfloat16 calculations, so bfloat16 is handled by ordinary CUDA cores instead.

Platform: Windows 10; graphics card: 2080 Ti 11G; PyTorch version: 2.3.1+cu121; Keras version: 3.4.1

output: tensor(True, device='cuda:0')
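
A quick way to check that hardware difference (a hypothetical check, not something posted in the thread) is to query the device's compute capability and native bf16 support:

import torch

print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # (8, 0) or higher means Ampere+, with bf16 tensor cores
print(torch.cuda.is_bf16_supported())       # False would suggest bf16 matmuls take a different code path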

SamanehSaadat commented 3 months ago

@pass-lin Could you provide a colab that reproduces the issue?

pass-lin commented 3 months ago

> @pass-lin Could you provide a colab that reproduces the issue?

I don't think I can provide you with a Windows environment or one with an RTX 30 or 40 series on Colab.

SamanehSaadat commented 3 months ago

@pass-lin Unfortunately, I don't think we would be able to help with debugging if we can't reproduce the issue on our side.

pass-lin commented 3 months ago

> @pass-lin Unfortunately, I don't think we would be able to help with debugging if we can't reproduce the issue on our side.

Can't you reproduce this bug on an A100 or on Windows?

SamanehSaadat commented 2 months ago

Hi @pass-lin!

I was able to reproduce the bug on an A100 with the torch backend! However, things worked as expected with the JAX backend. Anyway, this is a bug and we'll look into it! Thanks for reporting the issue.