pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

The same input with different batch sizes gives different-precision output #58128

Open wangyunxiaa opened 3 years ago

wangyunxiaa commented 3 years ago

Issue description

The same input produces slightly different output values depending on the batch size it is run with.

Code example

I have a Transformer model. When I do inference with a single input:

```python
att_out, ctc_out, state1 = encode_model(inputs, length, outlength, num_idb, state)
```

Then I build a batched input from the same sample as follows:

```python
batch = int(sys.argv[1])
inputs = torch.cat([inputs for i in range(batch)], dim=0)
length = torch.cat([length for i in range(batch)], dim=0)
outlength = torch.cat([outlength for i in range(batch)], dim=0)
num_idb = torch.cat([num_id for i in range(batch)], dim=0)
state = torch.cat([state for i in range(batch)], dim=0)
att_out, ctc_out, state1 = encode_model(inputs, length, outlength, num_idb, state)
```

However, I get different output values for different batch sizes:

```
batch=== 1
inp        torch.Size([10, 1, 512])  tensor([ 5.601439476013,  0.674656093121, -0.775768756866,  3.230525732040,  2.853605270386], device='cuda:0')
query-in=  torch.Size([1, 10, 512])  tensor([ 0.143003374338,  0.020905975252, -0.019955551252,  0.063370727003,  0.087554275990], device='cuda:0')
query-out= torch.Size([1, 10, 512])  tensor([ 0.287363290787, -0.723381698132, -0.966923415661,  0.632085084915,  0.111495062709], device='cuda:0')
att-o      tensor([-0.028479695320, -0.300614058971, -1.779013872147, -0.866003751755, -0.241535440087], device='cuda:0')

batch=== 2
inp        torch.Size([10, 2, 512])  tensor([ 5.601439476013,  0.674656093121, -0.775768756866,  3.230525732040,  2.853605270386], device='cuda:0')
query-in=  torch.Size([2, 10, 512])  tensor([ 0.143003374338,  0.020905975252, -0.019955551252,  0.063370727003,  0.087554275990], device='cuda:0')
query-out= torch.Size([2, 10, 512])  tensor([ 0.287363290787, -0.723381698132, -0.966923415661,  0.632085084915,  0.111495062709], device='cuda:0')
att-o      tensor([-0.028479695320, -0.300614058971, -1.779013872147, -0.866003751755, -0.241535440087], device='cuda:0')

batch=== 20
inp        torch.Size([10, 20, 512]) tensor([ 5.601439476013,  0.674656093121, -0.775768756866,  3.230525732040,  2.853605270386], device='cuda:0')
query-in=  torch.Size([20, 10, 512]) tensor([ 0.143003374338,  0.020905975252, -0.019955551252,  0.063370727003,  0.087554275990], device='cuda:0')
query-out= torch.Size([20, 10, 512]) tensor([ 0.287363111973, -0.723381698132, -0.966923296452,  0.632085025311,  0.111494973302], device='cuda:0')
att-o      tensor([-0.028478384018, -0.300614416599, -1.779013276100, -0.866004824638, -0.241535827518], device='cuda:0')
```

Note that the inputs (`inp`, `query-in`) are identical in all three runs, but `query-out` and `att-o` change at batch size 20.

I have already set the determinism options:

```python
if torch.__version__ >= '1.8':
    torch.use_deterministic_algorithms(True)
else:
    torch.set_deterministic(True)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

and I have checked the operations inside the attention module:

```python
print("query-in=", input.shape, input[0][0][0:5])
q, k, v = linear(query, in_proj_weight, in_proj_bias).chunk(3, dim=-1)
print("query-out=", q.shape, q[0][0][0:5])
```

The difference is caused by the `linear` call (`return torch._C._nn.linear(input, weight, bias)`); replacing it with `query.matmul(in_proj_weight.t()) + in_proj_bias` does not help.
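A stripped-down sketch of the same effect, with a plain `nn.Linear` standing in for my `encode_model` (hypothetical code, not from the model above; needs a CUDA device):

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(512, 512).cuda()   # stand-in for the in-projection of the model
x = torch.randn(10, 1, 512, device='cuda')  # one sample, batch dimension of size 1

with torch.no_grad():
    out1 = linear(x)                         # run with batch size 1
    xb = x.expand(-1, 20, -1).contiguous()   # the same sample tiled to batch size 20
    out20 = linear(xb)

# Batch element 0 is mathematically identical in both calls, but cuBLAS may
# choose a different kernel for the larger GEMM, so the float32 results can
# differ by roughly 1e-6.
print((out1[:, 0] - out20[:, 0]).abs().max())
```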

How can I get deterministic results across different batch sizes?


System Info

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

```
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 5.4.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 418.56
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] pytorch-asr==0.3.1.dev48+g166d73d.d20201130
[pip3] pytorch-memlab==0.2.2
[pip3] pytorch-ranger==0.1.1
[pip3] pytorch-wpe==0.0.0
[pip3] torch==1.8.1
[pip3] torch-complex==0.2.0
[pip3] torch-optimizer==0.0.1a17
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchvision==0.9.1
[pip3] warpctc-pytorch==0.1
[conda] blas              1.0       mkl                             https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] cudatoolkit       10.1.243  h6bb024c_0                      https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] ffmpeg            4.3       hf484d3e_0                      https://mirrors.bfsu.edu.cn/anaconda/cloud/pytorch
[conda] mkl               2020.2    256                             https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] mkl-service       2.3.0     py37he8ac12f_0                  https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] mkl_fft           1.3.0     py37h54f3939_0                  https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] mkl_random        1.1.1     py37h0573a6f_0                  https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] numpy             1.19.2    py37h54aff64_0                  https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] numpy-base        1.19.2    py37hfa32c7d_0                  https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
[conda] pytorch           1.8.1     py3.7_cuda10.1_cudnn7.6.3_0     https://mirrors.bfsu.edu.cn/anaconda/cloud/pytorch
[conda] pytorch-asr       0.3.1.dev48+g166d73d.d20201130  dev_0
[conda] pytorch-memlab    0.2.2     dev_0
[conda] pytorch-ranger    0.1.1     pypi_0  pypi
[conda] pytorch-wpe       0.0.0     pypi_0  pypi
[conda] torch-complex     0.2.0     pypi_0  pypi
[conda] torch-optimizer   0.0.1a17  pypi_0  pypi
[conda] torchaudio        0.8.1     py37                            https://mirrors.bfsu.edu.cn/anaconda/cloud/pytorch
[conda] torchvision       0.9.1     py37_cu101                      https://mirrors.bfsu.edu.cn/anaconda/cloud/pytorch
[conda] warpctc-pytorch   0.1       dev_0
```

cc @mruberry @kurtamohler @albanD @jbschlosser

mruberry commented 3 years ago

This question is probably more appropriate for the PyTorch forums: https://discuss.pytorch.org/

albanD commented 3 years ago

How can I get deterministic results across different batch sizes?

If you use a different batch size, then you are doing different operations, so this is not a determinism problem. Unfortunately, when you change the input size, the low-level compute libraries that we use may choose a different algorithm, which can lead to small differences on the order of 1e-6 for each single op (as permitted by the floating-point standard). It's not something we can change.
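As a self-contained illustration of the underlying effect (my own sketch, not the code from this issue): merely summing the same float32 numbers in a different order already changes the result at that scale:

```python
import torch

torch.manual_seed(0)
x = torch.randn(100 * 100)

s1 = x.sum()                             # one reduction order
s2 = x.view(100, 100).sum(dim=0).sum()   # same numbers, different reduction order

print((s1 - s2).item())  # typically non-zero, around 1e-6 for float32
```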

You can use double-precision numbers if you need more precision, but that will never be bit-wise identical either. And so, as your model grows and training progresses, the results will inevitably diverge.
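Running the same reduction-order sketch in float64 shows the gap shrinking by several orders of magnitude without ever reaching exactly zero:

```python
import torch

torch.manual_seed(0)
x = torch.randn(100 * 100, dtype=torch.float64)

s1 = x.sum()
s2 = x.view(100, 100).sum(dim=0).sum()

print((s1 - s2).item())  # far smaller (around 1e-13), but still not exactly zero
```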