microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

cpu and gpu results is not the same #11590

Open cqray1990 opened 2 years ago

cqray1990 commented 2 years ago

Describe the bug

I tested an image with CPU onnxruntime and the results are correct: the OCR output is (Bank Account Number: 769681044905 Cleveland, OH 44194-4338). But with onnxruntime-gpu the same model only recognises (Bank Account Number: 769681044905 Cleveland, OH); part of the result is cut off, which is surprising.

tianleiwu commented 2 years ago

Have you measured accuracy with an evaluation data set? If accuracy is on par for the GPU result, I think it is fine. Sometimes GEMM can produce slightly different results on CPU and CUDA due to how the reduction is partitioned and aggregated.
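The reduction-order effect mentioned above can be seen in plain float32 arithmetic, without ONNX Runtime at all (a minimal NumPy illustration, not taken from the thread):

```python
import numpy as np

# Float addition is not associative: summing the same three numbers in a
# different order gives different float32 results. GPU kernels partition
# reductions differently from CPU code, so small drifts like this are expected.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- b + c rounds back to -1e8, losing the 1.0
```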

To investigate the cause, you might follow #7668 to dump the inputs and outputs of each node; then you can find which node causes the difference.
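Once you have per-node outputs from both runs (e.g. loaded as NumPy arrays keyed by node name), the comparison itself is simple. A minimal sketch, where the node names, tolerances, and dict-based dump format are all assumptions for illustration:

```python
import numpy as np

def first_divergent_node(cpu_outputs, gpu_outputs, rtol=1e-3, atol=1e-4):
    """Return the first node (in graph order) whose CPU and GPU outputs
    differ beyond tolerance, or None if everything matches."""
    for name, cpu_val in cpu_outputs.items():
        if not np.allclose(cpu_val, gpu_outputs[name], rtol=rtol, atol=atol):
            return name
    return None

# Simulated dumps: the GEMM output drifts noticeably on GPU.
cpu = {"Conv_0": np.ones((2, 2)), "Gemm_1": np.full((2, 2), 0.5)}
gpu = {"Conv_0": np.ones((2, 2)), "Gemm_1": np.full((2, 2), 0.51)}
print(first_divergent_node(cpu, gpu))  # Gemm_1
```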

cqray1990 commented 2 years ago

@tianleiwu But with onnxruntime-gpu 1.7 the results are fine; that makes no sense.

cqray1990 commented 2 years ago

@mattetti

tianleiwu commented 2 years ago

@cqray1990, were you able to find the cause by dumping the inputs and outputs of each node? If you do so with onnxruntime-gpu 1.7 vs. the latest version, you can compare the dumped outputs to find which node causes the difference. Sometimes it can be caused by dependent DLLs such as cuDNN.

If you need help, please share test script and model to reproduce the issue.

cqray1990 commented 1 year ago

The model is more than 25 MB, so I can't upload it here. How can I send you the model files by email? @tianleiwu @sverrejoh @radical @mtodd

cqray1990 commented 1 year ago


I used onnxruntime-gpu 1.13.1 and the results are wrong, but CPU is right. The network is an LSTM for OCR.

tianleiwu commented 1 year ago

@cqray1990, to share the model, you can upload it to GitHub or other cloud storage and share a link.

To troubleshoot it yourself, do the following: use a binary built from source with dumping of node inputs and outputs enabled (see https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/debug_node_inputs_outputs_utils.h for details). Then run the CUDA EP and the CPU EP, and compare their dumped results. The first node whose output differs significantly between CUDA and CPU is the culprit.

An example:

git clone https://github.com/microsoft/onnxruntime
cd onnxruntime
export CUDA_HOME=/usr/local/cuda
export CUDNN_HOME=/usr/local/cuda
export CUDACXX=/usr/local/cuda/bin/nvcc
sh build.sh  --config Release  --build_shared_lib --parallel  --use_cuda --cuda_version  11.7 --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda --build_wheel --skip_test --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1
cd build/Linux/Release/dist
pip install onnxruntime_gpu-1.15.0-cp310-cp310-linux_x86_64.whl --force-reinstall
export ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=0
export ORT_DEBUG_NODE_IO_DUMP_NODE_PLACEMENT=0
export ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA=0
export ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_DATA_DESTINATION=stdout
export ORT_DEBUG_NODE_IO_SNIPPET_THRESHOLD=0
python test_my_model.py > dump_cpu.txt
python test_my_model.py --use_gpu > dump_gpu.txt
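The test_my_model.py script referenced by the last two commands isn't shown in the thread. A minimal sketch of what it could look like, where the model path "model.onnx", the fixed random input, and the dynamic-dimension handling are all assumptions:

```python
import argparse
import numpy as np

def pick_providers(use_gpu: bool):
    # Keep CPU as a fallback after CUDA so unsupported nodes still run.
    return (["CUDAExecutionProvider", "CPUExecutionProvider"]
            if use_gpu else ["CPUExecutionProvider"])

def main():
    import onnxruntime as ort
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_gpu", action="store_true")
    args = parser.parse_args()
    # "model.onnx" is a placeholder path, not taken from the thread.
    sess = ort.InferenceSession("model.onnx", providers=pick_providers(args.use_gpu))
    inp = sess.get_inputs()[0]
    # Replace any symbolic/dynamic dimension with 1 so we can build an input;
    # the same seeded input makes the CPU and GPU runs directly comparable.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    x = np.random.default_rng(0).standard_normal(shape).astype(np.float32)
    print(sess.run(None, {inp.name: x}))

# Call main() when running against a real model; with the environment
# variables above set, each node's outputs are dumped to stdout.
```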
cqray1990 commented 1 year ago


@tianleiwu The model is here, thanks!

https://drive.google.com/file/d/1snHpEr-ok15gfikFBnuzGMtkfwpGSFXw/view?usp=share_link @sverrejoh @mtodd @radical @tianleiwu