microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

GPU bug with Unpooling layer and large size inputs #3113

Open: belgraviton opened this issue 4 years ago

belgraviton commented 4 years ago

Describe the bug: A SegNet-like model with an unpooling layer (return_indices=True) fails to run on GPU when the input size is large.

Urgency: Using an unpooling layer on GPU with large input sizes is blocked by this issue.

System information

To Reproduce: Link to source and models. Compile test.cpp together with the ionnx class (an interface to onnxruntime) and run it with the command “test model_name.onnx”.

Expected behavior: The model should run successfully.

Additional context

All tests were carried out with the C++ interface (CPU and GPU) and the Python interface (CPU). The “Upsample” model was converted directly from PyTorch. The “Unpooling” models were created manually in Python (example link).

| Input size | Upscale layer | CPU (Python and C++) | GPU (C++) |
|---|---|---|---|
| 128x64x1 | Unpooling | OK | OK |
| 512x256x1 | Unpooling | OK | FAIL |
| 512x256x1 | Upsample | OK | OK |

I get “Process finished with exit code 135 (interrupted by signal 7: SIGEMT)” on Ubuntu 16.04 with onnxruntime built from source.

On Windows 10 with the prebuilt onnxruntime v1.1, I get “Ort::Exception at memory location 0x000000A007AFBB50” in release mode and “Exception thrown: read access violation. Y_data was 0x111011101110111” in debug mode for similar models.
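For readers unfamiliar with the layer under test, here is a minimal pure-Python sketch of what max pooling with indices followed by unpooling computes. It is modeled on the ONNX `MaxPool` operator (with its optional `Indices` output) and `MaxUnpool`, restricted to a single H x W plane (batch and channel both 1), a 2x2 kernel, stride 2, and no padding. ONNX flattens indices over the whole N x C x H x W tensor; with N = C = 1 that reduces to `h * W + w`. The function names are hypothetical, and this is not onnxruntime's implementation. It is only a slow reference that could be used to sanity-check outputs on small inputs.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def max_pool_2x2_with_indices(x: Matrix) -> Tuple[Matrix, List[List[int]]]:
    """Hypothetical helper: 2x2 max pooling, stride 2, no padding, one H x W plane.

    Indices are flat row-major positions (h * W + w) into the input,
    mirroring the ONNX MaxPool `Indices` output when N = C = 1.
    Odd trailing rows/columns are dropped (floor mode).
    """
    h, w = len(x), len(x[0])
    pooled: Matrix = []
    indices: List[List[int]] = []
    for i in range(0, h - 1, 2):
        row_vals: List[float] = []
        row_idx: List[int] = []
        for j in range(0, w - 1, 2):
            # Position of the window maximum; max() keeps the first one on ties.
            r, c = max(
                ((i + di, j + dj) for di in (0, 1) for dj in (0, 1)),
                key=lambda p: x[p[0]][p[1]],
            )
            row_vals.append(x[r][c])
            row_idx.append(r * w + c)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool_2x2(values: Matrix, indices: List[List[int]],
                   out_h: int, out_w: int) -> Matrix:
    """Hypothetical helper: scatter each pooled value back to its recorded
    flat position in an out_h x out_w plane; every other element is 0."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for vals, idxs in zip(values, indices):
        for v, flat in zip(vals, idxs):
            # Bounds-check each index before writing it into the output plane.
            if not 0 <= flat < out_h * out_w:
                raise IndexError(f"index {flat} outside {out_h}x{out_w} output")
            out[flat // out_w][flat % out_w] = v
    return out


if __name__ == "__main__":
    x = [
        [1.0, 5.0, 2.0, 0.0],
        [3.0, 4.0, 8.0, 6.0],
        [0.0, 2.0, 1.0, 1.0],
        [7.0, 1.0, 3.0, 9.0],
    ]
    pooled, idx = max_pool_2x2_with_indices(x)
    print(pooled)  # [[5.0, 8.0], [7.0, 9.0]]
    print(idx)     # [[1, 6], [12, 15]]
    print(max_unpool_2x2(pooled, idx, 4, 4))
```

With a 2x2 kernel and stride 2 on even-sized inputs, the unpooled plane is twice the pooled size in each dimension, and only the positions that held each window's maximum are non-zero.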

belgraviton commented 4 years ago

There are two main functions in the ionnx interface: initialization and run.

The INITIALIZATION function runs the model once with a zero-filled input.

This behavior is similar to issue #2700.

Running the model with either real or zero input in the “RUN” function fails in GPU mode with large input sizes.

hariharans29 commented 4 years ago

Can you please try this with the 1.2 release? I'll take a look if it still occurs with the latest release.

belgraviton commented 4 years ago

I have checked the models with the 1.2 release build. The results are the SAME: the Unpooling model with a 512x256 input FAILS to run on GPU with the C++ interface.

belgraviton commented 4 years ago

@hariharans29 Do you have any ideas about the cause of the bug?

hariharans29 commented 4 years ago

Will take a look at this next week. Sorry for the delay and thanks for confirming.

belgraviton commented 4 years ago

@hariharans29 Have you had a chance to look at the bug?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

stale[bot] commented 4 years ago

This issue has been automatically closed due to inactivity. Please reactivate if further support is needed.

hariharans29 commented 4 years ago

Sorry, I never did get a chance to look at this. I'll try to do so; keeping this open.

belgraviton commented 4 years ago

Thank you

stale[bot] commented 3 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.