microsoft / tensorflow-directml

Fork of TensorFlow accelerated by DirectML
Apache License 2.0

Session run crashes when running on NVIDIA GPU #402

Closed: argman closed this issue 1 year ago

argman commented 1 year ago

System Information

Host System
--------------------------------------------------------------------------------
Windows 10 Version  : Windows 10 Pro 64-bit (10.0, Build 19044) (19041.vb_release.191206-1406)
Processor           : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (12 CPUs), ~3.2GHz
Memory              : 40960MB RAM
DirectX Version     : DirectX 12

Python Environment
--------------------------------------------------------------------------------
Python Version      : 3.6.4
TensorFlow-DirectML : 1.15.8

DirectX Device
--------------------------------------------------------------------------------
Description         : NVIDIA GeForce GTX 1060 6GB
Manufacturer        : NVIDIA
Chip Type           : NVIDIA GeForce GTX 1060 6GB
Dedicated Memory    : 6052 MB
Driver Version      : 30.0.15.1252
Driver Model        : WDDM 2.7
Driver Date         : 2022/4/15 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

DirectX Device
--------------------------------------------------------------------------------
Description         : Intel(R) UHD Graphics 630
Manufacturer        : Intel Corporation
Chip Type           : Intel(R) UHD Graphics Family
Dedicated Memory    : 128 MB
Driver Version      : 31.0.101.2111
Driver Model        : WDDM 2.7
Driver Date         : 2022/7/19 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

DirectX Device
--------------------------------------------------------------------------------
Description         : Citrix Indirect Display Adapter
Manufacturer        : Citrix Systems Inc.
Chip Type           : Unknown
Dedicated Memory    : 6052 MB
Driver Version      : 12.40.44.247
Driver Model        : WDDM 1.3
Driver Date         : 2019/1/23 8:00:00
Feature Levels      : 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1

Repro Details

Describe the current behavior
sess.run crashes when running a .pb model on an NVIDIA GPU, but when I switch to an AMD GPU, it runs successfully.

Describe the expected behavior

Code to reproduce the issue

Load a frozen model (.pb), then run it with a session.
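A minimal sketch of those repro steps, assuming the TensorFlow 1.x graph API (`tf.compat.v1`) that tensorflow-directml 1.15 ships. The real model was not attached, so the path `frozen_model.pb` and the tensor names `input:0` / `output:0` are hypothetical placeholders; the script first writes a tiny stand-in frozen graph so it runs on its own.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1

# Hypothetical path and tensor names; the real model was not attached to the issue.
MODEL_PATH = "frozen_model.pb"
INPUT_TENSOR = "input:0"
OUTPUT_TENSOR = "output:0"


def write_stand_in_model(path):
    """Write a tiny frozen graph (output = input @ ones(3, 2) + 1) as a stand-in model."""
    graph = tf1.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 3], name="input")
        w = tf1.constant(np.ones((3, 2), dtype=np.float32), name="w")
        # A graph that holds only constants is already "frozen" (no variables to convert).
        tf1.add(tf1.matmul(x, w), 1.0, name="output")
    with open(path, "wb") as f:
        f.write(graph.as_graph_def().SerializeToString())


def load_frozen_graph(path):
    """Parse a serialized GraphDef and import it into a fresh graph."""
    graph_def = tf1.GraphDef()
    with open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf1.Graph()
    with graph.as_default():
        # name="" keeps the original tensor names (no "import/" prefix).
        tf1.import_graph_def(graph_def, name="")
    return graph


def run_model(path, batch):
    """Load the frozen graph and evaluate the output tensor with a Session."""
    graph = load_frozen_graph(path)
    inp = graph.get_tensor_by_name(INPUT_TENSOR)
    out = graph.get_tensor_by_name(OUTPUT_TENSOR)
    with tf1.Session(graph=graph) as sess:
        return sess.run(out, feed_dict={inp: batch})


if __name__ == "__main__":
    write_stand_in_model(MODEL_PATH)
    print(run_model(MODEL_PATH, np.ones((1, 3), dtype=np.float32)))  # prints [[4. 4.]]
```

Pointing `MODEL_PATH` and the tensor names at the real model and running the same script on each adapter is one way to confirm that the failure is specific to the NVIDIA device.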

Other info / logs

2023-02-03 10:59:10.842786: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2023-02-03 10:59:10.846478: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library directml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.dll
2023-02-03 10:59:10.847011: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library dxgi.dll
2023-02-03 10:59:10.849614: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library d3d12.dll
2023-02-03 10:59:11.188342: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:250] DirectML device enumeration: found 2 compatible adapters.
2023-02-03 10:59:11.188513: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 0 (NVIDIA GeForce GTX 1060 6GB)
2023-02-03 10:59:11.285401: I tensorflow/stream_executor/platform/default/dso_loader.cc:97] Successfully opened dynamic library Kernel32.dll
2023-02-03 10:59:11.299358: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:186] DirectML: creating device on adapter 1 (Intel(R) UHD Graphics 630)
WARNING:tensorflow:From e:/PycharmProjects/accumodel/demo/convert_model.py:35: The name tf.RunOptions is deprecated. Please use tf.compat.v1.RunOptions instead.
2023-02-03 10:59:12.501597: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started.
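The deprecation warning in the log comes from line 35 of the reporter's `convert_model.py` (not shown), which constructs `tf.RunOptions`. Below is a minimal sketch of the `tf.compat.v1` spelling the warning recommends. It assumes the options were used to request a full trace; the `Profiler session started` line is consistent with that but does not prove it. The two-op graph is a hypothetical stand-in for the real model.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1


def traced_run():
    """Run a tiny stand-in graph with a full trace and report which devices executed ops."""
    graph = tf1.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 3], name="input")
        y = tf1.reduce_sum(x * 2.0, name="output")

    # Non-deprecated replacements for tf.RunOptions / tf.RunMetadata.
    run_options = tf1.RunOptions(trace_level=tf1.RunOptions.FULL_TRACE)
    run_metadata = tf1.RunMetadata()

    with tf1.Session(graph=graph) as sess:
        value = sess.run(y, feed_dict={x: np.ones((2, 3), dtype=np.float32)},
                         options=run_options, run_metadata=run_metadata)

    # step_stats.dev_stats has one entry per device that recorded stats in this run.
    devices = [d.device for d in run_metadata.step_stats.dev_stats]
    return value, devices


if __name__ == "__main__":
    value, devices = traced_run()
    print(value)  # prints 12.0
    print(devices)
```

If the crash kills the process, `run_metadata` is never filled in. In that case, passing `tf1.ConfigProto(log_device_placement=True)` as the session `config` should log op placement when the graph is set up, which can show which ops were assigned to the DirectML device before execution starts.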

argman commented 1 year ago

Closing this to continue the discussion in the tensorflow-directml-plugin repo.