yasaminjafarian / HDNet_TikTok

MIT License

GPU not accepted? #31

Closed: chrisdottel closed this issue 2 years ago

chrisdottel commented 2 years ago

Not sure what is going on here. I am running the new NVIDIA RTX 3080 Ti. The error says my GPU is not supported, which has never happened to me before, even on older codebases. :/

` Traceback (most recent call last):
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal: Could not satisfy explicit device specification '' because the node {{colocation_node hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal}} was colocated with a group of nodes that required incompatible device '/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
L2Loss: CPU XLA_CPU
Assign: CPU
VariableV2: CPU
Identity: CPU XLA_CPU
Add: CPU XLA_CPU
Mul: CPU XLA_CPU
TruncatedNormal: CPU XLA_CPU
Const: CPU XLA_CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/shape (Const)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/mean (Const)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/stddev (Const)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal (TruncatedNormal)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/mul (Mul)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal (Add)
  hourglass_stack_fused_depth_prediction/conv1/weights (VariableV2) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Assign (Assign) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/read (Identity) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Regularizer/l2_regularizer/scale (Const) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Regularizer/l2_regularizer/L2Loss (L2Loss) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Regularizer/l2_regularizer (Mul) /device:GPU:0

 [[{{node hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "HDNet_Inference.py", line 64, in <module>
    tf.global_variables_initializer().run()
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2679, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 5614, in _run_using_default_session
    session.run(operation, feed_dict)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal: Could not satisfy explicit device specification '' because the node node hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal (defined at /home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py:28) placed on device Device assignments active during op 'hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal' creation:
  with tf.device(None): </home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:1696>
  with tf.device(/gpu:0): </home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py:26> was colocated with a group of nodes that required incompatible device '/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
L2Loss: CPU XLA_CPU
Assign: CPU
VariableV2: CPU
Identity: CPU XLA_CPU
Add: CPU XLA_CPU
Mul: CPU XLA_CPU
TruncatedNormal: CPU XLA_CPU
Const: CPU XLA_CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/shape (Const)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/mean (Const)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/stddev (Const)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal (TruncatedNormal)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/mul (Mul)
  hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal (Add)
  hourglass_stack_fused_depth_prediction/conv1/weights (VariableV2) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Assign (Assign) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/read (Identity) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Regularizer/l2_regularizer/scale (Const) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Regularizer/l2_regularizer/L2Loss (L2Loss) /device:GPU:0
  hourglass_stack_fused_depth_prediction/conv1/weights/Regularizer/l2_regularizer (Mul) /device:GPU:0

 [[node hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal (defined at /home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py:28) ]]

Additional information about colocations: No node-device colocations were active during op 'hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal' creation.

Device assignments active during op 'hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal' creation:
  with tf.device(None): </home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:1696>
  with tf.device(/gpu:0): </home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py:26>

Original stack trace for 'hourglass_stack_fused_depth_prediction/conv1/weights/Initializer/truncated_normal/TruncatedNormal':
  File "HDNet_Inference.py", line 57, in <module>
    out2_1 = hourglass_refinement(x1,True)
  File "/home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py", line 119, in hourglass_refinement
    out0_d = hourglass_stack_no_incep(netIN)
  File "/home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py", line 74, in hourglass_stack_no_incep
    c0 = conv_layer(stack_in,layer_name('conv'),KER_SZ,NUM_CH[0],bn=bn,training=training)
  File "/home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py", line 34, in conv_layer
    kernel = variable('weights', [kernel_size, kernel_size, input_channels, output_channels], initializer, regularizer=tf.contrib.layers.l2_regularizer(0.0005))
  File "/home/killab/ml/models/ShapeGeneration/HDNet_TikTok/hourglass_net_depth.py", line 28, in variable
    return tf.get_variable(name, shape, initializer=initializer, regularizer=regularizer, dtype=tf.float32, trainable=True)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 929, in _get_single_variable
    aggregation=aggregation)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 259, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 220, in _variable_v1_call
    shape=shape)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 198, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 2511, in default_variable_creator
    shape=shape)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 263, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 1568, in __init__
    shape=shape)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 1698, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 901, in <lambda>
    partition_info=partition_info)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/initializers.py", line 150, in _initializer
    seed=seed)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/random_ops.py", line 178, in truncated_normal
    shape_tensor, dtype, seed=seed1, seed2=seed2)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/ops/gen_random_ops.py", line 1013, in truncated_normal
    name=name)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/killab/anaconda3/envs/HDNet/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

`

JyChen9811 commented 2 years ago

Hi, were you able to solve this problem?

JyChen9811 commented 2 years ago

Hi, this problem can be addressed by checking your CUDA version: TensorFlow 1.14 expects CUDA 10.
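A quick way to confirm that the CUDA setup is the culprit, rather than the model code, is to look at which devices TensorFlow actually registered. The sketch below is only an illustration: `gpu_devices` and `local_device_names` are hypothetical helper names, and the only TensorFlow call used is `device_lib.list_local_devices()`, imported lazily so the filter also works on a device list pasted from an error message.

```python
def gpu_devices(device_names):
    """Return the entries in device_names that are plain GPU devices."""
    # Matches ".../device:GPU:0" but not CPU, XLA_CPU or XLA_GPU entries.
    return [name for name in device_names if "/device:GPU:" in name]


def local_device_names():
    """Names of the devices TensorFlow registered, or None if TF is not installed."""
    try:
        # Imported lazily so gpu_devices() stays usable without TensorFlow.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [device.name for device in device_lib.list_local_devices()]


# Device list copied from the "All available devices" part of the error in this issue.
reported = [
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_CPU:0",
]
print(gpu_devices(reported))  # prints [] because no GPU device was registered
```

Calling `gpu_devices(local_device_names())` inside the HDNet environment should return at least one entry once the CUDA and cuDNN libraries match what the installed TensorFlow build expects. An empty list means TensorFlow fell back to CPU only, which is what the traceback shows.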

yasaminjafarian commented 2 years ago

Hi. Yes, this code was tested with tensorflow-gpu 1.14.0, Python 3.7.4, CUDA 10 (version 10.0.130) and cuDNN 7 (version 7.4.2).
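For anyone comparing their own setup against these versions, here is a minimal sketch of such a check. The tested versions come from the comment above; the helper names `major_minor` and `mismatches`, the choice to compare only major.minor, and the example values in `found` are assumptions made for illustration, so substitute the versions from your own install.

```python
import sys

# Versions the author reports testing with (taken from the comment above).
TESTED = {
    "python": "3.7.4",
    "tensorflow-gpu": "1.14.0",
    "cuda": "10.0.130",
    "cudnn": "7.4.2",
}


def major_minor(version):
    """Reduce '10.0.130' to (10, 0); patch-level differences are ignored (an assumption)."""
    parts = version.split(".")
    return tuple(int(part) for part in parts[:2])


def mismatches(found, tested=TESTED):
    """Return {name: (found, tested)} for components whose major.minor differs."""
    return {
        name: (found[name], tested[name])
        for name in tested
        if name in found and major_minor(found[name]) != major_minor(tested[name])
    }


# Hypothetical example values: replace the CUDA/cuDNN entries with your own.
found = {
    "python": "{}.{}.{}".format(*sys.version_info[:3]),
    "tensorflow-gpu": "1.14.0",
    "cuda": "11.2.0",
    "cudnn": "7.4.2",
}
print(mismatches(found))
```

Any component listed in the result is a candidate cause for TensorFlow failing to register the GPU; an empty dict means the environment matches the tested one at the major.minor level.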