microsoft / Deep3DFaceReconstruction

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019)
MIT License

Unknown Error #194

Open RAJA-PARIKSHAT opened 2 years ago

RAJA-PARIKSHAT commented 2 years ago

I am getting some mysterious error while running demo.py.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A4000    On   | 00000000:17:00.0 Off |                  Off |
| 68%   86C    P2   129W / 140W |  12372MiB / 16117MiB |     54%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000    On   | 00000000:65:00.0 Off |                  Off |
| 41%   49C    P8    15W / 140W |      3MiB / 16108MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|

We have a GPU server and I am using device 1. I run CUDA_VISIBLE_DEVICES=1 python demo.py and I get these errors:

ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Operation'>):
<tf.Operation 'assert_greater/Assert/AssertGuard/Merge' type=Merge>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "demo.py", line 124, in <module>
    demo()
  File "demo.py", line 82, in demo
    FaceReconstructor.Reconstruction_Block(coeff,opt)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py", line 63, in Reconstruction_Block
    render_imgs,img_mask,img_mask_crop = self.Render_block(face_shape_t,norm_r,face_color,camera_scale,f_scale,self.facemodel,opt.batch_size,opt.is_train)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py", line 312, in Render_block
    ambient_color = ambient_color)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/renderer/mesh_renderer.py", line 364, in mesh_renderer
    camera_up)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/renderer/../renderer/camera_utils.py", line 89, in look_at
    message='Camera matrix is degenerate because eye and center are close.')
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/ops/check_ops.py", line 666, in assert_greater
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 189, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))

ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Operation'>):
<tf.Operation 'assert_greater_1/Assert/AssertGuard/Merge' type=Merge>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "demo.py", line 124, in <module>
    demo()
  File "demo.py", line 82, in demo
    FaceReconstructor.Reconstruction_Block(coeff,opt)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py", line 63, in Reconstruction_Block
    render_imgs,img_mask,img_mask_crop = self.Render_block(face_shape_t,norm_r,face_color,camera_scale,f_scale,self.facemodel,opt.batch_size,opt.is_train)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py", line 312, in Render_block
    ambient_color = ambient_color)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/renderer/mesh_renderer.py", line 364, in mesh_renderer
    camera_up)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/renderer/../renderer/camera_utils.py", line 97, in look_at
    message='Camera matrix is degenerate because up and gaze are close or'
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/ops/check_ops.py", line 666, in assert_greater
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 189, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))

2022-02-09 07:13:27.482419: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2022-02-09 07:13:27.607719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA RTX A4000 major: 8 minor: 6 memoryClockRate(GHz): 1.56
pciBusID: 0000:17:00.0
totalMemory: 15.74GiB freeMemory: 3.50GiB
2022-02-09 07:13:27.607743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2022-02-09 07:16:30.596270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-02-09 07:16:30.596291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2022-02-09 07:16:30.596295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2022-02-09 07:16:30.596389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3203 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:17:00.0, compute capability: 8.6)
reconstructing... 1
/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/preprocess_img.py:27: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
  k,_,_,_ = np.linalg.lstsq(A,b)
/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/preprocess_img.py:74: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  trans_params = np.array([w0,h0,102.0/s,t[0],t[1]])
2022-02-09 07:31:11.340602: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMM launch failed : a.shape=[1,35709,3], b.shape=[1,3,3], m=35709, n=3, k=3
  [[{{node MatMul_3}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](sub, transpose)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 124, in <module>
    demo()
  File "demo.py", line 105, in demo
    face_shape,face_texture,face_color,landmarks_2d,recon_img,tri],feed_dict = {images: input_img})
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMM launch failed : a.shape=[1,35709,3], b.shape=[1,3,3], m=35709, n=3, k=3
  [[node MatMul_3 (defined at /mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py:258) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](sub, transpose)]]

Caused by op 'MatMul_3', defined at:
  File "demo.py", line 124, in <module>
    demo()
  File "demo.py", line 82, in demo
    FaceReconstructor.Reconstruction_Block(coeff,opt)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py", line 53, in Reconstruction_Block
    face_shape_t = self.Rigid_transform_block(face_shape,rotation,translation)
  File "/mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py", line 258, in Rigid_transform_block
    face_shape_r = tf.matmul(face_shape,rotation)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
    a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
    "BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/deep3d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMM launch failed : a.shape=[1,35709,3], b.shape=[1,3,3], m=35709, n=3, k=3
  [[node MatMul_3 (defined at /mnt/NLPStorage/parikshat/Deep3DFaceReconstruction/face_decoder.py:258) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](sub, transpose)]]

I cannot understand what's wrong. Can somebody help me out? @YuDeng
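
One detail in the log above may be worth checking, offered as a sketch and not a confirmed fix. Although the command sets CUDA_VISIBLE_DEVICES=1, TensorFlow reports pci bus id 0000:17:00.0 with only 3.50GiB free, which is the card nvidia-smi lists as GPU 0 (the busy one). CUDA's default enumeration order is "fastest first" and need not match the PCI bus order that nvidia-smi uses; setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two agree. The helper name cuda_env is hypothetical, and this assumes the nvidia-smi snapshot reflects the state at the time of the run.

```python
import os


def cuda_env(device_index, base_env=None):
    """Return a copy of the environment that pins one GPU by its nvidia-smi index.

    `cuda_env` is a hypothetical helper for illustration, not part of the repo.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Enumerate GPUs in PCI bus order, the same order nvidia-smi prints them in.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Expose only the requested GPU to the process.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


# Example use (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "demo.py"], env=cuda_env(1), check=True)
```

The shell equivalent is CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python demo.py. If the "Found device" line then shows pciBusID 0000:65:00.0, the process is running on the idle card.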

nanak0k commented 1 year ago

I have the same error. Have you solved it?

lingleong981130 commented 1 year ago

Have you solved it? I have the same issue.

axbing commented 10 months ago

Same error here. I think it's caused by a mismatch between the TensorFlow version and the CUDA SDK.
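
To make the mismatch idea concrete, here is a minimal sketch of the check involved. The log in the original post reports compute capability 8.6 for the RTX A4000, and CUDA toolkits older than 11.1 do not know that architecture. The thread never states which CUDA version the installed TensorFlow was built against, so that value is a parameter here; the table and the function name natively_supported are illustrative assumptions, not part of TensorFlow or this repo.

```python
# Earliest CUDA toolkit release that can target each GPU compute capability.
# Only a few common architectures are listed; extend as needed.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30-series, RTX A-series)
}


def natively_supported(build_cuda, compute_capability):
    """Return True if CUDA toolkit `build_cuda` knows this GPU architecture.

    Both arguments are (major, minor) tuples, e.g. (10, 0) and (8, 6).
    """
    minimum = MIN_CUDA_FOR_CC.get(compute_capability)
    if minimum is None:
        raise KeyError("compute capability %r is not in the table" % (compute_capability,))
    # Tuples compare element by element, so (11, 2) >= (11, 1) is True.
    return build_cuda >= minimum
```

For example, natively_supported((10, 0), (8, 6)) is False while natively_supported((11, 2), (8, 6)) is True. If the TensorFlow build in the environment above was compiled against CUDA 9 or 10 (an assumption), that would be consistent with the long startup delay and the cuBLAS failure in the log, and a build targeting CUDA 11.1 or newer would be the thing to try.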