tencent-ailab / hifi3dface

Code and data for our paper "High-Fidelity 3D Digital Human Creation from RGB-D Selfies".

Error when running on a CPU machine: InvalidArgumentError (see above for traceback): Default MaxPoolingOp only supports NHWC on device type CPU #10

Open tang1485 opened 3 years ago

tang1485 commented 3 years ago

Running bash run_opt_rgb.sh fails with InvalidArgumentError (see above for traceback): Default MaxPoolingOp only supports NHWC on device type CPU. I am on a CPU-only machine with the CPU build of TensorFlow installed.

prepare datas
start MTCNN
MTCNN detect
hello
2020-11-05 15:07:19.810071: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
=========================
has run MTCNN: 1 / 1
start detect 86pt 3D lmk
hello
has run 86pt lmk: 1 / 1
start detect 68pt 2D lmk
hello
WARNING:tensorflow:From /root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py:118: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
W1105 15:07:21.785537 140196717344576 tf_logging.py:126] From /root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py:118: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
load lm: /data/hifi3dface/hifi3dface/test_data/RGB/test1/single_img//prepare_rgb/lmk_3D_86pts_ori.txt
2020-11-05 15:07:24.326655: E tensorflow/core/common_runtime/executor.cc:660] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
         [[Node: max_pool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_23)]]
Traceback (most recent call last):
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
         [[Node: max_pool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_23)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_data_preparation.py", line 342, in <module>
    app.run(main)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_data_preparation.py", line 321, in main
    prepare_test_data_RGB(FLAGS.img_dir, FLAGS.out_dir)
  File "run_data_preparation.py", line 217, in prepare_test_data_RGB
    pb_path, img_dir, lmk3D_ori_txt_path, lmk2D_ori_txt_path
  File "run_data_preparation.py", line 84, in detect_2Dlmk_all_imgs
    np.array([lmk3D]), np.array([img]), sess
  File "/data/hifi3dface/hifi3dface/data_prepare/detect_2D_landmark.py", line 226, in detect_2Dlmk68
    heatmap = sess.run(outputs, {inputs: test_img})
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
         [[Node: max_pool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_23)]]

Caused by op 'max_pool', defined at:
  File "run_data_preparation.py", line 342, in <module>
    app.run(main)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_data_preparation.py", line 321, in main
    prepare_test_data_RGB(FLAGS.img_dir, FLAGS.out_dir)
  File "run_data_preparation.py", line 217, in prepare_test_data_RGB
    pb_path, img_dir, lmk3D_ori_txt_path, lmk2D_ori_txt_path
  File "run_data_preparation.py", line 70, in detect_2Dlmk_all_imgs
    tf.import_graph_def(graph_def, name="")
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 513, in import_graph_def
    _ProcessNewOps(graph)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 303, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3540, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3540, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3428, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Default MaxPoolingOp only supports NHWC on device type CPU
         [[Node: max_pool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_23)]]

data prepare failed
cyj907 commented 3 years ago

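We have only tested on GPU. CPU is most likely not supported, since the rasterization part is written in CUDA. If you want to run on CPU, you will need to rewrite every operation that does not support CPU.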
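For the MaxPool node in the log, the failing attributes are data_format="NCHW" with ksize=[1, 1, 2, 2], while the error says the default CPU kernel accepts only NHWC. The following is a minimal pure-Python sketch of what converting such a node to NHWC would involve: permuting the attribute lists and transposing the data. The helper names are hypothetical and not part of hifi3dface or TensorFlow. In a real graph the same idea would presumably mean a transpose before and after the pooling op, and it would not help with the CUDA rasterizer.

```python
# Pure-Python sketch (no TensorFlow needed): an NCHW max-pool can be expressed
# as an NHWC one by permuting its attributes and transposing its input.
# All helper names here are hypothetical illustrations.

def nchw_attr_to_nhwc(attr):
    """Reorder a 4-element ksize/strides list from [N, C, H, W] to [N, H, W, C]."""
    n, c, h, w = attr
    return [n, h, w, c]

def nchw_to_nhwc(x):
    """Transpose a nested-list tensor from [N][C][H][W] to [N][H][W][C]."""
    n_, c_, h_, w_ = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[[x[n][c][h][w] for c in range(c_)]
              for w in range(w_)]
             for h in range(h_)]
            for n in range(n_)]

def max_pool_nhwc(x, ksize, strides):
    """VALID max pooling over a nested-list NHWC tensor."""
    _, kh, kw, _ = ksize
    _, sh, sw, _ = strides
    h_, w_, c_ = len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_h = (h_ - kh) // sh + 1
    out_w = (w_ - kw) // sw + 1
    return [[[[max(img[i * sh + di][j * sw + dj][c]
                   for di in range(kh) for dj in range(kw))
               for c in range(c_)]
              for j in range(out_w)]
             for i in range(out_h)]
            for img in x]

# Attributes copied from the failing node in the log.
ksize_nhwc = nchw_attr_to_nhwc([1, 1, 2, 2])
strides_nhwc = nchw_attr_to_nhwc([1, 1, 2, 2])

# One image, one channel, 2x4 pixels, stored as NCHW.
x_nchw = [[[[1.0, 5.0, 2.0, 0.0],
            [3.0, 4.0, 7.0, 6.0]]]]
pooled = max_pool_nhwc(nchw_to_nhwc(x_nchw), ksize_nhwc, strides_nhwc)
print(ksize_nhwc)  # [1, 2, 2, 1]
print(pooled)      # [[[[5.0], [7.0]]]]
```

The attribute permutation is the key point: the 2x2 window sits in the last two positions for NCHW and in the middle two for NHWC.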

JacksonL1 commented 3 years ago

Changing the networks in the two functions detect_2Dlmk_all_imgs and face_seg so that they run on the GPU is enough.

qianxinchun commented 3 years ago

That still doesn't work for me. Have you managed to run it entirely on the CPU?

JacksonL1 commented 3 years ago

No, I did not get it running on the CPU. As cyj907 replied, that would require rewriting the unsupported operations, so I switched to the GPU instead. I'll look up the exact places I changed for you tomorrow.

JacksonL1 commented 3 years ago

In the two functions above, the following is the only change I made, and after that it ran for me. If it still fails, please post the error message. I changed

with tf.Graph().as_default():

to

with tf.Graph().as_default(), tf.device('/device:XLA_GPU:0'):

arpu commented 3 years ago

Hello,

I think I have the same problem (I only use a CPU on the server).

020-11-22 02:28:16.446224: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
     [[{{node max_pool}}]]
Traceback (most recent call last):
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
     [[{{node max_pool}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_data_preparation.py", line 343, in <module>
    app.run(main)
  File "/root/.local/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/root/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_data_preparation.py", line 324, in main
    prepare_test_data_RGBD(FLAGS.img_dir, FLAGS.out_dir)
  File "run_data_preparation.py", line 291, in prepare_test_data_RGBD
    pb_path, img_dir, lmk3D_ori_txt_path, lmk2D_ori_txt_path
  File "run_data_preparation.py", line 85, in detect_2Dlmk_all_imgs
    np.array([lmk3D]), np.array([img]), sess
  File "/root/hifi3dface/data_prepare/detect_2D_landmark.py", line 226, in detect_2Dlmk68
    heatmap = sess.run(outputs, {inputs: test_img})
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
     [[node max_pool (defined at /usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'max_pool':
  File "run_data_preparation.py", line 343, in <module>
    app.run(main)
  File "/root/.local/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/root/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_data_preparation.py", line 324, in main
    prepare_test_data_RGBD(FLAGS.img_dir, FLAGS.out_dir)
  File "run_data_preparation.py", line 291, in prepare_test_data_RGBD
    pb_path, img_dir, lmk3D_ori_txt_path, lmk2D_ori_txt_path
  File "run_data_preparation.py", line 71, in detect_2Dlmk_all_imgs
    tf.import_graph_def(graph_def, name="")
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 517, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3451, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
arpu commented 3 years ago

OK, changing the line to

with tf.Graph().as_default(), tf.device('/device:XLA_CPU:0'):

looks good!

arpu commented 3 years ago

OK, I hit another problem here. I downgraded to TensorFlow 1.8 and added the CPU device, but now I get: Default MaxPoolingOp only supports NHWC on device type CPU [[Node: max_pool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_23)]]

It fails on line https://github.com/tencent-ailab/hifi3dface/blob/main/data_prepare/run_data_preparation.py#L290

arpu commented 3 years ago

It looks like this is only possible with TensorFlow 2.0 and Intel MKL on the CPU.

I still get

020-11-22 04:59:43.944431: E tensorflow/core/common_runtime/executor.cc:660] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
     [[Node: max_pool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 2, 2], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_23)]]

with tensorflow==1.8 and Intel MKL installed, and with tf.Graph().as_default(), tf.device('/device:CPU:0'): in use.

Any hint for this?

arpu commented 3 years ago

And it is the same with tensorflow==1.15. Is updating to tensorflow >= 2.0 the only way?

arpu commented 3 years ago

Sorry for the noise. It looks like I am making some progress using TensorFlow 2.3.1 (and small code updates). I will open a PR next week.

arpu commented 3 years ago

Looks like someone else has done this at https://github.com/cleberzavadniak/hifi3dface/commit/77b9bebec1043d8e791e9b195617ed8b7b4339b1

@cleberzavadniak

cleberzavadniak commented 3 years ago

Hi! My fork is almost running fine. I intend to open a PR soon. A third party is going to test it with CUDA support, yet...

arpu commented 3 years ago

Hey @cleberzavadniak

I tested your fork on my Ubuntu server, but I have the same problem as with the master version:

2020-11-23 20:42:47.822048: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[0,262143] = [0, -1, -1] does not index into param shape [1,512,512,3]
2020-11-23 20:42:47.823737: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[0,262143] = [0, -1, -1] does not index into param shape [1,300,300,19]
Traceback (most recent call last):
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,262143] = [0, -1, -1] does not index into param shape [1,512,512,3]
     [[{{node GatherNd_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "step0_unwrapper.py", line 350, in <module>
    tf.app.run(main)
  File "/root/models/mnt/testpakfork/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/root/models/mnt/testpakfork/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/root/models/mnt/testpakfork/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "step0_unwrapper.py", line 310, in main
    front_seg_batch: info["seg"][0:1, ...],
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/root/models/mnt/testpakfork/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,262143] = [0, -1, -1] does not index into param shape [1,512,512,3]
     [[node GatherNd_1 (defined at ../utils/unwrap_utils.py:68) ]]

Errors may have originated from an input operation.
Input Source operations connected to node GatherNd_1:
 truediv (defined at step0_unwrapper.py:165)    
 Reshape_9 (defined at ../utils/unwrap_utils.py:66)

Original stack trace for 'GatherNd_1':
  File "step0_unwrapper.py", line 350, in <module>
    tf.app.run(main)
  File "/root/models/mnt/testpakfork/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/root/models/mnt/testpakfork/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/root/models/mnt/testpakfork/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "step0_unwrapper.py", line 169, in main
    FLAGS.uv_size,
  File "../utils/unwrap_utils.py", line 209, in unwrap_img_into_uv
    uv_size,
  File "../utils/unwrap_utils.py", line 68, in warp_img_to_uv
    uv_map = tf.gather_nd(img_attrs, batch_uv_pos)
  File "/root/models/mnt/testpakfork/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/root/models/mnt/testpakfork/tensorflow/python/ops/array_ops.py", line 3796, in gather_nd
    return gen_array_ops.gather_nd(params, indices, name=name)
  File "/root/models/mnt/testpakfork/tensorflow/python/ops/gen_array_ops.py", line 3991, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/root/models/mnt/testpakfork/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/root/models/mnt/testpakfork/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/root/models/mnt/testpakfork/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/root/models/mnt/testpakfork/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Any idea what the problem could be?
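For context on the trace: the index row [0, -1, -1] falls outside the [1,512,512,3] image, meaning at least one UV position maps to no valid pixel. The TensorFlow documentation for gather_nd notes that the CPU kernel returns an error on an out-of-bound index, while the GPU kernel stores 0 in the corresponding output. That would be consistent with this step running on GPU but failing on CPU. Below is a minimal pure-Python sketch of the zero-fill behavior. The helper name is hypothetical, it handles single-value lookups only, and whether masking invalid indices this way matches the GPU results in hifi3dface is an assumption.

```python
# Pure-Python sketch (no TensorFlow needed) of a gather that returns a fill
# value for out-of-range index rows instead of raising an error.
# `gather_nd_zero_oob` is a hypothetical helper, not a TensorFlow function.

def gather_nd_zero_oob(params, indices, fill=0.0):
    """Look up params[b][y][x] for each (b, y, x); out-of-range rows yield `fill`."""
    shape = (len(params), len(params[0]), len(params[0][0]))
    out = []
    for idx in indices:
        # Negative components are treated as invalid, matching the
        # [0, -1, -1] row rejected in the error message.
        in_range = all(0 <= i < dim for i, dim in zip(idx, shape))
        if in_range:
            b, y, x = idx
            out.append(params[b][y][x])
        else:
            out.append(fill)
    return out

# A 1 x 2 x 2 "image" and three lookups; the last one is out of range.
image = [[[10.0, 20.0],
          [30.0, 40.0]]]
lookups = [[0, 0, 1], [0, 1, 0], [0, -1, -1]]
print(gather_nd_zero_oob(image, lookups))  # [20.0, 30.0, 0.0]
```

In a TensorFlow graph the analogous workaround would presumably be to clamp the indices into range and multiply the gathered values by a validity mask.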

arpu commented 3 years ago

I have made some progress: with TensorFlow 2.3.1 the run_opt_rgbd.sh test now finishes! :>

arpu commented 3 years ago

https://github.com/arpu/hifi3dface/commit/491ac26018b4660b282884af3395ec1a12a0d13c

arpu commented 3 years ago

But I still get the same problem with is_bfm="true" :/

lith0613 commented 3 years ago

Hi, have you solved the problem?

arpu commented 3 years ago

No, it looks like you need an NVIDIA CUDA GPU.

lith0613 commented 3 years ago

I ran this demo code on a GPU with tensorflow-gpu 15.0. Have you solved the GatherNd problem reported above?

zw20045500 commented 2 years ago

Regarding JacksonL1's suggestion above to use tf.device('/device:XLA_GPU:0'): did you run the RGB pipeline on a GPU machine? The code imports tensorflow, not tensorflow-gpu. My version is 1.18. Could you share your pip list output?