lshug / first-order-model-tf

A TensorFlow port of the first-order motion model. TF Lite and TF.js compatible; supports the original's checkpoints and implements in-graph keypoint processing. Inference only (no training).
MIT License

Bug: Testlite fails #2

Closed HashedViking closed 3 years ago

HashedViking commented 3 years ago

@lshug I've successfully built tflite models, but:

Running testlite.py on MacOS fails with this:

2021-04-01 23:48:41.767969: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO: Created TensorFlow Lite delegate for select TF ops.
2021-04-01 23:48:42.077974: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: TfLiteFlexDelegate delegate: 2 nodes delegated out of 353 nodes with 1 partitions.

2021-04-01 23:48:42.083084: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-04-01 23:48:42.083095: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
100%|██████████| 50/50 [00:10<00:00,  4.58it/s]
Traceback (most recent call last):
  File "/Users/SERG/development/first-order-model/first-order-model-tf-fork/testlite.py", line 74, in <module>
    predictions = animate(source_image,frames,generator,kp_detector,process_kp_driving,4,parser.relative,parser.adapt_movement_scale)
  File "/Users/SERG/development/first-order-model/first-order-model-tf-fork/animate.py", line 31, in animate
    kp_driving = process_kp_driving(kp_driving,kp_source,relative,adapt_movement_scale)
  File "/Users/SERG/development/first-order-model/first-order-model-tf-fork/testlite.py", line 65, in process_kp_driving
    process_kp_driving_interpreter.allocate_tensors()
  File "/Users/SERG/opt/anaconda3/envs/fom_tf/lib/python3.8/site-packages/tensorflow/lite/python/interpreter.py", line 259, in allocate_tensors
    return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/pack.cc:61 HaveSameShapes(input0, input) was not true.Node number 8 (PACK) failed to prepare.

test.py and run.py work ok, though.

Also, I've added this at line 35 of build.py to bypass error #3:

generator_converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]

Commenting out this line in animate.py produces another error:

#Step 3: process kp_driving
#kp_driving = process_kp_driving(kp_driving,kp_source,relative,adapt_movement_scale)
2021-04-02 00:10:40.956302: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO: Created TensorFlow Lite delegate for select TF ops.
2021-04-02 00:10:41.260118: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: TfLiteFlexDelegate delegate: 2 nodes delegated out of 353 nodes with 1 partitions.

2021-04-02 00:10:41.264943: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-04-02 00:10:41.264954: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
 98%|█████████▊| 49/50 [00:11<00:00,  4.34it/s]
Batches 0/200
100%|██████████| 50/50 [00:11<00:00,  4.39it/s]
  0%|          | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/SERG/development/first-order-model/first-order-model-tf-fork/testlite.py", line 74, in <module>
    predictions = animate(source_image,frames,generator,kp_detector,process_kp_driving,4,parser.relative,parser.adapt_movement_scale)
  File "/Users/SERG/development/first-order-model/first-order-model-tf-fork/animate.py", line 40, in animate
    predictions.append(generator([source_image,kp_driving_tensor,kp_source]))
  File "/Users/SERG/development/first-order-model/first-order-model-tf-fork/testlite.py", line 54, in generator
    generator_interpreter.invoke()
  File "/Users/SERG/opt/anaconda3/envs/fom_tf/lib/python3.8/site-packages/tensorflow/lite/python/interpreter.py", line 540, in invoke
    self._interpreter.Invoke()
RuntimeError: tensorflow/lite/kernels/reshape.cc:69 num_input_elements != num_output_elements (80 != 20)Node number 13 (RESHAPE) failed to prepare.

My env:

MacOS 11.2.3 
conda 4.9.2
Python 3.8.8

requirements.txt:

imageio==2.9.0
imageio-ffmpeg==0.4.3
numpy==1.19.5
PyYAML==5.3.1
tensorflow==2.4.1
tensorboard==2.4.1
scikit-image==0.18.1
tqdm==4.59.0
torch==1.8.1
HashedViking commented 3 years ago

Found a workaround

  1. set batch_size to 1
    predictions = animate(source_image,frames,generator,kp_detector,process_kp_driving,1,parser.relative,parser.adapt_movement_scale)
  2. comment out kp_driving = process_kp_driving(kp_driving,kp_source,relative,adapt_movement_scale), as it still fails even with batch_size == 1

That seems like a TFLite-related issue with tflite.resize_tensor_input.
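If that's the culprit, the resize path presumably follows this call pattern (a minimal sketch, not the repo's actual code; the helper name and shapes are assumptions):

```python
def resize_batch_input(interpreter, index, batch_size, rest_shape):
    """Resize one TF Lite input tensor to a new batch size, then
    re-allocate buffers. Pinning batch_size to 1 sidesteps this path."""
    interpreter.resize_tensor_input(index, [batch_size, *rest_shape])
    interpreter.allocate_tensors()
```

Note that `allocate_tensors()` re-runs shape propagation and node preparation, which is where the PACK and RESHAPE prepare errors in the tracebacks above are raised.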

HashedViking commented 3 years ago

Interesting: how did your code work without any issues on your machine? You probably have Windows/Linux.

lshug commented 3 years ago

This looks like an issue where the input indices for the TF Lite interpreter (process_kp_driving_kp_driving_index, etc.) don't match the correct inputs. I've changed it to find the correct indices by input name. Can you pull and try running again?
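The name-based lookup can be sketched roughly like this (a sketch operating on the list returned by tf.lite.Interpreter.get_input_details(); the tensor names in the usage comment are illustrative, not taken from the actual models):

```python
def input_index(input_details, name):
    """Return the tensor index of the first input whose name contains
    `name`. `input_details` is the list of dicts produced by
    tf.lite.Interpreter.get_input_details()."""
    for detail in input_details:
        if name in detail["name"]:
            return detail["index"]
    raise KeyError(f"no input matching {name!r}")

# Usage (assuming a loaded interpreter):
# idx = input_index(interpreter.get_input_details(), "kp_driving")
# interpreter.set_tensor(idx, kp_driving)
```

Unlike hard-coded indices, this keeps working even if the converter reorders the model's inputs between builds.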

HashedViking commented 3 years ago

Ok, now it works without commenting out step 3 (tested with both batch_size == 1 and batch_size == 4):

#Step 3: process kp_driving
kp_driving = process_kp_driving(kp_driving,kp_source,relative,adapt_movement_scale)

But I used tflite models built with the previous version of build.py, because the new one fails at kp_detector_tflite = kp_detector_converter.convert()

Edit: successfully used the new models after fixing build.py by adding tf.lite.OpsSet.SELECT_TF_OPS to kp_detector_converter
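The build.py fix amounts to mirroring the generator_converter change from earlier in the thread for the kp_detector converter, roughly (a sketch; the exact converter variable names are assumed from the thread, and this fragment presupposes an already-constructed converter):

```python
kp_detector_converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # native TF Lite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF ops via the Flex delegate
]
kp_detector_tflite = kp_detector_converter.convert()
```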

lshug commented 3 years ago

Since testlite.py itself works and the remaining issue is with build.py, I'm closing this issue.

HashedViking commented 3 years ago

Sure. I'll try to run it on iOS soon and will report back.