Open intouch1233 opened 3 years ago
Thanks for the detailed notes, very helpful for debugging. I believe I found the issue.
Quick hack to make it work: edit the feature_depth here to 128 (or whatever dimension you used for the autoencoder) instead of 1024. Then re-export the model, and extraction should work at that point.
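The numbers in the reshape error reported in the extraction log are consistent with this diagnosis: 15488 is not a multiple of 1024, but it is a multiple of 128. A minimal check in plain Python follows; the helper name `fits_depth` is hypothetical and used only for illustration, it is not part of DELF.

```python
# Values taken from the InvalidArgumentError in the extraction log.
num_values = 15488        # size of the tensor passed to reshape
hard_coded_depth = 1024   # feature_depth currently hard-coded at export time
autoencoder_depth = 128   # example reduced dimension mentioned above


def fits_depth(total_values: int, feature_depth: int) -> bool:
    """Return True if a flat tensor of `total_values` elements can be
    reshaped to [-1, feature_depth], i.e. the size divides evenly."""
    return total_values % feature_depth == 0


print(fits_depth(num_values, hard_coded_depth))   # False: 15488 = 15 * 1024 + 128
print(fits_depth(num_values, autoencoder_depth))  # True: 15488 = 121 * 128
```

If the autoencoder was trained with a different dimension, substitute that value for 128.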
A few options to fix this bug:
1) Make export_model_utils.ExtractLocalFeatures accept a new argument, feature_depth=1024, which can be set here. This would replace the hard-coded feature_depth mentioned above.
2) Rewrite export_model_utils.ExtractLocalFeatures to make it more similar to ExtractLocalAndGlobalFeatures, which uses autograph and does not actually require feature_depth to be set. Basically, this would correspond to this TODO.
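As a rough illustration of the difference between the two options, here is a sketch in plain Python, with nested lists standing in for tensors. The function names `reshape_with_depth` and `reshape_inferred` are hypothetical and do not reflect the actual signature or body of export_model_utils.ExtractLocalFeatures. The 11 x 11 map size is an assumption chosen only so that the total matches the 15488 values in the error message; the real feature map shape is not stated in this thread.

```python
from typing import List

Vector = List[float]


def reshape_with_depth(flat: Vector, feature_depth: int = 1024) -> List[Vector]:
    """Option (1): the caller passes feature_depth instead of relying on a constant."""
    if len(flat) % feature_depth != 0:
        raise ValueError(
            f"{len(flat)} values cannot be split into rows of {feature_depth}")
    return [flat[i:i + feature_depth] for i in range(0, len(flat), feature_depth)]


def reshape_inferred(feature_map: List[List[Vector]]) -> List[Vector]:
    """Option (2): flatten an H x W x C map; C comes from the data, not a setting."""
    return [descriptor for row in feature_map for descriptor in row]


# Toy 11 x 11 feature map with 128-dimensional descriptors (11 * 11 * 128 = 15488).
feature_map = [[[0.0] * 128 for _ in range(11)] for _ in range(11)]
flat = [value for row in feature_map for descriptor in row for value in descriptor]

by_argument = reshape_with_depth(flat, feature_depth=128)
by_inference = reshape_inferred(feature_map)
print(len(by_argument), len(by_argument[0]))    # 121 128
print(len(by_inference), len(by_inference[0]))  # 121 128
```

Calling `reshape_with_depth(flat)` with the default of 1024 raises a ValueError, which mirrors the failure in the report, while the inferred variant has no depth parameter that could go stale.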
I think (2) is probably the best way to go in the long term. @dan-anghel, would you be interested in tackling (2)?
Hi @andrefaraujo ! Sure, I can take a look at it and make the changes.
@intouch1233 Would you like to share the specific code or script you used to train on your dataset?
I trained on my own data, and these are the commands I used.
Training
python3 train.py --train_file_pattern=/home/sornnarong/workspace/share_drive_31/dataset/AiProducts-Challenge-master/tfrecord2/train --validation_file_pattern=/home/sornnarong/workspace/share_drive_31/dataset/AiProducts-Challenge-master/tfrecord2/validation --imagenet_checkpoint=resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 --dataset_version=ai_product --logdir=aiproducts_training3
Export model
python3 model/export_local_model.py --ckpt_path=aiproducts_training3/delf_weights --export_path=aiproducts_training3_model
2021-02-03 00:16:17.125601: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-03 00:16:20.054145: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 00:16:20.056257: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-03 00:16:20.114068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.114449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-03 00:16:20.114467: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-03 00:16:20.162971: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-03 00:16:20.163056: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-03 00:16:20.192167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-03 00:16:20.213417: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-03 00:16:20.233074: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-03 00:16:20.254751: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-03 00:16:20.261000: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-03 00:16:20.261269: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.262565: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.263735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-03 00:16:20.265377: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 00:16:20.265754: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.267338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-03 00:16:20.267415: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-03 00:16:20.267505: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-03 00:16:20.267571: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-03 00:16:20.267631: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-03 00:16:20.267689: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-03 00:16:20.267752: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-03 00:16:20.267809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-03 00:16:20.267870: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-03 00:16:20.268103: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.269825: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.271322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-03 00:16:20.271437: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-03 00:16:20.705346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-03 00:16:20.705372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-02-03 00:16:20.705378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-02-03 00:16:20.705557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.705967: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.706334: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:16:20.706678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9903 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Checkpoint loaded from aiproducts_training3/delf_weights
2021-02-03 00:16:20.975485: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:tensorflow:Skipping full serialization of Keras layer <delf.python.training.model.resnet50.ResNet50 object at 0x7f63fc5cbf10>, because it is not built.
W0203 00:16:24.967820 140069560780608 save_impl.py:78] Skipping full serialization of Keras layer <delf.python.training.model.resnet50.ResNet50 object at 0x7f63fc5cbf10>, because it is not built.
WARNING:tensorflow:Skipping full serialization of Keras layer <tensorflow.python.keras.layers.pooling.AveragePooling2D object at 0x7f63f0115d10>, because it is not built.
W0203 00:16:30.581585 140069560780608 save_impl.py:78] Skipping full serialization of Keras layer <tensorflow.python.keras.layers.pooling.AveragePooling2D object at 0x7f63f0115d10>, because it is not built.
W0203 00:16:38.962221 140069560780608 save.py:241] Found untraced functions such as conv1_layer_call_and_return_conditional_losses, conv1_layer_call_fn, conv1_layer_call_fn, conv1_layer_call_and_return_conditional_losses, conv1_layer_call_and_return_conditional_losses while saving (showing 5 of 5). These functions will not be directly callable after loading.
W0203 00:16:39.572054 140069560780608 save.py:241] Found untraced functions such as conv1_layer_call_and_return_conditional_losses, conv1_layer_call_fn, conv1_layer_call_fn, conv1_layer_call_and_return_conditional_losses, conv1_layer_call_and_return_conditional_losses while saving (showing 5 of 5). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: aiproducts_training3_model/assets
I0203 00:16:42.081132 140069560780608 builder_impl.py:775] Assets written to: aiproducts_training3_model/assets
WARNING:tensorflow:Unresolved object in checkpoint: (root).desc_classification
W0203 00:16:42.494237 140069560780608 util.py:161] Unresolved object in checkpoint: (root).desc_classification
WARNING:tensorflow:Unresolved object in checkpoint: (root).attn_classification
W0203 00:16:42.494378 140069560780608 util.py:161] Unresolved object in checkpoint: (root).attn_classification
WARNING:tensorflow:Unresolved object in checkpoint: (root).desc_classification.kernel
W0203 00:16:42.494426 140069560780608 util.py:161] Unresolved object in checkpoint: (root).desc_classification.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).desc_classification.bias
W0203 00:16:42.494526 140069560780608 util.py:161] Unresolved object in checkpoint: (root).desc_classification.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).attn_classification.kernel
W0203 00:16:42.494635 140069560780608 util.py:161] Unresolved object in checkpoint: (root).attn_classification.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).attn_classification.bias
W0203 00:16:42.494672 140069560780608 util.py:161] Unresolved object in checkpoint: (root).attn_classification.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0203 00:16:42.494736 140069560780608 util.py:169] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
Extract Features
python3 ./extract_features.py --delf_config_path delf_config_example.pbtxt --list_images_path list_images.txt --output_dir ./aiproduct_features
Below are my feature extraction error logs.
2021-02-03 00:26:41.075329: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-03 00:26:41.075371: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Reading list of images... done! Found 2 images
2021-02-03 00:26:42.994139: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 00:26:42.995276: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-03 00:26:43.000722: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-03 00:26:43.001348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:02:00.0 name: GRID V100D-16C computeCapability: 7.0 coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 16.00GiB deviceMemoryBandwidth: 836.37GiB/s
2021-02-03 00:26:43.001508: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.001614: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.001705: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.001798: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.001895: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.001979: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.002063: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-02-03 00:26:43.002264: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-03 00:26:43.002283: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-02-03 00:26:43.002536: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-03 00:26:43.002749: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-03 00:26:43.002789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-03 00:26:43.002836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
Starting to extract DELF features from images...
image shape -- (1000, 1000, 3)
2021-02-03 00:26:48.357299: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-02-03 00:26:48.434554: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2294605000 Hz
2021-02-03 00:26:49.147637: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 256000000 exceeds 10% of free system memory.
2021-02-03 00:26:49.350257: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 256000000 exceeds 10% of free system memory.
2021-02-03 00:26:49.798080: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 254977024 exceeds 10% of free system memory.
2021-02-03 00:26:49.890861: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 254977024 exceeds 10% of free system memory.
2021-02-03 00:26:49.990515: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 254977024 exceeds 10% of free system memory.
Traceback (most recent call last):
  File "./extract_features.py", line 146, in
    app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/mls/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/mls/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "./extract_features.py", line 109, in main
    extracted_features = extractor_fn(im)
  File "/home/mls/workspace/API/models/research/delf/delf/python/examples/extractor.py", line 199, in ExtractorFn
    input_abs_thres=score_threshold_tensor)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1669, in call
    return self._call_impl(args, kwargs)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1679, in _call_impl
    cancellation_manager)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1762, in _call_with_structured_signature
    cancellation_manager=cancellation_manager)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 116, in _call_flat
    cancellation_manager)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/home/mls/.local/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 15488 values, but the requested shape requires a multiple of 1024
[[{{node StatefulPartitionedCall/while/body/_341/while/Reshape_2}}]] [Op:__inference_signature_wrapper_21391]
Function call stack: signature_wrapper
Below is my DELF config:
use_local_features: true
use_global_features: false
model_path: "parameters/aiproducts_training3_model/"
image_scales: .25
image_scales: .3536
image_scales: .5
image_scales: .7071
image_scales: 1.0
image_scales: 1.4142
image_scales: 2.0
is_tf2_exported: true
delf_local_config {
  use_pca: false
  max_feature_num: 1000
  score_threshold: 100.0
}
max_image_size: 1024
Thanks very much.