google / automl

Google Brain AutoML
Apache License 2.0

error while exporting the trained model #657

Closed Samjith888 closed 4 years ago

Samjith888 commented 4 years ago

I have trained a model using efficientdet-d4. While converting the model, I get the following error.

command used

python model_inspect.py --runmode=saved_model --model_name=efficientdet-d4 --ckpt_path=efficientdet-d4/archive/ --saved_model_dir=efficientdet-d4/savedmodeldir --tensorrt=FP32 --tflite_path=efficientdet-d4/efficientdet-d4.tflite --hparams=voc_config.yaml

error :

2020-08-07 05:43:34.139684: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-08-07 05:43:34.139765: F tensorflow/compiler/tf2tensorrt/stub/nvinfer_stub.cc:49] getInferLibVersion symbol not found.
Fatal Python error: Aborted

Current thread 0x00007f992e4bb740 (most recent call first):
  File "/home/sam/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 259 in _check_trt_version_compatibility
  File "/home/sam/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 497 in __init__
  File "/home/sam/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1353 in create_inference_graph
  File "/sam/efficientdet/automl/efficientdet/inference.py", line 675 in export
  File "model_inspect.py", line 152 in export_saved_model
  File "model_inspect.py", line 463 in run_model
  File "model_inspect.py", line 514 in main
  File "/home/sam/anaconda3/envs/tf2/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
  File "/home/sam/anaconda3/envs/tf2/lib/python3.7/site-packages/absl/app.py", line 299 in run
  File "model_inspect.py", line 521 in <module>
Aborted (core dumped)

Note: the tflite, .pb, and frozen.pb files are generated here. The TensorRT file is not.

1.) The error seems to be in the TensorRT conversion. How can I solve it?

2.) Only the last 5 checkpoints are kept in the checkpoints directory (none of these 5 is the best checkpoint in my case), and the best checkpoint is saved in the 'archive' folder. So I used '--ckpt_path=efficientdet-d4/archive' as the checkpoint location for generating the .pb file, but this .pb file does not detect anything during inference. (If I use --ckpt_path=efficientdet-d4/ to generate the .pb file, the file works during inference.)

mingxingtan commented 4 years ago

Looks like you did not install the library required for TensorRT (libnvinfer.so.6 is missing). Do you need TensorRT? If not, you can remove "--tensorrt=FP32" and your command line will work.
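
A quick way to check whether the dynamic loader can find the library before passing --tensorrt is sketched below. This is only a minimal check using Python's standard ctypes module; the helper name can_load_shared_library is hypothetical, and a library that loads here is necessary but not sufficient for the TensorRT conversion to succeed.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # libnvinfer.so.6 is the library named in the error log of this issue.
    if can_load_shared_library("libnvinfer.so.6"):
        print("libnvinfer.so.6 can be loaded")
    else:
        print("libnvinfer.so.6 not found: install TensorRT or drop --tensorrt")
```

If the check fails even though TensorRT is installed, the library directory is probably not on the loader's search path (for example LD_LIBRARY_PATH on Linux).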

Samjith888 commented 4 years ago

Okay. Second query: I have trained a model and the checkpoints are saved in a folder 'checkpoints_dir' (after training, the last 5 checkpoints are inside this folder, and the best checkpoint is also saved in the backup folder). For the conversion to .pb, I used --ckpt_path=checkpoints_dir. Which checkpoint is used when converting the model into a .pb file with the python model_inspect.py command?

mingxingtan commented 4 years ago

Hi @Samjith888, it reads the latest checkpoint in that folder (you can see which one that is in the file named "checkpoint"). You should use xx/archive as the folder for exporting the best model.
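
To see which checkpoint a given folder resolves to, the "checkpoint" state file can be read directly. The sketch below assumes the usual TensorFlow state-file layout, a text file whose model_checkpoint_path: "..." line names the latest checkpoint; the helper name latest_checkpoint_prefix is hypothetical. With TensorFlow installed, tf.train.latest_checkpoint(ckpt_dir) reports the same information.

```python
import os
import re
from typing import Optional


def latest_checkpoint_prefix(ckpt_dir: str) -> Optional[str]:
    """Return the checkpoint prefix named in ckpt_dir/checkpoint, or None."""
    state_file = os.path.join(ckpt_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            # Matches only the "latest" entry, not all_model_checkpoint_paths.
            m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if m:
                path = m.group(1)
                # Relative entries are resolved against the checkpoint folder.
                return path if os.path.isabs(path) else os.path.join(ckpt_dir, path)
    return None


if __name__ == "__main__":
    # Folder names taken from the command earlier in this issue.
    for folder in ("efficientdet-d4", "efficientdet-d4/archive"):
        print(folder, "->", latest_checkpoint_prefix(folder))
```

Running this on both the model folder and its archive subfolder shows which checkpoint each --ckpt_path value would pick up, and a None result means the folder has no "checkpoint" state file.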

mingxingtan commented 4 years ago

The backup folder is useless (I added 'backup' just to deal with rare cases where tf.io.gfile.GFile may fail for unknown reasons).

vamshi-vk commented 3 years ago

Hi @mingxingtan, how can we access or save the best checkpoints instead of just the last 5?

mingxingtan commented 3 years ago

The best checkpoint is automatically saved in model_dir/archive/.