tensorflow / tensorrt

TensorFlow/TensorRT integration
Apache License 2.0

Jupyter Notebook kernel dies automatically #157

Open WeiFoo opened 4 years ago

WeiFoo commented 4 years ago

I was trying to run the following notebook on Ubuntu 18.04 with T4 GPU on EC2,

https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/TFv2-TF-TRT-inference-from-Keras-saved-model.ipynb

I can run most cells, but at the TF-TRT FP32 model section the kernel dies automatically.

I even restarted the runtime and ran only the following code, and the kernel still dies:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP32,
    max_workspace_size_bytes=8000000000)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='resnet50_saved_model',
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir='resnet50_saved_model_TFTRT_FP32')
print('Done Converting to TF-TRT FP32')

Does anyone have an idea? Thanks!

sayakpaul commented 4 years ago

I am experiencing the same issue. Any pointers @pooyadavoodi?

pooyadavoodi commented 4 years ago

I have seen an issue related to running INT8 calibration in the same process that previously ran an FP32/FP16 conversion. But if you run each conversion once per process, I expect it to work.
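A minimal sketch of that per-process isolation: launch each conversion in a fresh interpreter so any CUDA/TensorRT state dies with the process. The conversion snippets below are hypothetical placeholders, not the real notebook code.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit code.

    Any CUDA/TensorRT state the snippet creates is discarded when the
    child process exits, so one conversion cannot corrupt the next.
    """
    return subprocess.run([sys.executable, "-c", snippet]).returncode


# Placeholders: in practice each snippet would build a TrtGraphConverterV2,
# call convert() and save(), then exit.
fp32_job = "print('FP32 conversion would run here')"
int8_job = "print('INT8 calibration would run here')"

if __name__ == "__main__":
    run_isolated(fp32_job)
    run_isolated(int8_job)
```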

pooyadavoodi commented 4 years ago

I just tried TF-TRT FP32 and it worked. I got the following perf on a P100:

Step 0: 10.8ms
Step 50: 10.8ms
Step 100: 10.8ms
Step 150: 10.8ms
Step 200: 10.8ms
Step 250: 10.8ms
Step 300: 10.8ms
Step 350: 10.8ms
Step 400: 10.8ms
Step 450: 10.8ms
Step 500: 10.8ms
Step 550: 10.8ms
Step 600: 10.8ms
Step 650: 10.8ms
Step 700: 10.8ms
Step 750: 10.8ms
Step 800: 10.8ms
Step 850: 10.8ms
Step 900: 10.8ms
Step 950: 10.8ms
Throughput: 742 images/s

Perhaps some colab nodes aren't stable?

sayakpaul commented 4 years ago

Yeah, it might be the case. Worth propagating to the Colab team, I guess.

rvorias commented 4 years ago

My kernel also dies at this step. I am running Jupyter inside the Docker container with FP32 and FP16 on a GTX 1650.
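One hedged guess for this card: the notebook asks for max_workspace_size_bytes=8000000000 (roughly 8 GB), which is more memory than the 4 GB a GTX 1650 has, so shrinking the workspace to fit the GPU may help. A tiny illustrative helper (the name and the 50% fraction are arbitrary choices, not part of the TF-TRT API):

```python
def workspace_bytes(gpu_mem_gb: float, fraction: float = 0.5) -> int:
    """Pick a TF-TRT workspace size as a fraction of total GPU memory."""
    return int(gpu_mem_gb * (1 << 30) * fraction)


# For a 4 GB GTX 1650, reserve half the card for the TensorRT workspace:
ws = workspace_bytes(4)
# ...then pass ws as max_workspace_size_bytes to the conversion params.
```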

figkim commented 4 years ago

I had the same symptoms. In my case, it was caused by not adding <your TensorRT path>/lib to LD_LIBRARY_PATH before running JupyterLab. Adding the path and starting JupyterLab again solved it.
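For reference, a sketch of that fix (the TensorRT path below is a placeholder; substitute your actual install location):

```shell
# Prepend the TensorRT shared-library directory so TF-TRT can find
# libnvinfer and friends at runtime. The path here is a placeholder.
export LD_LIBRARY_PATH=/path/to/TensorRT/lib:${LD_LIBRARY_PATH:-}

# Then launch Jupyter from this same shell:
# jupyter lab
```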

azayz commented 4 years ago

Hey, my kernel dies not during conversion and optimization of the model but during inference. It converts the model smoothly with both FP16 and FP32, but during inference (predicting a single image) the kernel dies and automatically restarts. Any help?

sayakpaul commented 4 years ago

I actually put together a tutorial a few days back that shows how to use TensorRT end to end for accelerating inference: https://sayak.dev/tf.keras/tensorrt/tensorflow/2020/07/01/accelerated-inference-trt.html. Hope this will be helpful.