thomasfermi / Algorithms-for-Automated-Driving

Each chapter of this (mini-)book guides you in programming one important software component for automated driving.
https://thomasfermi.github.io/Algorithms-for-Automated-Driving/Introduction/intro.html

Optimize lane detection execution time #17

Closed thomasfermi closed 2 years ago

thomasfermi commented 3 years ago

The lane detector in this book is a bit too slow, which keeps the Carla simulation from running at 30 fps.

Ideas for improvements:

MankaranSingh commented 3 years ago

I think that to improve performance one shouldn't use PyTorch for inference. ONNX Runtime will easily give a 2-3x performance boost. On the downside, while it's easy to install and to convert the model to ONNX, it can take the reader's attention away from the main goal.

thomasfermi commented 3 years ago

Hey @MankaranSingh , thanks for the input, I did not know about onnx!

Regarding "taking away attention": that is a very good point. One could add a lane_detector_onnx.py as a sibling file to lane_detector.py in the solutions directory. One would also need to store an .onnx file in the repo. Students could then optionally install ONNX Runtime and run the lane detector from lane_detector_onnx.py. It would still be a small distraction, but if it allows real-time execution of the sample-solution lane detector in Carla, it might be worth it :)

From what I googled, it should be easy to do this using torch.onnx. I might give this a go next week if I find some time, just to find out how big the speed-up is :)

MankaranSingh commented 3 years ago

I tried converting best_model_multi_dice_loss.pth to ONNX, but it contains a custom op named SwishImplementation that causes errors during conversion. This is an issue I found.

Anyway, I tried converting the efficientnet-b0 encoder of the given solution to ONNX, and the fps improved from 0.2 to 2.5 on CPU. This seems promising, but we may have to retrain the model with memory-efficient swish set to false, as shown in the issue.

thomasfermi commented 3 years ago

Hey @MankaranSingh , thanks for looking into this! I tried

model.encoder.set_swish(memory_efficient=False)

and it worked :) Will try to load that onnx file into the onnxruntime this evening :)

Btw: Retraining was not necessary. I just loaded the pth file.

MankaranSingh commented 3 years ago

Ohh, I tried model.set_swish(memory_efficient=False) and thought this method was only available for models in smp. Good to see that it works.

You can use onnxruntime-gpu with these settings:

import numpy as np
import onnxruntime as ort

options = ort.SessionOptions()
options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
provider = 'CUDAExecutionProvider'
ort_session = ort.InferenceSession('model.onnx', options)
ort_session.set_providers([provider], None)

# run() takes a list of output names (None = all outputs) and a dict
# mapping input names to arrays
input_name = ort_session.get_inputs()[0].name
img = np.random.randn(*img_shape).astype(np.float32)
outs = ort_session.run(None, {input_name: img})[0]

thomasfermi commented 3 years ago

Hi @MankaranSingh , I had to install the CUDA Toolkit and cuDNN to get ONNX Runtime to work on my machine. Sadly, inference time (plus polyfitting, but that is not the bottleneck) with ONNX Runtime is 164 ms, while inference time with PyTorch is 60 ms. So it actually gets slower :( It seems ONNX Runtime is not a plug-and-play accelerator...
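
Latency numbers like these are easy to get wrong (warm-up, caching, GPU sync). A minimal way to measure per-call time could look like the sketch below; the NumPy workload is just a stand-in for wrapping the PyTorch model call and ort_session.run in the same way:

```python
import time
import numpy as np

def benchmark(fn, n_warmup=3, n_runs=20):
    """Return mean wall-clock time per call in milliseconds."""
    for _ in range(n_warmup):  # warm up caches / lazy initialization
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs * 1e3

# Dummy workload; in this issue one would instead time the PyTorch
# forward pass and the ONNX Runtime session.run call.
x = np.random.randn(512, 1024).astype(np.float32)
ms = benchmark(lambda: x @ x.T)
print(f"{ms:.1f} ms per call")
```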

thomasfermi commented 2 years ago

Update: I have some local code with updates that I will commit soon. For segmentation I am now using the fastseg library with MobileV3Small, which is a bit faster than the current smp model.

EDIT: commit bb090adb750c6346ee45b2d355efd46da2758dc1 introduced the changes to the CameraCalibrationDev branch. Closing this issue for now