Hi all. I have spent quite some time reading and using this awesome code. Converting the model to .onnx and .engine wasn't easy, so I'm sharing how I did it.
Installation
Create a virtual env using python 3.7.12. Don't use python 3.8!

cd GazeML
python3.7 -m venv .venv
source .venv/bin/activate

(Run deactivate whenever you want to leave the env.) Ok, time to install everything!
For me tensorflow==1.15 didn't work. You can also install tensorflow-gpu; make sure it's the same version, or check the support matrix on the TensorFlow page. Note that most of the tf1.x stuff is deprecated, so it's hard to get support for it. I'm thinking of reimplementing this whole repo in PyTorch or TF2 for that reason.
If python3 setup.py install hangs, just install the dependencies by hand, one by one.
Get the pre-trained weights: bash get_trained_weights.bash
Running the model
Before converting anything, test the model:
cd src
python3 elg_demo.py
I got a ton of errors but the model worked nonetheless.
Saving the model as .onnx

Use this tool: tf2onnx

Then we have to modify the code a bit before we can get started. Save the saved-model in inference_generator(): add these lines before line 385 (yield outputs):
# Save a SavedModel that tf2onnx can consume later
tf.saved_model.simple_save(self._tensorflow_session, "tmp",
                           inputs=data_source.output_tensors,
                           outputs=fetches)
When you run the demo again (python3 elg_demo.py), it will create a folder tmp with saved_model.pb in it. But don't try to convert it yet, because you will get this error:
ValueError: Input 0 of node hourglass/pre/BatchNorm/cond_1/AssignMovingAvg/Switch was passed float from hourglass/pre/BatchNorm/moving_mean:0 incompatible with expected float_ref.
The error is actually quite helpful: it tells us where in the graph the problem is. BatchNorm is the culprit. There are quite a few answers on Google about this issue, but I think the easiest fix is to set training to False, since BatchNorm behaves differently during training and inference. Change at least these lines:

is_training=False
self.use_batch_statistics: False,
self.use_batch_statistics: False,

and optionally is_training=False if you're using dpg. If you don't know, just change it.

This is a bug in the code: self.use_batch_statistics is set to True everywhere but is never set to False at any point. I could create a PR for this.

Now we have done all the changes. You can convert that file to .onnx like so:
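A minimal invocation, assuming tf2onnx is installed and the SavedModel folder is tmp (model.onnx is just an example output name):

python -m tf2onnx.convert --saved-model tmp --output model.onnx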
For most of your needs that should be enough. You can add --opset <opset>, for example --opset 10, if you want to target a specific opset. You can also add --target tensorrt or similar. Check the tf2onnx repo for more flags if you need them.
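For example, pinning the opset and targeting TensorRT might look like this (same example filenames as before):

python -m tf2onnx.convert --saved-model tmp --output model.onnx --opset 10 --target tensorrt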
There's one more thing you should know.
Converting the model to TensorRT .engine
If you try to convert the model using tools like trtexec or similar, you'll run into a small problem: the model contains uint8, which is not supported by TensorRT. You must remove the uint8s from the model: change uint8 to int64 and it will work.
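Here is a minimal sketch of that edit using the onnx Python package, assuming the uint8 shows up on graph inputs and in Cast nodes (filenames are examples):

import onnx
from onnx import TensorProto

model = onnx.load("model.onnx")

# Rewrite uint8 graph inputs as int64
for inp in model.graph.input:
    tensor_type = inp.type.tensor_type
    if tensor_type.elem_type == TensorProto.UINT8:
        tensor_type.elem_type = TensorProto.INT64

# Rewrite Cast nodes that cast to uint8
for node in model.graph.node:
    if node.op_type == "Cast":
        for attr in node.attribute:
            if attr.name == "to" and attr.i == TensorProto.UINT8:
                attr.i = TensorProto.INT64

onnx.save(model, "model_int64.onnx")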
Then you can convert:
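With trtexec (it ships with TensorRT), a sketch using the example filenames from above:

trtexec --onnx=model_int64.onnx --saveEngine=model.engine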
or using onnx2trt:
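Assuming onnx2trt from the onnx-tensorrt project is built and on your PATH:

onnx2trt model_int64.onnx -o model.engine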
That should be it. Thank you! I hope my weeks of grinding help someone. Please ask me if there are any questions.