rwightman / posenet-python

A Python port of Google TensorFlow.js PoseNet (Real-time Human Pose Estimation)
Apache License 2.0

Add ResNet50 backbone #14

Open rwightman opened 5 years ago

rwightman commented 5 years ago

Add resnet50 backbone as an option, as per: https://github.com/tensorflow/tfjs-models/pull/199

selzero commented 5 years ago

Hi Ross,

Is there an ETA on this? Do you need help?

rwightman commented 5 years ago

@selzero it's still on the list of things to do (eventually), but not the highest priority for me right now. One thing that makes it a little more work than I'd hoped is that it uses a different model format (see the "Switches to use the new TensorFlow.js 1.0 way of model loading and running inference" comment in the PR above), so I have to dig through the differences instead of just doing a quick adaptation of what I already have...

PR is welcome

Have you tried the new models via TFJS? Noteworthy improvement?

jrbasso commented 5 years ago

@rwightman The ResNet50 model is very accurate, but as you can imagine, it's also very heavy. Newer laptops like a new MacBook Pro can handle it fine, but a 2014 MacBook Pro couldn't handle it for streaming.

Regarding the new format, I think they just moved from building each layer manually to a pre-set graph with the weights and biases combined. That avoids loading dozens of files. Also, it seems they are training the model in Python, so it just gets converted to tfjs and is good to go without having to update all the JS code to adapt to different layers, weight files, biases, etc.

jendonyuen commented 5 years ago

I found that this ResNet50 model is a tfjs_graph_model generated by tfjs-converter, but tfjs-converter doesn't provide a function to convert a tfjs_graph_model back to a Python model. https://github.com/tensorflow/tfjs-converter https://github.com/tensorflow/tfjs/issues/1575

selzero commented 5 years ago

We have achieved much better accuracy with the ResNet50 model. Accuracy also means we are getting less "flicker" on the keypoints.

It's pretty easy to run the latest posenet demo and select ResNet50 from the drop down to see the difference.

timtensor commented 5 years ago

@selzero, have you tried it on saved videos, perhaps, rather than on streaming video?

selzero commented 5 years ago

@timtensor we have to use it on webcam.

ResNet50 is giving us better and more stable predictions.

gustavz commented 5 years ago

@rwightman also very interested in the ResNet50 adaptation!

Will also start working on it, and maybe help with a PR!

Keep us posted on your progress!

gzchenjiajun commented 5 years ago

How can I run ResNet50 with this code? Right now the reliability of MobileNet is too low.

darcula1993 commented 5 years ago

The new model format is a serious problem. It seems that tfjs encodes all the weights in four .bin files, and I have no idea how to extract the weights from them. Is there any progress?
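
For readers stuck at the same point, here is a minimal sketch of how the sharded .bin files can be read, assuming the usual tfjs layout: each group in the model JSON's weightsManifest lists its shard files under "paths" and its tensors under "weights", and the tensors are packed back to back in the concatenated shard bytes. The function name decode_weight_group is hypothetical, and quantized weights are deliberately not handled.

```python
import struct
from math import prod

# struct format code and byte size for the unquantized dtypes handled here.
DTYPE_FORMATS = {"float32": ("f", 4), "int32": ("i", 4)}


def decode_weight_group(group, shard_bytes):
    """Decode one weightsManifest group into {name: (shape, flat_values)}.

    group       -- one entry of the model JSON's "weightsManifest" list
    shard_bytes -- the group's shard files as bytes, in "paths" order
    """
    buffer = b"".join(shard_bytes)  # assumption: shards are one buffer split into files
    offset = 0
    decoded = {}
    for spec in group["weights"]:
        if "quantization" in spec:
            raise NotImplementedError("quantized weights are not handled in this sketch")
        code, size = DTYPE_FORMATS[spec["dtype"]]
        count = prod(spec["shape"])  # prod([]) is 1, which covers scalars
        values = struct.unpack_from(f"<{count}{code}", buffer, offset)
        decoded[spec["name"]] = (spec["shape"], list(values))
        offset += count * size
    return decoded
```

With a real model, shard_bytes would come from reading each file named in group["paths"] in binary mode. The values come back flat, so they still need to be reshaped to the listed shape before being loaded into a layer.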

gzchenjiajun commented 5 years ago

Because what I actually need is to recognize people, I turned to ReID. @darcula1993

ilkersigirci commented 4 years ago

Is there any progress on the ResNet implementation?

gzchenjiajun commented 4 years ago

@JendonYuen Is there any progress about ResNet implementation?

gzchenjiajun commented 4 years ago

It seems I will have to ask my front-end colleagues to help and send the results to me in real time.

jendonyuen commented 4 years ago

> @JendonYuen Is there any progress about ResNet implementation?

Nope... We went back to Node.js and built tfjs-node on ARM boards.

jendonyuen commented 4 years ago

> @JendonYuen Is there any progress about ResNet implementation?

https://github.com/tensorflow/tfjs/issues/1575#issuecomment-562728600 https://github.com/patlevin/tfjs-to-tf

Alfonso0589 commented 4 years ago

@JendonYuen thank you for the above links. I am not an expert in TF, but I managed to successfully implement the original MobileNet version of the pose estimator. However, I get poor accuracy on certain images, so I wanted to try a more powerful architecture.

I am reading the posts about converting tfjs models to TF ones, but I am getting lost. Would you have a simple script or working example in which a ResNet architecture has been implemented in Python? Any help is very welcome as I am really stuck.

Thanks

jendonyuen commented 4 years ago

@Alfonso0589 https://github.com/ajaichemmanam/posenetv2-pythontf

rwightman commented 4 years ago

@JendonYuen it's good to see someone taking a crack at it, but it would have been nice to have a reference back to this, as he pretty much took my code with no acknowledgement

rigolepe commented 4 years ago

I was working on a ResNet50 integration in parallel with the above implementation: https://github.com/atomicbits/posenet-python

I also focused on running it on TensorFlow 2.x and transforming the tensorflow.js models to the TensorFlow saved model format using https://github.com/patlevin/tfjs-to-tf. Not everything is idiomatic TF2 yet, though...

Let me know what you think...

darcula1993 commented 4 years ago

I already created a tf.keras h5-format MobileNet model using tfjs-to-tf under TF 2.0. Here are the model files: posenet.zip. It should be possible to create the ResNet50 model by following the same path: use tfjs-to-tf to extract the weights => construct the model structure by hand using tf.keras => load the weights.

gzchenjiajun commented 4 years ago

@darcula1993 @rigolepe Are there any model files for ResNet50?

rigolepe commented 4 years ago

@gzchenjiajun The code automatically downloads the model files if you don't have them yet and transforms them into a TensorFlow saved_model.pb file which you can easily load into TF2.

For example, the tensorflow.js (JavaScript) ResNet50 model JSON file for PoseNet is hosted here: https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/float/model-stride32.json. Keep in mind that you also need the group1-shard1of23.bin through group1-shard23of23.bin files that are referred to under the weightsManifest property in the JSON file. The tfjs-to-tf library converts these into the saved_model.pb for use in TensorFlow (Python).
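
To make the relationship between the JSON file and its shards concrete, here is a small sketch that lists the shard URLs a manifest refers to. It assumes the shard paths are relative to the location of the model JSON file, which is how tfjs normally resolves them; shard_urls is a hypothetical helper, not something from the repository or from tfjs-to-tf.

```python
from urllib.parse import urljoin


def shard_urls(model_json_url, model_json):
    """Return the absolute URL of every weight shard named in the manifest."""
    urls = []
    for group in model_json["weightsManifest"]:
        for path in group["paths"]:
            # Assumption: shard paths are relative to the model JSON's location.
            urls.append(urljoin(model_json_url, path))
    return urls
```

In practice model_json would be the parsed content of model-stride32.json (for example via json.load), and each returned URL could then be downloaded next to it before running the conversion.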

rwightman commented 4 years ago

@rigolepe looks good! so it's running with the ResNet50 model, what sort of framerate do you get with a modest GPU running the (non rendering) test?

rigolepe commented 4 years ago

Running the multi-pose benchmark, cycling 1000 times through the example images on a GeForce GTX 1080 Ti, gives these average FPS using TF 2.0.0:

ResNet50  stride 16:  32.41 FPS
ResNet50  stride 32:  38.70 FPS  (strange that this is faster than with stride 16)
MobileNet stride 8:   37.90 FPS  (surprisingly slow for MobileNet; ran this several times, same result)
MobileNet stride 16:  58.64 FPS

I can't explain why the larger stride gives a faster result. It was expected that MobileNet would be faster than ResNet50, but the MobileNet quality is visibly lower on the rendered images (running image_demo.py). I see that you were in a much faster range (90-110 FPS) with MobileNet on an identical GPU, but that was probably for single-pose detection?

The MobileNet test above was done with the 101 model (multiplier = 1.0). Lowering the multiplier and the quant bytes with stride 16 gives:

multiplier = 0.5  and quant_bytes = 2 --> 151.35 FPS
multiplier = 0.5  and quant_bytes = 4 -->  70.04 FPS
multiplier = 0.75 and quant_bytes = 4 --> 124.46 FPS

Lowering the quant bytes on ResNet50 doesn't improve the speed.
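
For anyone who wants to compare their own numbers, "average FPS" here just means iterations divided by wall-clock time. The following is a generic sketch of that measurement, not the repository's benchmark script; average_fps and fake_inference are hypothetical names, and the workload is a stand-in for model inference plus pose decoding.

```python
import time


def average_fps(run_once, iterations=1000):
    """Call run_once() `iterations` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def fake_inference():
    # Stand-in workload; a real benchmark would run the model on an image here.
    sum(i * i for i in range(10_000))


print(f"{average_fps(fake_inference, iterations=200):.2f} FPS")
```

When timing a GPU model this way, it is worth running a few warm-up iterations first, since the first calls usually include one-off setup cost.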

rwightman commented 4 years ago

@rigolepe thanks for the in-depth numbers. The MobileNet numbers are a bit odd; I was seeing closer to double the performance for MobileNet 101 in the TF1 version, so perhaps TF2 adds that much overhead? That was multi-pose, default args, as the application I built it for had no use for single. Running on either a 1080 Ti or 1080 back then.

Regarding the striding, that makes sense. The network stride means the output feature maps are roughly 1/stride of the input size (with typical rounding/flooring along the way that depends on the padding settings). So a stride 8 network has much larger feature maps through the later layers than a standard stride 32 (default for most of the backbone networks when trained on ImageNet at 224x224). More spatial positions means more computation in those layers, hence the lower FPS at smaller strides.
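
A quick way to see the effect is to count output cells per stride. The sketch below assumes the PoseNet-style convention where the input size is n * stride + 1, giving n + 1 output cells per side; the exact rounding in a given network depends on its padding settings, so treat the formula as an approximation. feature_map_size is a hypothetical helper.

```python
def feature_map_size(input_size, stride):
    """Approximate number of output cells along one side of the feature map."""
    # Assumed convention: input_size = n * stride + 1 yields n + 1 cells.
    return (input_size - 1) // stride + 1


for stride in (8, 16, 32):
    side = feature_map_size(257, stride)
    print(f"stride {stride:2d}: {side} x {side} = {side * side} cells")
```

For the same input, the stride 8 map has more than ten times as many cells as the stride 32 map, which is consistent with the smaller stride being slower.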

rwightman commented 4 years ago

BTW, something that can easily be done in TF 2, and would have been a pain in the ass in TF 1 graph land, is an optimization I made for my PyTorch port: keeping the scores tensor on the GPU a bit longer and running build_part_with_score on the GPU.

See: https://github.com/rwightman/posenet-pytorch/blob/master/posenet/decode_multi.py#L27
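
As a rough, CPU-only illustration of the kind of work that step does: keep the heatmap cells whose score passes a threshold and is the maximum of its local window, highest score first. This is a plain-Python sketch for a single keypoint heatmap, not the code at the link, which does the equivalent with tensor operations; local_maxima and its parameters are hypothetical names.

```python
def local_maxima(scores, threshold, radius=1):
    """Return (score, row, col) for cells that pass the threshold and are the
    maximum of their (2 * radius + 1)-wide window, highest score first."""
    rows, cols = len(scores), len(scores[0])
    parts = []
    for r in range(rows):
        for c in range(cols):
            score = scores[r][c]
            if score < threshold:
                continue
            # Window is clipped at the heatmap borders.
            window = [
                scores[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            if score >= max(window):
                parts.append((score, r, c))
    return sorted(parts, reverse=True)
```

Every cell is visited independently, which is why this step maps well onto the GPU: the window maximum can be computed for all cells at once and compared against the scores in a single pass.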

willfu commented 4 years ago

It seems the Python version of ResNet is much slower than the JavaScript one. Is there any explanation for that? I just used webcam_demo.py to try it.

aalzooke commented 4 years ago

Does anyone have the ResNet50 model?

rigolepe commented 4 years ago

It is downloaded automatically when you choose resnet50 as model in my updated version of this repository, see: https://github.com/atomicbits/posenet-python/blob/master/image_demo.py

festapp86 commented 4 years ago

> It is downloaded automatically when you choose resnet50 as model in my updated version of this repository, see: https://github.com/atomicbits/posenet-python/blob/master/image_demo.py

Thanks for your terrific work, @rigolepe. When I run image_demo.py with resnet50 as the model, everything works as expected. Unfortunately, the keypoints and the skeleton are not as accurate as I was hoping. Is this due to using the CPU instead of a GPU?

rigolepe commented 4 years ago

No, the choice of CPU vs GPU shouldn't make any difference on the accuracy. All network operations are the same in both cases, just executed slower on the CPU.

I also want to point out that most of the hard work was done by @rwightman. I merely did the structural upgrades to support the new model formats and TF2.x.

rwightman commented 4 years ago

@festapp86 please see the conversation in this old issue: https://github.com/rwightman/posenet-python/issues/11 ... it's a good summary of things to look at. The input images are important: their resolution, scale, brightness, etc. In live situations, even something simple like turning on a light can greatly increase the accuracy.

festapp86 commented 4 years ago

@rwightman Good to know, I really appreciate your knowledge sharing. Did you ever try to convert the saved_model.pb into a .tflite model? I am able to convert it, but when I try to run inference with the .tflite model in a different client application, nothing is recognized. Might that be an issue with image resolution, scale, etc.? @rigolepe Did you reuse the saved_model.pb in some other context?

DmytroUsenko commented 4 years ago

@rwightman @rigolepe hey guys, do you have the same implementation, only for TFLite?

gautamw3 commented 4 years ago

> It is downloaded automatically when you choose resnet50 as model in my updated version of this repository, see: https://github.com/atomicbits/posenet-python/blob/master/image_demo.py

When I install the dependencies, I get these errors:

ERROR: tensorflowjs 1.4.0 has requirement tensorflow==1.15.0, but you'll have tensorflow 2.1.0 which is incompatible.
ERROR: tensorflowjs 1.4.0 has requirement tensorflow-hub==0.5.0, but you'll have tensorflow-hub 0.7.0 which is incompatible.
ERROR: tensorflow 2.1.0 has requirement six>=1.12.0, but you'll have six 1.11.0 which is incompatible.

When I try to fix one of them, for example with pip install six==1.12.0, I get:

ERROR: tensorflowjs 1.4.0 has requirement six==1.11.0, but you'll have six 1.12.0 which is incompatible.
ERROR: tensorflowjs 1.4.0 has requirement tensorflow==1.15.0, but you'll have tensorflow 2.1.0 which is incompatible.
ERROR: tensorflowjs 1.4.0 has requirement tensorflow-hub==0.5.0, but you'll have tensorflow-hub 0.7.0 which is incompatible.

And when I run:

python image_demo.py --model resnet50 --stride 16 --image_dir ./images --output_dir ./output

I get this:

Tensorflow version: 2.1.0
Loading ResNet50 model
Cannot find tf model path ./_tf_models/posenet/resnet50_float/stride16, converting from tfjs...
Traceback (most recent call last):
  File "image_demo.py", line 49, in <module>
    main()
  File "image_demo.py", line 33, in main
    posenet = load_model(model, stride, quant_bytes, multiplier)
  File "/home/gautam/PycharmProjects/resnet_pose_detect/posenet-python/posenet/posenet_factory.py", line 22, in load_model
    tfjs2tf.convert(model_cfg)
  File "/home/gautam/PycharmProjects/resnet_pose_detect/posenet-python/posenet/converter/tfjs2tf.py", line 29, in convert
    graph = tfjs.api.load_graph_model(model_cfg['tfjs_dir'])
AttributeError: module 'tfjs_graph_converter' has no attribute 'api'

Can someone please help me understand why this is happening? I am not using docker configuration and my machine has no GPU support.

rigolepe commented 4 years ago

It looks like you are manually installing all the dependencies. You get this error when installing https://github.com/patlevin/tfjs-to-tf.git because it depends on tensorflowjs 1.4.0 which fetches tensorflow==1.15.0.

See my Dockerfile for how to install tfjs-to-tf without fetching all its dependencies.
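
When debugging this kind of conflict, it can help to first check which versions actually ended up installed. The following is a small diagnostic sketch using only the standard library (Python 3.8+); the package list is taken from the error messages in this thread, and installed_versions is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Distributions named in the pip errors quoted in this thread.
PACKAGES = ("tensorflow", "tensorflowjs", "tensorflow-hub", "six")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If tensorflow shows up as 1.15.0 after installing tfjs-to-tf, its dependencies were pulled in and replaced the TF2 install.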

gautamw3 commented 4 years ago

@rigolepe thanks for the quick response, I really appreciate it. I have installed all the dependencies as described in your instructions:

  1. pip install opencv-python
  2. pip install -r requirements.txt
  3. Cloning the repo https://github.com/patlevin/tfjs-to-tf parallel to the posenet-python directory and then running this command from within the tfjs-to-tf directory: pip install . --no-deps

Is it necessary to set things up using Docker? I have no experience with Docker at all.

Thanks

rigolepe commented 4 years ago

No, you don't need to use docker, you can also install it manually on your system using the commands in the Dockerfile. That should work just as well.

rohit-bhatia commented 4 years ago

Hi,

When I load MobileNet with stride 16, quant bytes 4 and multiplier 0.75 using this repo (https://github.com/atomicbits/posenet-python/blob/master/image_demo.py), I get different results from the same model (MobileNet v1, stride 16, multiplier 75) loaded through the repo by @rwightman. This is for the same image at the same resolution, with scale factor 1.

Can anyone tell me why the keypoint accuracy differs when the same model is loaded through TF1 and TF2?