serviceberry3 / videopose3d_android

Porting this 3D human pose estimation model (https://github.com/facebookresearch/VideoPose3D) to Android for inference in the wild (on live video in real time). My goal is to use TFLite Posenet to get the 2D human keypoints and then feed them into the 3D model. Using PyTorch.

Inference not working. #1

Open · luca992 opened this issue 3 years ago

luca992 commented 3 years ago

I tried following your setup instructions and running on my phone but the pose tracking didn't appear to be working at all (not just very slowly as you mentioned in the readme). Is that expected with the current state of the project?

Also you mentioned "substituting VideoPose3D for a lighter, faster model" I'm curious what do you mean by that?

Really cool demo project btw, looks like you are getting pretty close!

serviceberry3 commented 3 years ago

@luca992 Hi, thanks for your interest! What exactly are you seeing when you run it? It's so slow that I didn't even bother drawing the human in the grid, but you should see a little camera preview up at the top that's very laggy. That's it. If that's not happening, make sure you've given the app all of the permissions it requested in settings, and you might need to download a fresh copy of OpenCV 3.4.1, because I think I omitted some of the library (big files) from this repo. Also, I've made changes, so now you need to run trace_model_cpu_og.py instead of trace_model.py. Look at the 11/13/20 update for instructions. Let me know if it's still not working. I'll put a screenshot of the expected behavior on the README page.
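
For reference, the tracing step boils down to roughly this (just a sketch; the real script is trace_model_cpu_og.py, and the TemporalModel arguments, the checkpoint name, and the 243-frame window here are assumptions for illustration):

```python
# Hedged sketch of the CPU tracing step; see trace_model_cpu_og.py for the real
# version. The model arguments, checkpoint name, and 243-frame receptive field
# are assumed values, not necessarily what this repo uses.
import torch
from common.model import TemporalModel

model = TemporalModel(17, 2, 17, filter_widths=[3, 3, 3, 3, 3])
checkpoint = torch.load("pretrained_h36m_cpn.bin", map_location="cpu")
model.load_state_dict(checkpoint["model_pos"])
model.eval()

# Dummy 2d-keypoint input: (batch, frames, joints, xy)
example = torch.randn(1, 243, 17, 2)

with torch.no_grad():
    traced = torch.jit.trace(model, example)
traced.save("videopose3d_cpu.pt")  # load this from the Android app's assets
```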

serviceberry3 commented 3 years ago

@luca992 Regarding the second part of your question, I mean that there are other 2d-3d pretrained models out there, some of which look to be lighter. For example, the model from this paper is what I'm working on converting into an Android app now. If you can train a model yourself, it looked like you could train VideoPose3D using a smaller architecture based on their instruction page. But I have a feeling it would still be too slow for Android. What device are you using?

luca992 commented 3 years ago

> @luca992 Hi, thanks for your interest! What exactly are you seeing when you run it? It's so slow that I didn't even bother drawing the human in the grid, but you should see a little camera preview up at the top that's very laggy. That's it. If that's not happening, make sure you've given the app all of the permissions it requested in settings, and you might need to download a fresh copy of OpenCV 3.4.1, because I think I omitted some of the library (big files) from this repo. Also, I've made changes, so now you need to run trace_model_cpu_og.py instead of trace_model.py. Look at the 11/13/20 update for instructions. Let me know if it's still not working. I'll put a screenshot of the expected behavior on the README page.

Oh okay, yeah I got the same result. I was using trace_model.py, I'll try trace_model_og.py. Also, just making sure, posenet_model.tflite in MainActivity refers to https://storage.googleapis.com/download.tensorflow.org/models/tflite/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite right?

luca992 commented 3 years ago

> @luca992 Regarding the second part of your question, I mean that there are other 2d-3d pretrained models out there, some of which look to be lighter. For example, the model from this paper is what I'm working on converting into an Android app now. If you can train a model yourself, it looked like you could train VideoPose3D using a smaller architecture based on their instruction page. But I have a feeling it would still be too slow for Android. What device are you using?

I'm using a Pixel 4 XL. I'm curious, have you checked to make sure that the 2d pose detection is performing well before doing the 3d inference with VideoPose3D? ... I'm just curious whether swapping out Posenet for the 2d pose detection available in ML Kit would help with performance, since it's already optimized.

serviceberry3 commented 3 years ago

> Also, just making sure, posenet_model.tflite in MainActivity refers to https://storage.googleapis.com/download.tensorflow.org/models/tflite/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite right?

That's right.

> I'm curious, have you checked to make sure that the 2d pose detection is performing well before doing the 3d inference with VideoPose3D?

Yeah, I worked with Posenet a bunch before making this, and I also log the time each Posenet inference takes...it's about 15-20 ms per inference. I don't know much about ML Kit, but at a glance it looks like it would be comparable.
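
For what it's worth, a quick way to sanity-check that same .tflite model on a desktop is something like this sketch (assumes TensorFlow is installed; the float 1x257x257x3 input is what the stripped MobileNet v1 Posenet expects, and desktop timings obviously won't match the phone):

```python
# Quick hedged sanity check of the Posenet .tflite model on desktop Python.
# The Android app does the real timing in Java; this just exercises the model.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]          # expects float32, 1x257x257x3
frame = np.random.uniform(-1.0, 1.0, inp["shape"]).astype(np.float32)

start = time.time()
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
elapsed_ms = (time.time() - start) * 1000

heatmaps = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print("output shape:", heatmaps.shape, "inference: %.1f ms" % elapsed_ms)
```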

luca992 commented 3 years ago

Yeah, you're right, the 2d part is performing fine. I'm gonna try playing with the 2d-to-3d part to see if I can get the hardware acceleration working... You say a little 3d human model is supposed to be displayed on that 3d grid after inference, right? Any idea why that isn't working, even if slowly?

serviceberry3 commented 3 years ago

@luca992 No 😂 I drew the grid, and then when I realized how slow the inference was, I didn't bother writing the rest of the OpenGL code to draw the 3d human...you can try it if you want. I think it should, however, already be drawing the 2d keypoints inside of the camera preview box in the corner. That's done in onCameraFrame() in MainActivity. If you want to go implement the drawHuman() function in drawer.cpp, do it! I guess that should get called by glesRender(), which is called in the onDrawFrame() callback in MainActivity. The grid should really only be drawn once at the beginning, so you might want to change that. Also, you'll notice in onCameraFrame() in MainActivity that I only do inferences every other frame; you can always change that. But right now it's far too slow to be used for anything. Let me know what you discover! So far for hardware acceleration I've tried MACE and PyTorch's new NNAPI conversion, both to no avail.

luca992 commented 3 years ago

Hahaha, okay good to know. The code is just a bit disorganized, so I'm still making my way through it.

serviceberry3 commented 3 years ago

@luca992 Also, what's your use case (if any)? If you can run on a solid desktop computer, the real-time visualization is fast; there are instructions for that in the README (see 3d_vis_realtime.py).

luca992 commented 3 years ago

Basically for tracking body pose while exercising. I wanted to see if I could get 3d pose tracking working on Android. On iOS it's already working well with Apple's stock SDK.

And yeah I saw that, haven't tried it out yet but I'll take a look 👍

serviceberry3 commented 3 years ago

@luca992 This was where I got most of the code for 3d_vis_realtime.py. I think they used Detectron2, which I replaced with Posenet; Detectron2 is slower but I think more accurate.

EDIT: actually it looks like they used some other 2d trackers. Overall it seemed like whatever they used gave more accurate results than my version.

Other resources: https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation

serviceberry3 commented 3 years ago

@luca992 A while ago I was also experimenting with using OpenCV functions like solvePnP() to just estimate some joint angles from the Posenet keypoints. See here, I just put up a preview of what it might look like. Not sure how full-body you need it to be, what accuracy you need, etc. There are also lots of papers online about measuring joint angles and about 2d->3d neural networks.
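
A minimal sketch of the solvePnP() idea, in case it's useful (this isn't code from this repo; the 3d limb template, the 2d points, and the camera intrinsics are made-up placeholders):

```python
# Hedged sketch: recover the orientation of a rough rigid limb template from its
# matching 2d Posenet keypoints with cv2.solvePnP. All numbers are placeholders.
import numpy as np
import cv2

# Approximate 3d template of shoulder -> elbow -> wrist -> hip (meters, body frame)
object_points = np.array([
    [0.00,  0.00, 0.0],   # shoulder
    [0.00, -0.30, 0.0],   # elbow
    [0.00, -0.55, 0.0],   # wrist
    [0.05, -0.50, 0.0],   # hip
], dtype=np.float32)

# Matching 2d keypoints from Posenet, in pixels
image_points = np.array([[320, 180], [330, 260], [335, 330], [360, 320]],
                        dtype=np.float32)

# Rough pinhole intrinsics for a 640x480 preview (focal length guessed)
camera_matrix = np.array([[500, 0, 320],
                          [0, 500, 240],
                          [0,   0,   1]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, None)
if ok:
    rotation, _ = cv2.Rodrigues(rvec)   # 3x3 rotation of the template in camera frame
    print("rotation:\n", rotation, "\ntranslation:\n", tvec)
```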

Wow, I didn't even know about the iOS 3d pose application...it's really surprising that there's no app like that for Android yet, at least not that I've come across.

luca992 commented 3 years ago

Never got it working... but it looks like PyTorch NNAPI support is slowly getting better:

I just tried running your NNAPI script with PyTorch torch-1.9.0.dev20210309, with a few hacky workarounds (that might work?)... it gets further than the nightly I tried in January, and now fails with Exception: Unsupported node kind ('aten::batch_norm'):

(venv) (base) luca@Lucas-MacBook-Pro videopose3d_android % PYTHONPATH=. python trace_model/trace_model_nnapi.py
quantize_core false
/Users/luca/Projects/videopose3d_android/common/model.py:201: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[-2] == self.num_joints_in
/Users/luca/Projects/videopose3d_android/common/model.py:202: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[-1] == self.in_features
/Users/luca/Projects/videopose3d_android/common/model.py:208: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x = torch.from_numpy(x.numpy().transpose(0, 2, 1))
/Users/luca/Projects/videopose3d_android/common/model.py:208: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  x = torch.from_numpy(x.numpy().transpose(0, 2, 1))
Traceback (most recent call last):
  File "trace_model/trace_model_nnapi.py", line 131, in <module>
    make_videopose3d_nnapi(Path(os.environ["HOME"]) / "mobilenetv2-nnapi", quantize_mode)
  File "trace_model/trace_model_nnapi.py", line 89, in make_videopose3d_nnapi
    nnapi_model = torch.backends._nnapi.prepare.convert_model_to_nnapi(traced, input_tensor)
  File "/Users/luca/Projects/videopose3d_android/venv/lib/python3.8/site-packages/torch/backends/_nnapi/prepare.py", line 169, in convert_model_to_nnapi
    ser_model, used_weights, inp_mem_fmts, out_mem_fmts = serialize_model(model, inputs)
  File "/Users/luca/Projects/videopose3d_android/venv/lib/python3.8/site-packages/torch/backends/_nnapi/serializer.py", line 1375, in serialize_model
    return _NnapiSerializer(config).serialize_model(module, inputs)
  File "/Users/luca/Projects/videopose3d_android/venv/lib/python3.8/site-packages/torch/backends/_nnapi/serializer.py", line 520, in serialize_model
    self.add_node(node)
  File "/Users/luca/Projects/videopose3d_android/venv/lib/python3.8/site-packages/torch/backends/_nnapi/serializer.py", line 643, in add_node
    raise Exception("Unsupported node kind (%r) in node %r" % (node.kind(), node))
Exception: Unsupported node kind ('aten::batch_norm') in node %input.3 : Tensor = aten::batch_norm(%512, %self.expand_bn.running_var, %self.expand_bn.running_mean, %self.expand_bn.running_mean, %self.expand_bn.running_var, %195, %208, %207, %194), scope: __module.expand_bn # /Users/luca/Projects/videopose3d_android/venv/lib/python3.8/site-packages/torch/nn/functional.py:2149:0

serviceberry3 commented 3 years ago

@luca992 Thanks for the info! Yeah, a while back I posted about it on the PyTorch forums here. I guess it's just that some layers aren't supported by the NNAPI converter yet. Eventually we'll get there...
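
In the meantime, one workaround that might be worth a shot (purely a sketch I haven't verified on this model): fold each BatchNorm1d into the Conv1d in front of it before tracing, so no aten::batch_norm nodes end up in the graph, then convert the same way trace_model_nnapi.py already does. The layer names here (expand_conv, expand_bn, layers_conv, layers_bn) come from VideoPose3D's common/model.py and are assumed unchanged; other unsupported ops may still remain:

```python
# Hedged sketch: fuse Conv1d + BatchNorm1d pairs so the traced graph has no
# aten::batch_norm nodes, then run the same NNAPI conversion the script uses.
# Layer names and model arguments are assumptions, not verified against this repo.
import torch
from torch.nn.utils.fusion import fuse_conv_bn_eval
from torch.backends._nnapi.prepare import convert_model_to_nnapi
from common.model import TemporalModel

model = TemporalModel(17, 2, 17, filter_widths=[3, 3, 3, 3, 3])  # assumed args
model.eval()  # fusion requires eval mode (and you'd load the weights first)

# Fold the "expand" conv/bn pair and each conv/bn pair in the residual blocks.
model.expand_conv = fuse_conv_bn_eval(model.expand_conv, model.expand_bn)
model.expand_bn = torch.nn.Identity()
for i in range(len(model.layers_conv)):
    model.layers_conv[i] = fuse_conv_bn_eval(model.layers_conv[i], model.layers_bn[i])
    model.layers_bn[i] = torch.nn.Identity()

example = torch.randn(1, 243, 17, 2)  # same assumed input shape as before
traced = torch.jit.trace(model, example)
nnapi_model = convert_model_to_nnapi(traced, example)  # then save as the script does
```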