Closed shubhamsd4 closed 3 years ago
In the current implementation everything is sent through the network sequentially, so performance breaks down quite fast. To make it viable for multiple human bounding boxes, you can send the bounding boxes through the network as a batch instead of one at a time. To do so you have to change the code in get_humans_from_bbs() in the pose_resnet network as well as the get_human method in utils/pose_resnet_inference. Currently it works on a single bb and sends a batch of size 1 through the network (see net_input = net_input.unsqueeze(0)); instead, you can preprocess all bbs and send all of them through the network as one batch.
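To illustrate the idea, here is a minimal PyTorch sketch of the batching change, not the actual EHPI code. The helper names (crop_and_resize, heatmaps_sequential, heatmaps_batched), the (x, y, w, h) box format, the 256x192 input size and the single-conv stand-in model are all assumptions made up for this example; the real preprocessing in get_humans_from_bbs() and get_human will differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def crop_and_resize(image, bb, out_hw=(256, 192)):
    """Crop one box (x, y, w, h) from a CHW image and resize it (hypothetical preprocessing)."""
    x, y, w, h = bb
    crop = image[:, y:y + h, x:x + w]
    # F.interpolate expects a batch dimension, so add one and remove it again.
    resized = F.interpolate(crop.unsqueeze(0), size=out_hw,
                            mode="bilinear", align_corners=False)
    return resized.squeeze(0)


def heatmaps_sequential(model, image, bbs):
    """Current behaviour as described above: one forward pass per box, batch size 1."""
    outputs = []
    with torch.no_grad():
        for bb in bbs:
            net_input = crop_and_resize(image, bb).unsqueeze(0)
            outputs.append(model(net_input))
    return torch.cat(outputs, dim=0)


def heatmaps_batched(model, image, bbs):
    """Suggested change: preprocess all boxes, then run a single forward pass."""
    if len(bbs) == 0:
        return torch.empty(0)  # nothing to estimate for this frame
    # Stack N crops of shape (C, H, W) into one (N, C, H, W) batch.
    net_input = torch.stack([crop_and_resize(image, bb) for bb in bbs], dim=0)
    with torch.no_grad():
        return model(net_input)


# Stand-in for the pose network (assumption): 3 input channels, 17 joint heatmaps.
model = nn.Sequential(nn.Conv2d(3, 17, kernel_size=3, padding=1)).eval()
image = torch.rand(3, 480, 640)  # dummy frame, CHW
bbs = [(10, 20, 100, 200), (300, 50, 80, 160), (400, 100, 120, 240)]

heatmaps = heatmaps_batched(model, image, bbs)
print(heatmaps.shape)  # torch.Size([3, 17, 256, 192])
```

Resizing every crop to the same size is what makes torch.stack possible. With many people in a frame you may need to split the boxes into chunks so that the batch still fits into GPU memory.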
I am currently developing a new network with various improvements, so I don't have time to change the EHPI code. The new network currently supports simultaneous 2D / 3D pose estimation as well as body / head orientation estimation. It runs at ~15 FPS with 10+ people in a video on an RTX 3080. It works frame by frame right now; I need to add temporal data as well as action recognition to the network before releasing it.
@noboevbo Hi, can I work with you on this new network? I am currently doing my internship on exactly this subject and am reading up on it. The goal is to implement human-pose action recognition inside security cameras, which are made by the company itself. I would like to get some help from you, but would also like to help you! Currently the one guy who should help me has some kind of Corona down period, so he rarely shows up (I have actually spoken to him for 10 min, while this internship started almost 2 weeks ago).
Also, as I am reading up on this subject, the repo-specific jargon (e.g. ResNetV3, NTU-RGBD) is making it hard to read and understand what the actual general jargon is. Is there some kind of glossary for this general jargon?
Got it, thanks:)
Since the top-down approach is being used for human pose estimation, the computational power required will increase as the number of people increases. Will this model still be good enough for real-time applications?
PS: I will try this out too, but just asking if someone here has tried it out.