Can I use a pre-trained model to predict the people in a live camera feed?

mmahdavian / STPOTR

Human Pose and Hip Trajectory Prediction Using Transformers

GNU General Public License v3.0

11 stars, 2 forks

Can I use a pre-trained model to predict the people in a live camera feed? #7

Closed y1131388949 closed 4 months ago

y1131388949 commented 5 months ago

I want to connect a USB camera and make predictions about the people in the footage it captures. What should I do? Alternatively, how would I do it offline, if I record a video first and run the prediction afterwards?

mmahdavian commented 4 months ago

@y1131388949 Hi. I think you need to get familiar with ROS. It helps with a wide range of robotics tasks, including capturing or recording camera data and sending that data to ML models. In effect, it provides a platform for passing data between separate programs (nodes). You can start with the official ROS tutorial:

https://wiki.ros.org/ROS/Tutorials

mmahdavian commented 4 months ago

@y1131388949 And to answer your other question (Can I use a pre-trained model to predict the people in a live camera feed?): yes. The algorithm runs in real time, so you can use it on a live camera feed. You just need to take the data from the camera and send it to the ML node using ROS.

y1131388949 commented 4 months ago

I also possess a ZED binocular camera. Precisely where in the project code should modifications be made to enable predictions on the footage captured by the camera? I intend to conduct a real-time demo test using the pre-trained model you've provided, specifically for predicting human posture and trajectory. How should I adjust the runtime configuration?

mmahdavian commented 4 months ago

@y1131388949 You need to write a ROS node that collects the human poses produced by the camera, assembles them into a sequence with 0.1 seconds between frames, normalizes that sequence, and feeds it to the model. This code is not provided with the package; you need to write it yourself.
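The buffering step described above could look roughly like the following sketch. It is not part of STPOTR: the class name `PoseSequenceBuffer`, the `tolerance` parameter, and the pose array shape are assumptions made for illustration. The buffer stores timestamped poses as they arrive (for example from a ROS subscriber callback) and picks the stored pose nearest to each 0.1 s step back from the newest one. The default of 5 frames matches the sequence length the maintainer gives later in this thread.

```python
from collections import deque

import numpy as np


class PoseSequenceBuffer:
    """Hypothetical helper: keeps recent poses and resamples them at a fixed step."""

    def __init__(self, seq_len=5, dt=0.1, tolerance=0.03):
        self.seq_len = seq_len      # number of frames in one model input
        self.dt = dt                # required spacing between frames, in seconds
        self.tolerance = tolerance  # max allowed timestamp mismatch (assumed value)
        self.samples = deque()      # (timestamp, pose) pairs, oldest first

    def add(self, stamp, pose):
        """Store one pose and drop samples older than the sequence can use."""
        stamp = float(stamp)
        self.samples.append((stamp, np.asarray(pose, dtype=np.float32)))
        horizon = self.seq_len * self.dt + self.tolerance
        while self.samples and stamp - self.samples[0][0] > horizon:
            self.samples.popleft()

    def sequence(self):
        """Return an array of shape (seq_len, ...) ending at the newest pose,
        or None if some required time step has no pose close enough."""
        if not self.samples:
            return None
        stamps = np.array([s for s, _ in self.samples])
        newest = stamps[-1]
        frames = []
        for k in range(self.seq_len - 1, -1, -1):
            target = newest - k * self.dt
            idx = int(np.argmin(np.abs(stamps - target)))
            if abs(stamps[idx] - target) > self.tolerance:
                return None  # not enough history yet, or a gap in the stream
            frames.append(self.samples[idx][1])
        return np.stack(frames)
```

In a ROS node, `add` would be called from the pose callback with the message timestamp, and `sequence` would be polled before each model call. Nearest-neighbour selection is the simplest choice here; interpolating between neighbouring poses is an alternative if the camera rate does not divide evenly into 0.1 s.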

y1131388949 commented 4 months ago

Okay, thanks a lot. What algorithm do you use to collect human postures for the robot following work you show in the YouTube video?

mmahdavian commented 4 months ago

@y1131388949 I use the human skeleton generated by the ZED2 camera, but you can use any human pose estimation algorithm.

y1131388949 commented 4 months ago

Thank you for your patience. If I want to use a pre-trained model for prediction, do I need to follow the H36M keypoint layout? Or can I just use the human keypoints generated directly by the ZED for testing?

mmahdavian commented 4 months ago

@y1131388949 You need to create a sequence of 5 frames with 0.1 s between them. Then normalize the values and feed them to the model, and denormalize the model output to get the final results. I think our testing code calculates the normalization matrices if you use it, but here they are:

    self.norm_stats['mean'] = np.array([
        0.00660166, -0.00322862, -0.00074547, -0.00221925, 0.32131838,
        0.05040703, -0.01361359, 0.69785828, 0.09532178, -0.00660162,
        0.00322858, 0.00074546, -0.01506641, 0.32316435, 0.05134183,
        -0.02408792, 0.70626347, 0.09823843, 0.00577709, -0.21263408,
        -0.02852573, 0.01207891, -0.43795797, -0.05560767, 0.01407008,
        -0.49628542, -0.05891722, 0.01702867, -0.58822308, -0.07295712,
        0.00417502, -0.38859674, -0.04970666, -0.0071068, -0.19491813,
        -0.02284074, -0.00568425, -0.13399005, -0.00880117, 0.0170771,
        -0.38437933, -0.04920271, 0.01767753, -0.19377204, -0.02285681,
        0.01422233, -0.16066915, -0.01378928])
    self.norm_stats['std'] = np.array([
        0.09752762, 0.02463142, 0.08864842, 0.18086745, 0.19258509,
        0.19703887, 0.2124409, 0.24669765, 0.24913149, 0.09752732,
        0.02463133, 0.08864804, 0.18512949, 0.20038966, 0.20181931,
        0.21314536, 0.25163, 0.25110496, 0.05994541, 0.05127974,
        0.07128643, 0.11012195, 0.09926957, 0.13492376, 0.13778828,
        0.12397334, 0.16564003, 0.14164487, 0.12879104, 0.17520238,
        0.14193176, 0.09618481, 0.15208769, 0.2274719, 0.13292273,
        0.22023694, 0.24284202, 0.22032088, 0.24409384, 0.1440386,
        0.09937461, 0.15516004, 0.23021911, 0.14233924, 0.22319722,
        0.24973414, 0.23667015, 0.25038966])
    self.norm_stats['mean_traj'] = np.array([-0.3049573, -0.24428056, 2.96069035])
    self.norm_stats['std_traj'] = np.array([1.53404053, 0.33958351, 3.70011473])
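As a rough illustration of the normalize and denormalize steps, here is a minimal sketch. It assumes the statistics are applied as a plain z-score, (x - mean) / std, which the thread does not confirm; check the STPOTR testing code for the exact convention. The function names and sample hip positions are hypothetical, and only the trajectory statistics are reproduced to keep the example short. The 48-element pose vectors would be handled the same way, with input shaped (frames, 48).

```python
import numpy as np

# Trajectory statistics copied from the values posted above.
MEAN_TRAJ = np.array([-0.3049573, -0.24428056, 2.96069035])
STD_TRAJ = np.array([1.53404053, 0.33958351, 3.70011473])


def normalize(x, mean, std):
    """Z-score normalization (assumed convention); broadcasts over frames."""
    return (np.asarray(x, dtype=np.float64) - mean) / std


def denormalize(x_norm, mean, std):
    """Inverse of normalize: maps model output back to the original units."""
    return np.asarray(x_norm, dtype=np.float64) * std + mean


# Hypothetical hip positions for 5 frames spaced 0.1 s apart, shape (5, 3).
hip_sequence = np.array([
    [0.00, -0.20, 3.00],
    [0.02, -0.20, 3.05],
    [0.04, -0.21, 3.10],
    [0.06, -0.21, 3.15],
    [0.08, -0.22, 3.20],
])

model_input = normalize(hip_sequence, MEAN_TRAJ, STD_TRAJ)
# ... the model would run here; its output is denormalized the same way ...
recovered = denormalize(model_input, MEAN_TRAJ, STD_TRAJ)
```

The same statistics must be used in both directions, so it is safest to load them once and pass them to both functions.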