ykotseruba / PedestrianActionBenchmark

Code and models for the WACV 2021 paper "Benchmark for evaluating pedestrian action prediction"
https://openaccess.thecvf.com/content/WACV2021/papers/Kotseruba_Benchmark_for_Evaluating_Pedestrian_Action_Prediction_WACV_2021_paper.pdf
MIT License

About pose data #4

Closed xingchenzhang closed 3 years ago

xingchenzhang commented 3 years ago

Hi,

Thank you very much for your very interesting and excellent work!

Could you please provide some explanation of the pose data you provide, for example the data in the features/jaad/poses folder?

In the paper you mention that you extract pose data using OpenPose; are the pose data (18 joints) included there? Also, what does 'pose_set01' mean?

Many thanks! Looking forward to your reply!

Bests, Xingchen

ykotseruba commented 3 years ago

Hi Xingchen,

The videos in PIE and JAAD are grouped into sets: there are 6 sets in PIE and 1 set in JAAD. For each set there is a corresponding pickle file with pose information. Poses are saved as dictionaries with the following structure:

├── set_01
│   ├── video_0001
│   │   ├──00379_0_1_2 # frame and pedestrian id
│   │   │   ├──[x1, y1, x2, y2, ..., x18, y18] # joint coordinates
│   │   ├──00380_0_1_2
│   │   ├──...
│   ├── video_0003
│   │   ├──00040_0_3_7
│   │   ├──00041_0_3_7
│   │   ├──...
│   ├── ...

Basically, for every frame in each video we compute joint coordinates for all pedestrians that have bounding boxes. The keys for pedestrians consist of two parts: the frame number and the pedestrian id. In the example for video_0003 above, '00040_0_3_7' means frame 40 and pedestrian id "0_3_7" (which itself encodes set 0, video 3, pedestrian #7 in that video). Each pedestrian has 18 joints, each with an (x, y) coordinate normalized between 0 and 1, so 36 floats in total. If the pose cannot be determined (e.g. the pedestrian is fully occluded or too far away), all joint coordinates are set to 0.
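For reference, a minimal sketch of how one might load and index these files (the exact path and the 'set_01' top-level key are assumptions based on the layout above):

```python
import pickle

# Hypothetical path; adjust to wherever the pose pickles live.
with open('features/jaad/poses/pose_set01.pkl', 'rb') as f:
    poses = pickle.load(f)

# Key format: <frame>_<set>_<video>_<ped>, as described above.
pose = poses['set_01']['video_0003']['00040_0_3_7']

# 18 joints, each an (x, y) pair normalized to [0, 1] -> 36 floats.
joints = [(pose[i], pose[i + 1]) for i in range(0, 36, 2)]

# An all-zero vector means the pose could not be estimated
# (e.g. the pedestrian is fully occluded or too far away).
is_valid = any(v != 0 for v in pose)
```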

xingchenzhang commented 3 years ago

Hi, @ykotseruba

Many thanks for your information, they are very helpful!

Another thing I want to double-check: when you say "we compute joint coordinates for all pedestrians with bounding boxes", do the pedestrians include both those with and without behavior labels? In other words, as long as a pedestrian has a bounding box, you provide pose data for it (even if the coordinates might all be set to 0)?

Thank you very much again!

Bests, Xingchen

ykotseruba commented 3 years ago

Correct, all pedestrians with bounding boxes have pose information.

xingchenzhang commented 3 years ago

Thank you very much @ykotseruba !

I will start using your data. If I have other questions, I may contact you again.

Thank you very much again!

xingchenzhang commented 3 years ago

Hi Yulia,

Sorry to disturb you again.

I noticed that pose_set01.pkl of JAAD contains pose data for 294 videos. Are these the train (177) + test (117) videos? Have you provided pose data for the val subset? In the code, data_val can read pose data, but in that case I am not sure why pose_set01.pkl contains data for only 294 videos.

Thanks! Xingchen

ykotseruba commented 3 years ago

The pose information in this repo is sufficient to replicate the results of the benchmark. We provide poses for the training and test portions of the data. Some videos may be missing because of insufficient quality or a lack of sufficiently long pedestrian tracks. Full pose information for both datasets is too large to upload to GitHub; you can find it here for JAAD and PIE.
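You can also verify the coverage yourself with something like this (again assuming the nested-dict layout described earlier in this thread):

```python
import pickle

with open('features/jaad/poses/pose_set01.pkl', 'rb') as f:
    poses = pickle.load(f)

# JAAD has a single set; PIE would have one pickle per set (six total).
for set_id, videos in poses.items():
    print(f'{set_id}: {len(videos)} videos with pose data')
```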

xingchenzhang commented 3 years ago

Thanks a lot!

xingchenzhang commented 3 years ago

Hi Yulia,

I have one more question about the pose data in this benchmark. I have previously tried using OpenPose to extract poses from the JAAD dataset, but I found that the results were very bad. How did you extract the poses? Did you use any special techniques to extract this pose information? BTW, have you checked the quality of the pose data?

Thanks a lot!

Bests, Xingchen

ykotseruba commented 3 years ago

We are using this code for pose estimation. It doesn't work well for pedestrians far away but is ok for those closer to the vehicle.

xingchenzhang commented 3 years ago

Hi,

Many thanks for providing the link of the pose estimation method, this is very helpful!

I am also wondering: did you feed the whole video/frame directly to the pose estimation algorithm, or did you do some cropping (for example, according to the bounding box) and then feed the cropped images to it?

The reason I am asking is that when I feed the whole frame to the pose estimation algorithm, the performance is very bad for pedestrians that are far away. Maybe feeding cropped images could help with this?

Thank you very much for your continuous help!

Bests, Xingchen

ykotseruba commented 3 years ago

Yes, OpenPose does not work well on the whole frame. We computed the poses for cropped images (using the bounding boxes). There are still some issues with poses of pedestrians that are partially visible or far away, but overall the data is usable.
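As a rough illustration of this crop-then-estimate idea, here is a sketch only (estimate_pose is a placeholder for whatever pose estimator wrapper you use, and normalizing the joints relative to the crop is an assumption, not necessarily what the benchmark does):

```python
def pose_from_crop(frame, bbox, estimate_pose, pad=0.1):
    """Crop a pedestrian from the full frame, run the pose estimator
    on the crop, and return joints normalized to [0, 1].

    frame: full video frame as an H x W x 3 NumPy array
    bbox:  pedestrian bounding box [x1, y1, x2, y2] in pixels
    estimate_pose: placeholder callable; returns 18 (x, y) joints
                   in crop pixel coordinates, or None on failure
    pad:   fraction of box size added as context around the crop
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = bbox
    # Pad the box slightly so limbs near the edge are not cut off.
    dx, dy = pad * (x2 - x1), pad * (y2 - y1)
    x1, y1 = max(0, int(x1 - dx)), max(0, int(y1 - dy))
    x2, y2 = min(w, int(x2 + dx)), min(h, int(y2 + dy))

    crop = frame[y1:y2, x1:x2]
    joints = estimate_pose(crop)
    if joints is None:
        # Match the convention above: all zeros when no pose is found.
        return [0.0] * 36

    # Flatten to [x1, y1, ..., x18, y18], normalized to the crop size.
    ch, cw = crop.shape[:2]
    flat = []
    for jx, jy in joints:
        flat.extend([jx / cw, jy / ch])
    return flat
```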

xingchenzhang commented 3 years ago

Thanks a lot!