hongsukchoi closed this issue 3 years ago
For each activity, 2400 frames are uniformly sampled. If there are two sequences from the same activity, 1200 frames are sampled from each sequence. For example, [160226_haggling1] and [160422_haggling2] each provide 1200 frames.
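The sampling rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' actual script: the helper names `uniform_sample_indices` and `sample_activity` are hypothetical, the frame counts are made-up placeholders, and the exact rounding used to pick indices is not stated in this thread.

```python
def uniform_sample_indices(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over [0, num_frames).

    Assumption: integer floor spacing. The thread does not say which
    rounding rule was actually used.
    """
    if num_samples > num_frames:
        raise ValueError("cannot sample more frames than the sequence has")
    # Floor division keeps the indices strictly increasing and in range,
    # even when num_frames is not a multiple of num_samples.
    return [(i * num_frames) // num_samples for i in range(num_samples)]


def sample_activity(sequence_lengths, frames_per_activity=2400):
    """Split the per-activity budget evenly across that activity's sequences."""
    per_sequence = frames_per_activity // len(sequence_lengths)
    return {
        name: uniform_sample_indices(num_frames, per_sequence)
        for name, num_frames in sequence_lengths.items()
    }


# Placeholder frame counts, NOT the real lengths of these sequences.
haggling = {"160226_haggling1": 13000, "160422_haggling2": 9001}
sampled = sample_activity(haggling)
for name, indices in sampled.items():
    print(name, len(indices), indices[:3], indices[-1])
```

With two sequences in an activity, each one contributes 1200 indices, so the activity total stays at 2400 regardless of how long each sequence is.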
Could you also tell me which joints you used for evaluation? Some sequences have 19 coco joint annotations, but some have 15 mpii joint annotations. Did you use different joints for different sequences?
15 mpii joint annotations for all sequences.
wow I really appreciate your prompt answer!!!
Hi, kudos for the great work!
To be clear, do you train your model with the 15 mpii joints, and therefore only with VGA images and not HD images? On the dataset web page, the 15 mpii joints are associated with VGA images.
Also, in case you used HD images: there seems to be a small difference between the 15 mpii and coco19 annotations in terms of translation and body rotation. Have you done anything to handle this?
Thanks in advance!
@nicolasugrinovic We use HD images with mpi15 annotations. The mpi15 annotations were provided by CMU for these videos (download via scripts), at least at that time. We have provided our conversion code from coco to mpi15 (lib/preprocess/create_annot).
Great, thank you very much for the prompt reply.
Hi, one question regarding the image processing here. When you rescale the images to size 832×512, do you rescale them directly, or do you keep the aspect ratio and pad the image?
Would really appreciate your response. Thanks!
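For readers comparing the two options in this question, the geometry of each can be sketched as below. The thread does not say which one the authors used. The sketch assumes that 832×512 means width × height and that the input is a 1920×1080 HD frame; the function names are hypothetical.

```python
def direct_resize_scales(src_w, src_h, dst_w, dst_h):
    """Direct rescale: x and y get independent factors, so the aspect ratio changes."""
    return dst_w / src_w, dst_h / src_h


def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Aspect-preserving rescale followed by padding up to the target size."""
    # One shared factor, limited by whichever dimension fills the target first.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = dst_w - new_w, dst_h - new_h
    # Padding as (left, top, right, bottom), split as evenly as possible.
    pads = (pad_x // 2, pad_y // 2, pad_x - pad_x // 2, pad_y - pad_y // 2)
    return scale, (new_w, new_h), pads


# Assumed input: a 1920x1080 HD frame; target: 832 wide, 512 high.
sx, sy = direct_resize_scales(1920, 1080, 832, 512)
scale, resized, pads = letterbox_params(1920, 1080, 832, 512)
print(f"direct: sx={sx:.4f}, sy={sy:.4f}")
print(f"letterbox: scale={scale:.4f}, resized={resized}, pads={pads}")
```

The choice matters for the annotations: with a direct rescale, 2D keypoints need separate x and y factors, while with padding they need one shared factor plus the padding offset.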
Hi, it is mentioned that "2400 frames are uniformly sampled". However, not every activity has a frame count that is a multiple of 2400. For a fair comparison, could you provide the frame indices, or more detail about the sampling rule?
I finally found the detailed configuration for CMU Panoptic. Thanks! https://github.com/zju3dv/SMAP/issues/1#issuecomment-a
However, it is still unclear which images you used for training and testing. The paper says:
Following [41], we choose two cameras (16 and 30), 9600 images from four activities (Haggling, Mafia, Ultimatum, Pizza) as our test set, and 160k images from different sequences as our training set.
As far as I know, there are many more image frames than 9600 in the test sequences you listed here: https://github.com/zju3dv/SMAP/issues/1#issuecomment-a.
I hope you can share the specific frame sampling configuration!