Closed. pcp16 closed this issue 4 years ago.
For the full human body crops, we first run SSD for human detection and then crop the image patches, resized to 224x224. We do not use the LCRNet 2D skeletons because they are a bit noisy. Toyota holds the rights to publish the scripts for our paper (Separable STA). They are already fully prepared, together with our pre-trained models, and we are waiting for legal permission to open-source them.
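For reference, a minimal sketch of the crop-and-resize step described above could look like the following. The bounding box is assumed to come from an SSD person detector; the exact detector, any box padding, and the interpolation used in the actual pipeline are not specified here, so treat this as an illustration rather than the released preprocessing code.

```python
import cv2


def crop_person(frame, bbox, out_size=(224, 224)):
    """Crop a detected person box from a frame and resize it to the network input size.

    `bbox` is assumed to be (x1, y1, x2, y2) in pixel coordinates, e.g. as returned
    by an SSD person detector run on the frame; the detector itself is not shown.
    """
    x1, y1, x2, y2 = [int(v) for v in bbox]
    crop = frame[max(y1, 0):y2, max(x1, 0):x2]
    # Resize the person patch to 224x224 as mentioned above.
    return cv2.resize(crop, out_size, interpolation=cv2.INTER_LINEAR)
```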
Thank you for your prompt response! Looking forward to it.
The scripts that load the Toyota Smarthome dataset expect images (retrieved via "glob()") extracted from the videos, rather than the published mp4 videos themselves. I assume these images have been preprocessed according to the "Quo Vadis?" (3D ConvNets) protocol, i.e. resized so that the smaller side is 256. I also assume the resized images are actually patches of the full human body, cropped using a bounding box computed from the corresponding LCRNet 2D skeleton. Is there anything else done to them? Would you be willing to publish the script that generates the images from the mp4 videos of the dataset? Thank you for your response.
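For context, a frame-extraction step along the lines assumed above might look like the sketch below: decode each mp4 and save frames resized so that the shorter side is 256. The file naming, image format, and whether any person cropping happens at this stage are assumptions for illustration, not the repository's actual script.

```python
import os

import cv2


def extract_frames(video_path, out_dir, short_side=256):
    """Decode an mp4 and dump frames resized so the shorter side equals `short_side`,
    following the common 'Quo Vadis?' (I3D-style) preprocessing convention.

    The JPEG naming scheme here is hypothetical and only meant to show the idea.
    """
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        scale = short_side / min(h, w)
        frame = cv2.resize(frame, (round(w * scale), round(h * scale)))
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()
```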