
PiGraphs: Learning Interaction Snapshots from Observations
http://graphics.stanford.edu/projects/pigraphs

This is the code and data repository for the SIGGRAPH 2016 technical paper PiGraphs: Learning Interaction Snapshots from Observations. If you use this code or data in your research please cite the paper:

@article{savva2016pigraphs,
    title={{PiGraphs: Learning Interaction Snapshots from Observations}},
    author={Manolis Savva and Angel X. Chang and Pat Hanrahan and Matthew Fisher and Matthias Nie{\ss}ner},
    journal = {ACM Transactions on Graphics (TOG)},
    publisher = {ACM},
    volume = {35},
    number = {4},
    year = {2016}
}

For more details about the project go to http://graphics.stanford.edu/projects/pigraphs/.

See the Command Overview wiki for an overview of the UI and a description of the available commands for synthesizing interaction snapshots.

We provide a set of pre-trained models in the learned-models.cached.zip archive. After downloading, extract it to the repository root. To compile and run, you will also need to download the external dependencies package mLibExternal.zip and extract it to the repository root.

Compiling

The code compiles with Visual Studio 2013. The main solution file is pigraphs.sln. Make sure the working directory of the project configurations is set to $(SolutionDir)bin\$(Configuration). You may need to copy missing DLLs into this working directory.

Default parameters are in $(SolutionDir)bin\parameters_default.txt. If $(SolutionDir)bin\parameters.txt exists, it overrides any parameter settings contained in parameters_default.txt.
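As a purely hypothetical illustration (the key names and the key = value line format here are assumptions; consult parameters_default.txt for the actual syntax and the available settings), an override parameters.txt might contain:

logLevel = debug       # hypothetical key, for illustration only
dataDir = ../data      # hypothetical key, for illustration only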

Prerequisites for compiling

Troubleshooting compilation and running

If you have trouble running on Windows 8.1 and you get a D3D11_CREATE_DEVICE_DEBUG error, you may need to install the D3D11 SDK Debug Layer, which ships with the Windows 8.1 SDK: http://msdn.microsoft.com/en-us/windows/desktop/bg162891.aspx . See http://stackoverflow.com/questions/19121093/directx-11-missing-sdk-component-on-windows-8-1 for more details.

Regenerating proxy code for interfacing with Weka using Jace

Prepackaged VM image

A Windows VM with VS2013, a copy of the PiGraphs codebase, and all dependencies installed is available here (67GB download; expands to a 128GB VHD-format image). You can use this image to create a Microsoft Azure Container, or run it locally with VirtualBox using the vbox configuration file here. The user account password is PiGraphs2016.

Scan data documentation

Each scan has a filename of the form sceneId.<ext>, where <ext> is one of the extensions below.

.ply

Stanford PLY meshes of reconstructed environments. Scale is in meters and the up axis is +Z.

.grid.gz

Gzip-compressed volumetric occupancy voxel grid for reconstructed environments. Stores aggregate occupancy information from the scanning process in a dense voxel grid. Each voxel contains the total counts of being observed as free space, being observed as occupied (within the TSDF truncation threshold), and being behind a surface (in which case it is unknown whether the voxel is occupied or free).

The occupancy grid file format (after gunzipping) starts with ASCII header lines of the following format (ignore comments after # on each line):

voxelgrid\t1  # file format identifier and version number
dimensions\tdimX dimY dimZ  # vec3ul
worldToGrid\tfloat00 float01 float02 float03 float10 ... float33  # mat4f
depthMin\tfloat             # minimum depth threshold used during scanning
depthMax\tfloat             # maximum depth threshold used during scanning
voxelSize\tfloat            # voxel dimension in meters
labels\tId1,Id2,Id3,...     # labels of voxels, currently ignored
data:                       # line indicating start of binary data
Binary section

The remaining binary section is a sequence of uint32 pairs (freeCtr, occCtr) in iteration order x, y, z (outermost to innermost). The worldToGrid 4x4 matrix transforms from the coordinate frame of the corresponding .ply file to the voxel grid coordinate frame. Refer to libsg/core/OccupancyGrid.h and libsg/core/OccupancyGrid.cpp for example reader and writer functions.
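As a rough sketch of a reader (an illustration only; the reference implementations live in the OccupancyGrid files above), the following C++ assumes the stream has already been gunzipped and parses just the dimensions header line before reading the binary payload:

#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Per-voxel counters as stored in the binary section
struct VoxelCounts { uint32_t freeCtr, occCtr; };

// Minimal sketch: only the dimensions header line is interpreted here;
// all other header lines are skipped for brevity
std::vector<VoxelCounts> readGrid(std::istream& in) {
  uint64_t dimX = 0, dimY = 0, dimZ = 0;
  std::string line;
  while (std::getline(in, line) && line.rfind("data:", 0) != 0) {
    std::istringstream ss(line);
    std::string key;
    ss >> key;
    if (key == "dimensions") { ss >> dimX >> dimY >> dimZ; }
  }
  // (freeCtr, occCtr) pairs in x, y, z order (outermost to innermost),
  // so voxel (x, y, z) lives at index (x * dimY + y) * dimZ + z
  std::vector<VoxelCounts> counts(dimX * dimY * dimZ);
  in.read(reinterpret_cast<char*>(counts.data()),
          counts.size() * sizeof(VoxelCounts));
  return counts;
}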

.vox

Labeled sparse voxel grid for reconstructed environments, where labels are object and object-part categories. The format is very similar to the occupancy grid, with the following ASCII header:

labeledgrid\t1              # file format identifier and version number
dimensions\tdimX dimY dimZ  # vec3ul
worldToGrid\tfloat00 float01 float02 float03 float10 ... float33  # mat4f
voxelSize\tfloat            # dimensions of voxels in meters
labels\tId1,Id2,Id3,...     # comma separated string labels will be referenced by 0-based integer index in binary data
numVoxels\tsize_t           # total number of voxels in this file
data:                       # line indicating start of binary data
Binary section

The binary section is a sequence of 4-tuples of int16_t (x, y, z, label), where x, y, and z give the integer voxel coordinates, and label is a 0-based index into the comma-separated string labels.
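A similar minimal reading sketch for .vox files (again an illustration, not the project's reader; header lines other than labels and numVoxels are skipped):

#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One labeled voxel record from the binary section
struct LabeledVoxel { int16_t x, y, z, label; };

// Minimal sketch: parse the labels and numVoxels header lines, then
// read the packed 4-tuple records
std::vector<LabeledVoxel> readVox(std::istream& in,
                                  std::vector<std::string>& labels) {
  size_t numVoxels = 0;
  std::string line;
  while (std::getline(in, line) && line.rfind("data:", 0) != 0) {
    std::istringstream ss(line);
    std::string key;
    ss >> key;
    if (key == "numVoxels") {
      ss >> numVoxels;
    } else if (key == "labels") {
      std::string rest, item;
      std::getline(ss, rest);  // remainder of the line after the key
      std::istringstream items(rest);
      while (std::getline(items, item, ',')) labels.push_back(item);
    }
  }
  std::vector<LabeledVoxel> voxels(numVoxels);
  in.read(reinterpret_cast<char*>(voxels.data()),
          voxels.size() * sizeof(LabeledVoxel));
  return voxels;
}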

.segs.json

Segmentation and object labels for segments of the .ply meshes of reconstructed scenes. The file is in JSON format with the following fields:

params : record of segmentation parameters used
sceneId : id of scene PLY mesh that this segmentation corresponds to
segGroups[{},...,{}] : array of segmentation group records with fields `id`, `label`, `objectId`, `obb`, `dominantNormal` and `segments`.  The `label` field is the object and part label string, and `segments` is an array of segment ids which are grouped together and assigned the given label.  The `objectId` field is a unique id for each instance of labeled object (to disambiguate between multiple instances e.g., multiple chairs in a scene).
segIndices : array of integer segment ids, one for each vertex in the corresponding .ply mesh file.  Vertices are numbered in the order they are specified in the .ply file, and all vertices with the same segment id belong to one segment.  The ids are not necessarily consecutive numbers.

Refer to the code in libsg/segmentation/SegmentGroup.cpp (particularly SegmentGroupRecord::load) for an implementation of a reading function.
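For illustration, a hypothetical abridged .segs.json is shown below; every value is invented for the example, and the params and obb contents are elided:

{
  "params": { },
  "sceneId": "scene01",
  "segGroups": [
    {
      "id": 0,
      "label": "chair",
      "objectId": 2,
      "obb": { },
      "dominantNormal": [0.0, 0.0, 1.0],
      "segments": [5, 9, 13]
    }
  ],
  "segIndices": [5, 5, 9, 13, 13, 42]
}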

Recording data documentation

The file basename has the format sceneId_cameraId_timestamp.<ext>, where <ext> is one of the file types below. The sceneId part corresponds to the reconstructed scene in which the recording was taken.

A recording consists of the set of files with the same basename and each of the following extensions:

Video frames can be read from the video files using the OpenCV VideoCapture interface or any other video reading interface that can return frame values as raw byte arrays.

Both the color and depth frame videos are encoded with the Lagarith lossless codec (FOURCC identifier LAGS). Color frames are standard BGR YUY2 frames, while each depth frame pixel encodes both the raw Kinect One depth value and a tracked-body segmentation mask. These values are packed into three bytes per pixel as [R,G,B] = [body, depth0, depth1]. The OpenCV code snippet below gives an example of parsing a .depth.avi frame:

#include <cstring>
#include <opencv2/core.hpp>

void parseDepthFrame(const cv::Mat& depthAndBodyFrame) {
  // Split the packed [body, depth0, depth1] channels
  cv::Mat channels[3];
  cv::split(depthAndBodyFrame, channels);
  // First channel is the body index mask (uchar values of 0xff correspond
  // to non-body pixels, anything else is a body id for each tracked person)
  const cv::Mat body = channels[0];
  // Merge the next two channels, then reinterpret each byte pair as a
  // uint16 depth value in mm
  cv::Mat depthChs;
  cv::merge(&channels[1], 2, depthChs);
  cv::Mat depth(depthChs.rows, depthChs.cols, CV_16UC1);
  std::memcpy(depth.data, depthChs.data, 2 * depthChs.rows * depthChs.cols);
}
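As a usage sketch, frames can be pulled from a .depth.avi with the OpenCV VideoCapture interface and passed to the function above (the filename is illustrative, and a Lagarith-capable decoder must be available to OpenCV):

#include <opencv2/videoio.hpp>

int main() {
  cv::VideoCapture cap("recording.depth.avi");  // illustrative filename
  cv::Mat frame;
  // Each decoded frame carries the packed body/depth channels
  while (cap.read(frame)) {
    parseDepthFrame(frame);
  }
  return 0;
}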

For convenience, we also provide example C++ code for loading the recording .json format with no external dependencies: loadrec.