BS3D: Building-scale 3D Reconstruction from RGB-D Images

The BS3D dataset and the reconstruction framework presented in:
BS3D: Building-scale 3D Reconstruction from RGB-D Images [arXiv]

1. BS3D dataset

The BS3D dataset can be downloaded from [here]. The following sections describe the contents of the dataset.

1.1 Campus reconstruction (2 Hz)

The main reconstruction is under the campus subdirectory. Images are provided in two coordinate frames: color camera and depth (infrared) camera. For this part, lasers scans were not captured. There are 19981 images which have been rectified. A filename corresponds to the timestamp in seconds.

Color camera frame

Type	Resolution	Format	Description	Identifier
Color images	720x1280	24-bit JPG	Rectified color images.	color
Depth maps	720x1280	16-bit PNG	Sensor depth in millimeters. Invalid depth equals 0.	depth
Depth maps (rendered)	720x1280	16-bit PNG	Depth rendered from the mesh in millimeters. Invalid depth equals 0.	depth_render
Normal maps (rendered)	720x1280	24-bit PNG	Surface normals rendered from the mesh. Invalid normal equals (0,0,0).	normal_render
Camera poses		TXT	Color camera poses (camera-to-world) in the RGBD SLAM format: timestamp, tx, ty, tz, qx, qy, qz, qw	poses
Camera calibration		YAML	Color camera intrinsics and extrinsics between color and infrared camera.	calibration

Depth camera frame

Type	Resolution	Format	Description	Identifier
Color images	1024x1024	24-bit JPG	Color images transformed to the depth camera frame.	color
Infrared images	512x512	16-bit PNG	Active infrared images.	infrared
Depth maps	512x512	16-bit PNG	Raw sensor depth in millimeters. Invalid depth equals 0.	depth
Depth maps (rendered)	512x512	16-bit PNG	Depth rendered from a mesh in millimeters. Invalid depth equals 0.	depth_render
Normal maps (rendered)	512x512	24-bit PNG	Surface normals rendered from a mesh. Invalid normal equals (0,0,0).	normal_render
Point clouds		PLY	Point cloud data (X,Y,Z) including infrared intensity.	clouds
Camera poses		TXT	Depth camera poses (camera-to-world) in the RGBD SLAM format: timestamp, tx, ty, tz, qx, qy, qz, qw	poses
Camera calibration		YAML	Depth camera intrinsics and extrinsics between color and infrared camera.	calibration

Inertial measurements

Type	Rate	Format	Description	Identifier
IMU data and calibration	1.6 kHz	CSV, YAML	Accelerometer (m/s^2) and gyroscope readings (rad/s) sampled at 1.6 kHz. Format: stamp, wx, wy, wz, ax, ay, az. Calibration includes IMU-camera extrinsics (e.g. between gyroscope and color camera).	imu

Surface reconstruction

Type	Format	Description	Identifier
Mesh	PLY	Mesh created from raw depth maps using scalable TSDF fusion (Open3D library). No color information.	mesh

Raw recordings

Raw recordings are in the mkv directory. There are 47 recordings (6.7GB - 11.6 GB each) which you can extract using preprocess-mkv.exe (Section 2). You may want to discard a few seconds at the beginning/end of the recording since the device is stationary.

1.2 Lobby reconstruction with laser scans (2 Hz)

A reconstruction of a lobby and corridors is under the lobby subdirectory. Data is organized as described above. There are 6618 images in total. In addition, the data includes laser scans that were obtained using FARO 3D X 130.

Laser scans

Type	Format	Description	Identifier
Original scans	PLY	Original laser scans (point clouds) which have been registered. Clouds have not been cleaned or downsampled.	laserscans_original
Cleaned scan	PLY	A single point cloud that has been cleaned and downsampled.	laserscan

1.3 Odometry sequences (30 Hz)

Sequences used in the visual-inertial odometry experiments. See the tables above for the description of the data. Note that lasers scans were not captured.

Sequence	Duration (s)	Length (m)	Dimensions (m)
cafeteria	200	90.0	12.4 x 15.7 x 0.8
central	242	155.0	25.5 x 42.1 x 5.3
dining	192	109.2	33.8 x 25.0 x 5.5
corridor	174	77.6	31.1 x 4.7 x 2.4
foobar	75	37.1	5.4 x 14.4 x 0.6
hub	124	52.3	11.4 x 5.9 x 0.7
juice	103	42.7	6.3 x 8.6 x 0.5
lounge	222	94.2	14.4 x 10.3 x 1.1
study	87	40.0	5.6 x 9.8 x 0.6
waiting	139	60.1	9.8 x 6.7 x 0.9

2. Reconstruction framework

Follow these instructions to reconstruct your environment. This repository includes a template dataset datasets/mydataset with necessary configuration files and folder structure.

2.1 Prerequisites

This software has been tested on Windows 10, but it should be compatible with Ubuntu 18.04.

Install Azure Kinect SDK from here (version 1.4.1, latest)
Install RTAB-Map from here (version 0.20.16, latest)
Install Preprocess-MKV (instructions below)

Clone the repository:

git clone https://github.com/jannemus/BS3D.git
cd BS3D

2.2 Install Preprocess-MKV

Preprocess-MKV is needed for extracting and processing the MKV files captured using Azure Kinect. Make sure you have installed the Azure Kinect SDK (see prerequisites). You also need OpenCV 4.3.0 (or later) and CMake 3.18.2 (or later).

In the following example, Visual Studio 2017 is used to compile Preprocess-MKV. Open the Visual Studio command prompt (Start -> VS2015 x64 Native Tools Command Prompt). To compile:

mkdir preprocess\build
cd preprocess\build
cmake -G"Visual Studio 15 2017 Win64" ..
cmake --build . --config Release --target install

2.3 Data capture

Azure Kinect SDK includes a recorder application (k4arecorder.exe) that is called from record.py. Record one or more sequences by running:

python record.py output.mkv

Put your recordings (e.g. A1.mkv, A2.mkv, ...) to the mydataset/mkv folder.

Capturing tips

To encourage loop closure detection, start and end the recording from a view that has plenty of visual features (corners etc.).
During recording, it is good to revisit locations, especially those that are rich in visual features.
Although Azure Kinect has a fairly good depth range and FoV, avoid pointing the camera towards a view that has insufficient geometry (e.g. large and completely empty lobby or corridor).

2.4 Preprocess MKVs

Extract images (color, depth, infrared), inertial measurements, point clouds, and calibration information from the MKV files using preprocess.py. The code will also undistort the images and perform color-to-depth alignment (C2D). The command:

python preprocess.py datasets/mydataset

will process all MKV files and write data to mydataset/preprocessed/*/, where is the name of the MKV file. RTAB-Map configuration files will also be written to mydataset/rtabmap/*.

Note If you just want to extract data, you can provide arguments --undistort false and --c2d false.

2.5 Single-session mapping

Launch RTAB-Map and load configuration from mydataset/rtabmap/*/single-session-config.ini, where is the session name.
`Preferences -> Load settings (.ini)` This will automatically set paths to calibration, color images and depth maps.

Initialize database File -> New database and press start. After the reconstruction has finished, check that the map looks good. If it does, close the database (.db) to save it to mydataset/rtabmap/*/map.db If you have multiple sessions, process and save each of them. Make sure you name each database map.db.

If you only have a single session, export camera poses to mydataset/rtabmap/poses.txt
File -> Export poses -> RGBD-SLAM format (*.txt) -> Frame: Camera
After that, continue to Sec. 1.7 Surface reconstruction.

2.6 Multi-session mapping

In RTAB-Map, load configuration mydataset/rtabmap/multi-session-config.ini

Select all single-session databases:
Preferences -> Source -> Database > [...] button

Note that the order in which the databases are processed matters (A1.db, A2.db, ..., C3.db). For example, the sequence C3.db should overlap at least one of the earlier sequences (A1.db, A2.db, ...).

Initialize database File -> New database and press start. After the reconstruction has finished, check that the map looks good. If it does, export camera poses to mydataset/rtabmap/poses.txt
File -> Export poses -> RGBD-SLAM format (*.txt) -> Frame: Camera

Optionally you can perform post-processing to detect more loop closures:
Tools -> Post-processing -> OK (default settings)
after which you need to export poses again.

2.7 Surface reconstruction

Perform surface reconstruction using TSDF fusion:

python meshing.py datasets/mydataset

The output mesh (.ply) will be written to mydataset/mesh by default.

2.8 Render images

Render depth maps and surface normals from the mesh:

python render.py datasets/mydataset

The output data will be written to mydataset/render by default.

Citation

If you use this repository in your research, please consider citing:

@article{mustaniemi2023bs3d,
  title={BS3D: Building-scale 3D Reconstruction from RGB-D Images},
  author={Mustaniemi, Janne and Kannala, Juho and Rahtu, Esa and Liu, Li and Heikkil{\"a}, Janne},
  journal={arXiv preprint arXiv:2301.01057},
  year={2023}
}

jannemus / BS3D

readme