sunset1995 / pytorch-layoutnet

Pytorch implementation of LayoutNet.
MIT License

pytorch-layoutnet

News: Check out our new project HoHoNet on this task and more!
News: Check out our new project HorizonNet on this task.

This is an unofficial implementation of the CVPR 2018 paper "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image". The official layout datasets are all converted to .png, and the pretrained models are converted to PyTorch state-dicts.
Differences from the official implementation:

Overview of the pipeline:

With this repo, you can:

Requirements

Visualization

1. Preparation

2. Pre-processing (Align camera pose with floor)

3. Layout Prediction with LayoutNet

4. Layout 3D Viewer

Preparation for Training

Training

See python train.py -h for a detailed explanation of the arguments.
The default training strategy is the same as the official one. To launch an experiment with the official "corner+boundary" setting (--id identifies the experiment and can be any name you choose):

python train.py --id exp_default

To train using only the RGB channels as input (no Manhattan line segments):

python train.py --id exp_rgb --input_cat img --input_channels 3

Gradient Ascent Post Optimization

Instead of the official 3D layout optimization with a sampling strategy, this repo implements a gradient ascent optimization algorithm that optimizes an objective similar to the official loss.
The process is outlined below:

  1. greedily extract the cuboid parameters from the corner/edge probability maps
    • The cuboid consists of 6 parameters (cx, cy, dx, dy, theta, h)
    • (figures: corner probability map, edge probability map)
  2. sample points along the cuboid boundary and project them onto the equirectangular corner/edge probability maps
    • The projected sample points are visualized as green dots
  3. for each projected sample point, get its value by bilinear interpolation from the 4 nearest neighbor pixels on the corner/edge probability map
  4. reduce all the sampled values to a single scalar called the score
  5. compute the gradient of the score with respect to the 6 cuboid parameters, in the direction that maximizes the score
  6. iteratively apply gradient ascent (steps 2 through 6)
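Steps 3 through 6 above can be illustrated with a minimal, dependency-free sketch. This is not the repo's implementation: it assumes a toy Gaussian probability map, replaces the 6 cuboid parameters with a hypothetical 2-parameter translation (tx, ty) of a fixed set of sample points, and uses the analytic gradient of bilinear interpolation in place of autograd. All function names here are illustrative.

```python
import math

def bilinear(grid, x, y):
    """Sample grid (list of rows) at continuous (x, y).

    Returns the interpolated value and its partial derivatives w.r.t. x and y.
    """
    h, w = len(grid), len(grid[0])
    # Clamp so the 4 neighbours stay inside the map (gradient at the border
    # is only approximate; the toy run below never reaches it).
    x = min(max(x, 0.0), w - 1.000001)
    y = min(max(y, 0.0), h - 1.000001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    v00, v01 = grid[y0][x0], grid[y0][x0 + 1]
    v10, v11 = grid[y0 + 1][x0], grid[y0 + 1][x0 + 1]
    top = v00 * (1 - fx) + v01 * fx
    bot = v10 * (1 - fx) + v11 * fx
    value = top * (1 - fy) + bot * fy
    dvdx = (v01 - v00) * (1 - fy) + (v11 - v10) * fy
    dvdy = bot - top
    return value, dvdx, dvdy

def score_and_grad(grid, offsets, tx, ty):
    """Mean sampled value over all points, plus its gradient w.r.t. (tx, ty)."""
    total = gx = gy = 0.0
    for ox, oy in offsets:
        v, dvdx, dvdy = bilinear(grid, tx + ox, ty + oy)
        total += v
        gx += dvdx
        gy += dvdy
    n = len(offsets)
    return total / n, gx / n, gy / n

def gradient_ascent(grid, offsets, tx, ty, lr=5.0, steps=200):
    """Repeatedly move (tx, ty) along the gradient to increase the score."""
    for _ in range(steps):
        _, gx, gy = score_and_grad(grid, offsets, tx, ty)
        tx += lr * gx
        ty += lr * gy
    return tx, ty

# Toy probability map: a Gaussian bump peaking at column 12, row 8.
W, H, PEAK_X, PEAK_Y = 24, 16, 12, 8
grid = [[math.exp(-((x - PEAK_X) ** 2 + (y - PEAK_Y) ** 2) / 32.0)
         for x in range(W)] for y in range(H)]
# Hypothetical sample points, expressed as offsets from (tx, ty).
offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

initial_score, _, _ = score_and_grad(grid, offsets, 7.0, 5.0)
tx, ty = gradient_ascent(grid, offsets, 7.0, 5.0)
final_score, _, _ = score_and_grad(grid, offsets, tx, ty)
print(f"score {initial_score:.3f} -> {final_score:.3f} at ({tx:.2f}, {ty:.2f})")
```

In the actual setting the sample positions depend on all 6 cuboid parameters through the equirectangular projection, so an automatic differentiation framework such as PyTorch is the natural way to obtain the gradient.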

It takes less than 2 seconds on CPU and gives slightly better results than those officially reported.
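Step 2 of the procedure relies on projecting 3D points on the cuboid boundary onto the equirectangular image. The sketch below shows one common way to do this. The coordinate conventions are assumptions and may differ from the repo: the camera sits at the origin with z pointing up, the floor lies at z = -camera_height, h is taken as the ceiling height above the camera, and dx, dy are half-extents. The function names are hypothetical.

```python
import math

def cuboid_corners(cx, cy, dx, dy, theta, h, camera_height=1.0):
    """Return 8 corners: 4 on the floor (z=-camera_height), then 4 on the ceiling (z=h)."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for z in (-camera_height, h):
        for sx, sy in ((1, 1), (-1, 1), (-1, -1), (1, -1)):
            # Rotate the half-extent offset by theta, then translate to the center.
            ox, oy = sx * dx, sy * dy
            corners.append((cx + c * ox - s * oy, cy + s * ox + c * oy, z))
    return corners

def sample_edge(p, q, n):
    """n evenly spaced 3D points from p to q (inclusive), n >= 2."""
    return [tuple(a + (b - a) * i / (n - 1) for a, b in zip(p, q))
            for i in range(n)]

def project_to_equirect(x, y, z, width, height):
    """Map a 3D point to continuous equirectangular pixel coordinates (u, v)."""
    lon = math.atan2(x, y)                 # longitude in [-pi, pi]
    lat = math.atan2(z, math.hypot(x, y))  # latitude in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

# Sample one floor edge of a 2 x 2 room centered on the camera and project it.
corners = cuboid_corners(0.0, 0.0, 1.0, 1.0, 0.0, 1.0)
edge_points = sample_edge(corners[0], corners[1], 5)
pixels = [project_to_equirect(x, y, z, 1024, 512) for x, y, z in edge_points]
for u, v in pixels:
    print(f"u={u:.1f}, v={v:.1f}")
```

The resulting (u, v) values are continuous, which is why the sampled probability has to be read with bilinear interpolation rather than by indexing a single pixel.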

Quantitative Evaluation

See python eval.py -h for a more detailed explanation of the arguments. To get the results from my trained network (link above):

python eval.py --path_prefix ckpt/epoch_30 --flip --rotate 0.333 0.666
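The sketch below illustrates what flags like these typically mean for panoramas, under the assumption that --flip and --rotate request test-time augmentation by horizontal mirroring and by circular horizontal shifts of the given fractions of the image width; check python eval.py -h for the actual semantics. The functions are hypothetical, operate on plain 2D lists, and take any `predict` callable that maps an image to a same-sized map.

```python
def roll_columns(img, shift):
    """Circularly shift every row of a 2D list right by `shift` columns."""
    w = len(img[0])
    shift %= w
    return [row[-shift:] + row[:-shift] if shift else list(row) for row in img]

def flip_columns(img):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in img]

def predict_with_tta(predict, img, flip=True, rotate=(0.333, 0.666)):
    """Average predictions over augmented copies, undoing each augmentation."""
    h, w = len(img), len(img[0])
    outputs = [predict(img)]
    if flip:
        # Mirror the input, predict, then mirror the prediction back.
        outputs.append(flip_columns(predict(flip_columns(img))))
    for frac in rotate:
        # A horizontal shift of an equirectangular image is a camera rotation
        # about the vertical axis; shift back after predicting.
        dx = int(w * frac)
        outputs.append(roll_columns(predict(roll_columns(img, dx)), -dx))
    n = len(outputs)
    return [[sum(o[r][c] for o in outputs) / n for c in range(w)]
            for r in range(h)]

# With an identity "network", the averaged output equals the input.
image = [[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]]
averaged = predict_with_tta(lambda x: x, image)
print(averaged)
```

Circular shifting is lossless for equirectangular panoramas because the left and right image borders are adjacent in the scene.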

To evaluate with gradient ascent post optimization:

python eval.py --path_prefix ckpt/epoch_30 --flip --rotate 0.333 0.666 --post_optimization

Dataset - PanoContext

| exp | 3D IoU (%) | Corner error (%) | Pixel error (%) |
| --- | --- | --- | --- |
| Official best | 75.12 | 1.02 | 3.18 |
| ours rgb only | 71.42 | 1.30 | 3.83 |
| ours rgb only w/ gd opt | 72.52 | 1.50 | 3.66 |
| ours | 75.11 | 1.04 | 3.16 |
| ours w/ gd opt | 76.90 | 0.93 | 2.81 |

Dataset - Stanford 2D-3D

| exp | 3D IoU (%) | Corner error (%) | Pixel error (%) |
| --- | --- | --- | --- |
| Official best | 77.51 | 0.92 | 2.42 |
| ours rgb only | 70.39 | 1.50 | 4.28 |
| ours rgb only w/ gd opt | 71.90 | 1.35 | 4.25 |
| ours | 75.49 | 0.96 | 3.07 |
| ours w/ gd opt | 78.90 | 0.88 | 2.78 |

References