zouchuhang / LayoutNet

Torch implementation of our CVPR 18 paper: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image"
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zou_LayoutNet_Reconstructing_the_CVPR_2018_paper.pdf
MIT License

Would you mind sharing the code for perspective images? #6

Mingyangliiiiii opened 6 years ago

zouchuhang commented 6 years ago

@Mingyangliiiiii You could now check the training code in "driver_persp_joint_lsun_type.lua". The training procedure is similar to the panorama case, where we first train the edge prediction branch then train the whole structure together. Preprocessed data files (with our corrected gt labeling in LSUN) will be updated later. Pretrained models are released.

Mingyangliiiiii commented 6 years ago

@zouchuhang It's very kind of you! Thanks a lot.

frozenca commented 6 years ago

@zouchuhang Could you please share .t7 formatted LSUN data files?

zouchuhang commented 6 years ago

@frozenca The .t7 file: https://drive.google.com/file/d/1GCK1NYJRE62DUVj2t5cu3CrCmzoV6urc/view?usp=sharing and all the preprocessed LSUN data: https://drive.google.com/file/d/1BSYquS7LietkRiyZMxBlqtY8uZSIsUUg/view?usp=sharing

You should be able to run the training scripts once you download them all. Remember to merge the above downloads into the "/LSUN_data/" folder and put it under the "/data/" folder.

frozenca commented 6 years ago

@zouchuhang Thank you very much for your efforts! :+1:

frozenca commented 6 years ago

@zouchuhang I've constructed testing code for perspective images based on your panoramic image test code (testNet_pano_full.lua), changing the network settings to match the perspective image testing model (model_persp_joint_lsun_type.lua).

However, a bug occurs: the sizes of the pretrained model parameters and my model container do not match.

My code:

local unpool5_c = nn.SpatialUpSamplingNearest(2)(unpool5_c)
local deconv5_c = nn.SpatialConvolution(32*3,1,3,3,1,1,1,1)(unpool5_c)
local deconv6_sf_c = nn.Sigmoid()(deconv5_c)

-- refinement
local ref0 = nn.Reshape(2048*4*4)(pool7)
local ref1 = nn.Linear(2048*4*4, 1024)(ref0)
local ref1_relu = nn.ReLU(true)(ref1)
local ref2 = nn.Linear(1024, 256)(ref1_relu)
local ref2_relu = nn.ReLU(true)(ref2)
local ref3 = nn.Linear(256, 64)(ref2_relu)
local ref3_relu = nn.ReLU(true)(ref3)
local ref4 = nn.Linear(64, 11)(ref3_relu)

model.core = nn.gModule({input_x},{deconv6_sf, deconv6_sf_c, ref4})

model.core:cuda()
params = torch.load('/model/perspfull_lsun_type_pretrained.t7')
print(params:size()) --- This gives 128187606

model_params, grad_params = model_utils.combine_all_parameters(model.core)
print(model_params:size()) --- This gives 128182415

model_params = model_params:copy(params) --- This gives an error due to the size mismatch

Could you please advise me on what is wrong with how my code constructs model.core?

zouchuhang commented 6 years ago

@frozenca Your "deconv5_c" dose not match with the deconv5_c defined in the model ( check "model_persp_joint_lsun_type.lua" Line 148)

ngdmk commented 6 years ago

Hi @zouchuhang, Thanks for the wonderful work. To replicate the numbers and get the corners (Nx2 matrix) on the perspective images of LSUN dataset, I trained a model using "th driver_persp_joint_lsun_type.lua" and initialized with "perspfull_lsun_type_pretrained.t7". I trained for 8000 iterations and the lowest validation loss I could get is ~0.14. Is this reasonable? I find that the results are better using the "perspfull_lsun_type_pretrained.t7" initialization model.

Another question: the network produces 3 outputs, a) a 3x512x512 matrix where each channel indicates wall-wall, ceiling-wall, and wall-floor boundary probabilities, in that order; b) an 8x512x512 matrix, with one channel per corner. How do I get the coordinates of the corner pixels? Is it argmax on each channel to get the corresponding corner coordinates? And how do I map the corner points to the type of the room layout? c) a 1x11 matrix indicating the probabilities for the "type" of room layout. Do you follow the LSUN dataset layout ordering?

Thanks a lot in advance for your help.

zouchuhang commented 6 years ago

@ngdmk ~0.14 should be reasonable. The training should replicate our performance, and you can compare with our provided pretrained full model as a sanity check. We follow the LSUN dataset layout ordering: once you have the 8x512x512 corner matrix and the related room type, you can take the argmax on the channels associated with the predicted type and connect the corners together.
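
As an illustration of that answer, a rough Torch/Lua sketch (not from this repo; the variable names corners and typ are mine) of turning the 8x512x512 corner heatmap and the 1x11 type probabilities into pixel coordinates:

-- predicted LSUN room type (1-based index into the 11 types)
local _, typeIdx = torch.max(typ:view(-1), 1)
print('predicted type:', typeIdx[1])

-- per-channel argmax over the 512x512 heatmap gives one (x, y) per corner channel
local coords = {}
for c = 1, corners:size(1) do
  local ch = corners[c]
  local _, flatIdx = torch.max(ch:view(-1), 1)
  local idx = flatIdx[1] - 1
  local y = math.floor(idx / ch:size(2)) + 1  -- row
  local x = idx % ch:size(2) + 1              -- column
  coords[#coords + 1] = {x, y}
end
-- keep only the channels that the predicted LSUN type defines, then connect
-- those corners in the LSUN ordering to draw the layout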

ngdmk commented 6 years ago

Thanks a lot for the reply @zouchuhang. I still have a few questions related to training the network described by "driver_persp_joint_lsun_type.lua". I am trying to scale the gradients from the background (non-corner regions) by 0.2, as you mentioned in the paper and similar to RoomNet. Is this included in this repo, or do I need to make the change myself in the BCECriterion.lua script?

Also, the default parameters in driver_persp_joint_lsun_type.lua are lr=1e-4, batchsize=1, numpasses=1. Are these settings correct to replicate your results?

Would it be possible to share the final trained model for room corner and type estimation on perspective images? Thanks much.

zouchuhang commented 6 years ago

@ngdmk For the gradient re-weighting you can refer to L157-165 in "train_persp_joint_lsun_type.lua". The re-weighting is included in all the training scripts.
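
For readers who want the gist without opening the file, a rough sketch of the re-weighting idea described in the paper (not the repo's exact code from L157-165; pred, gt, and the helper tensors here are illustrative): after computing the BCE gradient with respect to the predicted corner/boundary maps, the gradient on background pixels is scaled by 0.2 so the sparse foreground pixels dominate.

local crit = nn.BCECriterion()
local loss = crit:forward(pred, gt)
local gradOutput = crit:backward(pred, gt)

-- mask of background pixels (gt == 0), same type as the gradient tensor
local bgMask = torch.eq(gt, 0):typeAs(gradOutput)
-- weight = 1 on foreground, 0.2 on background
local weight = torch.ones(gt:size()):typeAs(gradOutput) - bgMask * 0.8
gradOutput:cmul(weight)  -- background gradients scaled by 0.2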

Yes, the settings should replicate the results.

The model "perspfull_lsun_type_pretrained.t7" in the shared model folder "https://drive.google.com/file/d/1bg9ZP3_KA1kvTWpCh4wQ0PfAuCm4j0qa/view" is our final trained model.