yzre / SIHE

Estimation of building heights with single street view images

Heights are shorter than expected #2

Open vairaj790 opened 1 month ago

vairaj790 commented 1 month ago

Hi,

I tried the algorithm on a few street view images and am getting shorter heights than expected. Following the earlier issues on GitHub, I used the customized 'vps_models/config.yaml' and 'vps_models/checkpoint_latest.pth.tar' to generate the vanishing points. I am not sure where the problem is, so let me walk you through the steps I followed to get the results.

LCNN

I created the wireframe images and .npz files. The input images were 640x640, but the model produced 480x480 outputs (the images with the wireframe overlay).

MMseg

For this step, two files were again created: the segmented image and the corresponding .npz file.
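As a quick sanity check on this step, I compared the shapes of the image and the label map stored in the .npz; the paths below are placeholders for my files, and the key is whatever MMSegmentation saved:

```python
import numpy as np
from PIL import Image

# Hypothetical paths -- substitute the actual files produced by MMSegmentation.
img = np.array(Image.open("imgs/0004.png"))
seg = np.load("segs/0004segre.npz")

print("arrays in npz:", seg.files)     # inspect which keys were saved
label_map = seg[seg.files[0]]          # assuming the first array is the per-pixel label map
print("image size:", img.shape[:2], "label map size:", label_map.shape)
# Both should be (640, 640); a mismatch here would propagate into the height estimate.
```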

neurvps

I created the camera files (.json) manually for each image, which were later converted into .npz files through 'dataset/su3.py' in neurvps. I set the pitch to 0 degrees in the camera.json files. After this, I used the uploaded config.yaml and checkpoint from SIHE/misc. There was a catch here, though: the model did not accept 640x640 images, so I had to crop the images to 512x512 to get the outputs. The output files from the neurvps model were then transformed from 3D to 2D through 'SIHE/misc/vpt_transform.py'.
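For context, I assume the 3D-to-2D step amounts to a standard perspective projection of the unit vanishing direction onto the image plane; the sketch below shows that mapping under an assumed pinhole model with focal length `focal` and the principal point at the image centre (it is not taken from vpt_transform.py itself):

```python
import numpy as np

def project_vpt(direction, focal, width, height):
    """Project a 3D vanishing direction (camera coordinates) to a 2D pixel.

    Assumed pinhole model: u = f * X / Z + cx, v = f * Y / Z + cy.
    A direction with Z ~ 0 has no finite projection, i.e. the vanishing
    point lies at infinity.
    """
    x, y, z = direction / np.linalg.norm(direction)
    cx, cy = width / 2.0, height / 2.0
    if abs(z) < 1e-8:
        return np.array([np.inf, np.inf])  # vanishing point at infinity
    return np.array([focal * x / z + cx, focal * y / z + cy])

# Example with made-up numbers for a 640x640 image:
print(project_vpt(np.array([0.0, 0.1, 1.0]), focal=320.0, width=640, height=640))
```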

SIHE

Input for the different folders:

- imgs: 640x640 image (0004)
- lines: 480x480 image and .npz file (0004nlines)
- segs: 640x640 image and .npz file (0004segre)
- vpts: .npz file (output from vpt_transform.py)

Result image: 0004_htre

I have attached the sample input files and the result for your reference. I would appreciate it if you could help me identify any mistakes.

ylxbyy commented 1 month ago

Hi @vairaj790,

I think the problem primarily lies in the process of generating vanishing points.

Firstly, the orientations of the line segments in the resulting image are incorrect. Since the pitch is zero, the vertical vanishing point should ideally be at infinity. However, the line segments seem to converge towards a nearby point rather than towards an infinitely distant location.
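One way to check this yourself, assuming you have the roughly vertical line segments as endpoint pairs in pixel coordinates, is to intersect a few of them and see how far the intersection lands from the image; the segments below are made up for illustration:

```python
import numpy as np
from itertools import combinations

def intersect(seg_a, seg_b):
    """Intersection of the infinite lines through two segments ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None  # parallel lines -> vanishing point at infinity
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return np.array([px, py])

# Hypothetical near-vertical segments in a 640x640 image.
segments = [((100, 50), (102, 600)), ((300, 40), (301, 610)), ((520, 60), (523, 590))]
for a, b in combinations(segments, 2):
    p = intersect(a, b)
    dist = np.inf if p is None else np.linalg.norm(p - np.array([320.0, 320.0]))
    print("intersection distance from image centre:", dist)
# With pitch = 0 these distances should be very large (ideally infinite); a small
# value would mean the 'vertical' vanishing point is nearby, which is the symptom
# visible in your result image.
```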

Secondly, cropping the image to 512x512 may not be a suitable approach for the neurvps model. To meet the model's input size requirement, the image should be resized to 512x512 before being fed into the model; after processing, the output vpts can then be scaled back to the original image dimensions of 640x640. Also, the camera JSON files are not necessary for prediction. Did you perhaps make a mistake in that part of the process?
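A minimal sketch of the resize-then-rescale idea, assuming the final output is a 2D vanishing point in 512x512 pixel coordinates (the file paths and the `vpts` key are placeholders, not the pipeline's actual names):

```python
import numpy as np
from PIL import Image

ORIG_SIZE, MODEL_SIZE = 640, 512
scale = ORIG_SIZE / MODEL_SIZE  # 1.25

# Resize (do not crop) the image before feeding it to neurvps.
img = Image.open("imgs/0004.png").resize((MODEL_SIZE, MODEL_SIZE), Image.BILINEAR)
img.save("imgs/0004_512.png")

# After prediction and the 3D-to-2D transform, map the 2D vanishing points
# back to 640x640 coordinates by scaling the pixel coordinates.
vpts_512 = np.load("vpts/0004_vpts.npz")["vpts"]   # assumed key, shape (N, 2)
vpts_640 = vpts_512 * scale
np.savez("vpts/0004_vpts_640.npz", vpts=vpts_640)
```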

Additionally, the output image size of 480x480 from the LCNN does not affect the analysis, as the line coordinates in the npz file are utilized and remain consistent with the dimensions of the original image.
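For instance, you can confirm this by loading the line .npz and comparing the coordinate range against the 640x640 image; the path and the `lines` key below are assumptions about your LCNN export:

```python
import numpy as np

data = np.load("lines/0004nlines.npz")   # hypothetical path
print("arrays in npz:", data.files)

lines = data["lines"]                    # assumed key: (N, 2, 2) endpoint array
print("coordinate range:", lines.min(), "-", lines.max())
# Values ranging up to ~640 indicate the lines are stored in original-image
# coordinates, so the 480x480 visualisation image does not affect the height
# computation.
```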