kwea123 / CasMVSNet_pl

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching using pytorch-lightning
GNU General Public License v3.0

evaluation metric #6

Closed cvgogogo closed 4 years ago

cvgogogo commented 4 years ago

Hi kwea123, thanks for sharing your implementation. I see that in this paper they use accuracy and completeness, and I understand why they use both. I checked the original paper "Large-Scale Data for Multiple-View Stereopsis", and their description is:

> – Accuracy is measured as the distance from the MVS reconstruction to the structured light reference, encapsulating the quality of the reconstructed MVS points.
> – Completeness is measured as the distance from the reference to the MVS reconstruction, encapsulating how much of the surface is captured by the MVS reconstruction.
> [...] These distances are measured by comparing structured light and MVS-reconstructed 3D point clouds. More specifically, we measure the distance from every point in one point cloud to the closest point in the other point cloud, and then we record statistics about the distribution of these. We chose to characterize these empirical probability distribution functions (PDFs) by their mean and median, after removing observations with distances above 20 mm. The latter was done so that a few large outliers would not dominate the result.

So I don't understand why every point in one point cloud is matched to the closest point in the other point cloud. Do we need to align the two point clouds first? Even if they are aligned, how can this ensure that the matched points are correct corresponding points?

Thanks again. Cheers!

kwea123 commented 4 years ago

The point clouds are all in a fixed world coordinate system, so they are aligned from the beginning. The matched points are not corresponding points, since the ground truth and the prediction can have different numbers of points. I think this metric is good in the sense that it is less sensitive to the density of the predicted point cloud.
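For concreteness, here is a minimal sketch of that nearest-neighbour evaluation, assuming both clouds are plain (N, 3) numpy arrays expressed in the same world frame; `chamfer_stats`, `pred_points` and `gt_points` are illustrative names, not part of this repo's evaluation scripts:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_stats(src, dst, max_dist=20.0):
    """Distance from every point in `src` to its nearest neighbour in `dst`,
    with observations above `max_dist` (mm) removed, as in the DTU evaluation."""
    dists, _ = cKDTree(dst).query(src, k=1)   # nearest-neighbour distances
    dists = dists[dists <= max_dist]          # drop large outliers
    return dists.mean(), np.median(dists)

# accuracy: prediction -> structured-light reference
acc_mean, acc_median = chamfer_stats(pred_points, gt_points)
# completeness: reference -> prediction
comp_mean, comp_median = chamfer_stats(gt_points, pred_points)
```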

cvgogogo commented 4 years ago

> The point clouds are all in a fixed world coordinate system, so they are aligned from the beginning. The matched points are not corresponding points, since the ground truth and the prediction can have different numbers of points. I think this metric is good in the sense that it is less sensitive to the density of the predicted point cloud.

Thanks for your quick reply. Do you think it would be better to use the projection error for both accuracy and completeness? By projection error I mean projecting both 3D point clouds onto the 2D image with known intrinsic and extrinsic parameters.
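To make the idea concrete, here is a rough sketch of such a projection under a pinhole model, assuming known intrinsics K and world-to-camera extrinsics [R | t]; `project_points` is just an illustrative name:

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project (N, 3) world-space points to (N, 2) pixel coordinates."""
    cam = points_world @ R.T + t   # world -> camera coordinates
    uv = cam @ K.T                 # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide by depth
```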

kwea123 commented 4 years ago

I'm not sure I understand. Do you mean the pixel-wise depth error on the image? Which metric is "good" depends on how you intend to use the prediction: here the task is to reconstruct the whole 3D scene, so I think point-cloud evaluation is the correct way. If you just want a depth estimate from a certain view, you can use a 2D evaluation.
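As a rough illustration of what such a per-view 2D evaluation could look like (names and thresholds here are only assumptions, not this repo's code):

```python
import numpy as np

def depth_errors(pred_depth, gt_depth, mask):
    """Pixel-wise depth errors over valid pixels of a single view."""
    diff = np.abs(pred_depth[mask] - gt_depth[mask])
    return {
        "mae": diff.mean(),                         # mean absolute depth error
        "abs_rel": (diff / gt_depth[mask]).mean(),  # error relative to gt depth
        "acc_2mm": (diff < 2.0).mean(),             # fraction of pixels within 2 mm
    }
```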

cvgogogo commented 4 years ago

> I'm not sure I understand. Do you mean the pixel-wise depth error on the image? Which metric is "good" depends on how you intend to use the prediction: here the task is to reconstruct the whole 3D scene, so I think point-cloud evaluation is the correct way. If you just want a depth estimate from a certain view, you can use a 2D evaluation.

Sorry for the ambiguous description. Thank you very much for your kind explanations.