zhyever / PatchFusion

[CVPR 2024] An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
https://zhyever.github.io/patchfusion/
MIT License
958 stars 64 forks

Enhancement: Optimisation of Depth Estimation for Varied Lighting Conditions #7

Closed: yihong1120 closed this issue 6 months ago

yihong1120 commented 9 months ago

Dear Contributors,

I hope this message finds you well. I am reaching out to discuss a potential enhancement to the PatchFusion framework, specifically regarding its performance under varied lighting conditions. Having experimented with the pre-trained model and the provided demos, I have observed that the depth estimation accuracy can be influenced by the lighting of the input images.

Issue Description: It appears that the current model may not consistently account for the nuances of shadows and highlights, which can be particularly challenging in high-contrast environments. This can lead to depth artefacts and inaccuracies, especially in scenarios where the light source is either too intense or insufficient.

Proposed Enhancement: I propose the investigation and integration of lighting-invariant features within the PatchFusion pipeline. This could potentially involve the use of normalisation techniques or the incorporation of a lighting-aware neural network module that can adapt the depth estimation process to the lighting conditions of the input image.
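To make the normalisation idea concrete, here is a minimal sketch of one possible pre-processing step: an automatic gamma correction that pushes an image's overall brightness towards a fixed target before it reaches the depth model. This is an assumption about what such a step could look like, not something PatchFusion currently does. The function name `normalise_lighting` and the `target_mean` parameter are hypothetical.

```python
import numpy as np


def normalise_lighting(image: np.ndarray, target_mean: float = 0.5) -> np.ndarray:
    """Gamma-correct an RGB image so its brightness moves towards target_mean.

    Hypothetical helper, not part of PatchFusion.
    image: float array with values in [0, 1] and shape (H, W, 3).
    target_mean: desired brightness, strictly between 0 and 1.
    """
    img = np.clip(image.astype(np.float64), 0.0, 1.0)

    # Approximate per-pixel luminance with the Rec. 601 luma weights.
    luma = img @ np.array([0.299, 0.587, 0.114])

    # Keep the mean away from 0 and 1 so both logarithms stay finite and non-zero.
    mean = float(np.clip(luma.mean(), 1e-6, 1.0 - 1e-6))

    # Pick gamma so that mean ** gamma == target_mean.
    # Dark images get gamma < 1 (brightened), bright images get gamma > 1.
    gamma = np.log(target_mean) / np.log(mean)

    return img ** gamma
```

A global gamma only shifts overall exposure; it will not remove local shadows or highlights, so a lighting-aware module would likely still be needed for high-contrast scenes.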

Potential Benefits:

Additional Context: I have attached a few sample images along with their corresponding depth maps generated by the current model to illustrate the aforementioned issue. The samples include images taken under direct sunlight, low light, and artificial lighting conditions.

I believe that addressing this aspect could significantly bolster the versatility and applicability of PatchFusion. I would be keen to contribute to this endeavour and collaborate on potential solutions.

Thank you for considering this enhancement. I look forward to your thoughts and any further discussion on the matter.

Best regards, yihong1120

zhyever commented 9 months ago

Thanks for your comments; they are very practical for future work. Since PatchFusion is trained only on a narrow synthetic dataset, it may struggle to adapt to real data. It is based on ZoeDepth, benefits from Zoe's strong zero-shot ability, and does a good job on some real data, but this is still an open issue.

Possible solutions include: finding ways to use a Stable Diffusion model, which has been trained on very large datasets, so that the domain gap is less of an issue; domain generalization; domain adaptation; and some specific designs for the input, which may also be effective.
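As one illustration of an input-side design, the sketch below runs a depth predictor on several gamma-adjusted copies of the same image and takes the per-pixel median, so that no single exposure dominates the result. It is only an assumed example: `predict_depth` is a hypothetical callable standing in for whatever model is used, not PatchFusion's actual API, and the gamma values are arbitrary.

```python
from typing import Callable, Sequence

import numpy as np


def exposure_ensemble_depth(
    image: np.ndarray,
    predict_depth: Callable[[np.ndarray], np.ndarray],
    gammas: Sequence[float] = (0.5, 1.0, 2.0),
) -> np.ndarray:
    """Median depth over several gamma-adjusted copies of one image.

    image: float array with values in [0, 1] and shape (H, W, 3).
    predict_depth: hypothetical callable mapping an (H, W, 3) image
        to an (H, W) depth map.
    gammas: exponents applied to the image; values below 1 brighten,
        values above 1 darken.
    """
    img = np.clip(image.astype(np.float64), 0.0, 1.0)

    # One prediction per simulated exposure.
    predictions = [predict_depth(img ** g) for g in gammas]

    # The per-pixel median is robust to a single badly exposed variant.
    return np.median(np.stack(predictions, axis=0), axis=0)
```

This multiplies inference cost by the number of gamma values, which matters for a tile-based high-resolution pipeline.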