ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

clarencechen commented 3 months ago

Model/Pipeline/Scheduler description

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet, because these prevailing methods generate each image independently, they struggle to maintain multi-view consistency. To address this, the authors introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models for conditional 3D view generation. The authors adopt an auto-regressive method that implicitly leverages previously generated views as context for next view generation, ensuring robust multi-view consistency during the novel view generation process. Extensive experimental results demonstrate the effectiveness of ViewFusion in generating consistent and detailed novel views.

The authors demonstrate their method for conditional 3D view generation with the pre-trained Zero123 diffusion model. Since the method is training-free, it only requires updated inference pipeline code. I think it can be implemented either as an update to the existing Zero123 community pipeline, or as a new community pipeline that uses the same weights as Zero123.
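To make the auto-regressive idea concrete, here is a rough sketch of the control flow as described in the abstract. It is not the authors' implementation: `predict_noise` is a hypothetical stand-in for a pose-conditioned denoiser such as the Zero123 UNet, poses are reduced to a single azimuth angle, the distance-based softmax weighting with a `temperature` parameter is an assumption made for illustration, and the update rule is a toy step rather than a real scheduler.

```python
import numpy as np


def predict_noise(x_t, cond_image, rel_pose, t):
    """Hypothetical stand-in for a pose-conditioned noise predictor.

    A real pipeline would call the pre-trained UNet here; this toy
    function only exists so the sketch runs without any weights.
    """
    scale = t / (t + 1.0)
    return scale * (x_t - cond_image) + 0.001 * rel_pose


def interpolated_noise(x_t, context, target_pose, t, temperature=30.0):
    """Blend the noise predictions conditioned on every context view.

    The weighting (softmax over negative angular distance, ignoring
    wrap-around) is an assumption, not taken from the paper.
    """
    dists = np.array([abs(target_pose - pose) for _, pose in context])
    logits = -dists / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    preds = [
        predict_noise(x_t, image, target_pose - pose, t)
        for image, pose in context
    ]
    blended = sum(w * p for w, p in zip(weights, preds))
    return blended, weights


def generate_views(input_image, target_poses, num_steps=10, seed=0):
    """Generate views one by one, feeding each result back as context."""
    rng = np.random.default_rng(seed)
    context = [(input_image, 0.0)]  # the input view sits at azimuth 0
    for pose in target_poses:
        x = rng.standard_normal(input_image.shape)
        for t in range(num_steps, 0, -1):
            eps, _ = interpolated_noise(x, context, pose, t)
            x = x - eps / num_steps  # toy update, not a real scheduler step
        context.append((x, pose))  # later views can condition on this one
    return [image for image, _ in context[1:]]


views = generate_views(np.zeros((4, 4, 3)), [30.0, 60.0, 90.0])
```

The point of the sketch is the loop structure: each finished view joins the context list, so later views are denoised against all earlier ones instead of the input image alone. In a community pipeline, the inner step would presumably go through the existing Zero123 UNet and scheduler.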

Open source status

[X] The model implementation is available.
[X] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Project Page: https://wi-sc.github.io/ViewFusion.github.io/
Github: https://github.com/Wi-sc/ViewFusion
Authors: @wi-sc

rootonchair commented 3 months ago

Hi, can I work on this?

yiyixuxu commented 3 months ago

@rootonchair sure - let's add it as a community pipeline

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

huggingface / diffusers