kijai / ComfyUI-Marigold

Marigold depth estimation in ComfyUI
GNU General Public License v3.0

Messages from Marigold authors #6

Open toshas opened 6 months ago

toshas commented 6 months ago

Thanks for creating the node, very happy that the community is interested in Marigold!

According to some users (https://www.reddit.com/r/comfyui/s/1QO2IAdUHl), the quality of the produced result is not always consistent with our online demo. We would like to recommend making the default settings of the node consistent with the default settings of our repository.

Additionally, we are working to update the source code of our pipeline to include several performance fixes, so stay tuned for updates!

Anton

kijai commented 6 months ago

Hey, thanks for the amazing depth estimator!

I believe the issue/confusion here is that the node does not downscale the image, while the online demo does. Since we are working with nodes, I find it more intuitive to downscale with a separate node than to force it within this one, as I'm showing in the example, though I'm still learning how best to utilize this myself.

While working on this I used the in-the-wild_example/example_0.jpg image to check for parity with the run.py script. I was also confused by the results until I downscaled the image first; then it matched.

toshas commented 6 months ago

Appreciate your insights on image resolution handling. Our pipeline, optimized around 768x768 due to its Stable Diffusion roots, delivers optimal depth maps within this range. While it adjusts output resolution to match the input, processing high-res images like 4K directly can compromise depth map quality.

To ensure a seamless user experience, especially for beginners, we advocate a "black box" approach. Keeping the --resize_to_max_res behavior enabled by default lets users obtain high-quality results without tweaking resolution settings. This also future-proofs the node for upcoming enhancements in high-resolution processing, eliminating the need for users to modify their graph structure later.

Thus, we recommend maintaining this default setting to balance quality and user-friendliness, ensuring consistently good results across various resolutions.
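
For illustration, a minimal sketch of that kind of max-resolution downscaling (assuming PIL; the 768 default mirrors the pipeline's preferred processing resolution, and the bilinear filter here is an assumption rather than the repo's exact choice):

```python
from PIL import Image

def resize_to_max_res(image: Image.Image, max_edge: int = 768) -> Image.Image:
    """Downscale so the longer edge is at most `max_edge`, keeping aspect ratio."""
    w, h = image.size
    scale = max_edge / max(w, h)
    if scale >= 1.0:
        return image  # already small enough; never upscale
    new_size = (round(w * scale), round(h * scale))
    return image.resize(new_size, Image.BILINEAR)

# Example: shrink a 4K photo before depth estimation; the resulting depth map
# can be upscaled back to the input resolution afterwards.
# img = resize_to_max_res(Image.open("in-the-wild_example/example_0.jpg"))
```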

ricardofeynman commented 6 months ago

"upcoming enhancements" nice, haven't even tried it yet and it already looks amazing. I'll be investigating further tomorrow, but could Marigold potentially run faster than the small MiDaS model for realtime use at 1576x768?

Fannovel16 commented 6 months ago

@ricardofeynman Marigold is way slower than MiDaS and only suitable for high-resolution still images (or videos, if someone can figure out how to reduce its flickering). But you can try compiling MiDaS to TensorRT, and I'm pretty sure 60fps is possible with that.

ricardofeynman commented 6 months ago

May I ask which version of MiDaS you're referring to? I've been using the small model (2.1), which runs at 384px natively, but in ONNX format it handles 1576x768 at around 60fps without any other effects or post-processing, although it has a lot of flickering between frames (maybe we're talking about a different kind of 'flickering' though).

Are there any depth models that are handling temporal coherence especially well for video that you know of? Realtime or otherwise.

Fannovel16 commented 6 months ago

@ricardofeynman

> although it has a lot of flickering between frames

I'm talking about that. Depth map video generated by Marigold has noticeably more flickering than other still-image depth estimators like LeReS or MiDaS unless you increase ensemble_size/n_repeat. But that also increases processing time, as more samples are generated for a single frame.

> Are there any depth models that are handling temporal coherence especially well for video that you know of?

There are some models dedicated to video depth estimation. I haven't tried any of them, though, but you can find some on Papers with Code.

ricardofeynman commented 6 months ago

Thanks for the reminder, Papers with Code slips my mind regularly as a place to check for the SOTA stuff. Will go play catch up on the recent video depth models.

I'm usually messing with the depth params of the generated mesh using audio or MIDI inputs, so the mesh flickering isn't always a disadvantage during live play, as it adds a little dynamism. But for scenes where I want some really smooth pre-rendered depth estimation frames to combine with video or image sequences via a frame server, I've yet to find an option that gives satisfactory results.

If anyone else chances upon this and knows of any video depth models implemented in any SD tool's extensions that can give a fluid rendered depth output, all suggestions are welcome.

toshas commented 3 months ago

Hi there, thanks for integrating Marigold-LCM so quickly! I checked the code and noticed that the defaults for LCM seem to be the same as for the main Marigold model (10 ensemble members and 10 denoising steps). We found that the number of denoising steps matters less for LCM than the ensemble size, so I suggest changing the LCM defaults to 1 denoising step and 4 ensemble members.
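
For readers running the pipeline outside ComfyUI, those suggested values would look roughly like this (a sketch against the original Marigold repo's MarigoldPipeline interface; the checkpoint name and the denoising_steps/ensemble_size argument names are assumptions that may differ between versions):

```python
import torch
from PIL import Image
from marigold import MarigoldPipeline  # pipeline class from the original Marigold repo

# Checkpoint name assumed to be the published LCM weights; verify against the repo.
pipe = MarigoldPipeline.from_pretrained(
    "prs-eth/marigold-lcm-v1-0", torch_dtype=torch.float16
).to("cuda")

image = Image.open("in-the-wild_example/example_0.jpg")

# Suggested LCM defaults: quality comes from ensembling, not from extra steps.
output = pipe(image, denoising_steps=1, ensemble_size=4)
output.depth_colored.save("depth_colored.png")  # colorized depth visualization
```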

kijai commented 3 months ago

> Hi there, thanks for integrating Marigold-LCM so quickly! I checked the code and noticed that the defaults for LCM seem to be the same as for the main Marigold model (10 ensemble members and 10 denoising steps). We found that the number of denoising steps matters less for LCM than the ensemble size, so I suggest changing the LCM defaults to 1 denoising step and 4 ensemble members.

Hey, thanks for the LCM implementation, it seems very good!

My changes were very rushed and could definitely be better. I'm unsure how to approach this, though, as there isn't an easy way to set different defaults in Comfy without making a separate node. I think I'll simply make example workflows with proper values that people can choose from instead.

toshas commented 3 months ago

Can the model selector field have a hook that resets the other fields to different defaults depending on the model selection? If not, workflows with examples sound good too! Thanks for your commitment to maintaining the node!

kijai commented 3 months ago

> Can the model selector field have a hook that resets the other fields to different defaults depending on the model selection? If not, workflows with examples sound good too! Thanks for your commitment to maintaining the node!

It's only possible with JavaScript sadly, and I'm not very good at that. I've added example workflows for now.

ponyminnie commented 3 months ago

"AttributeError: module diffusers has no attribute MarigoldPipeline" I can't use it because of the error above. Please help. :(

toshas commented 1 month ago

Hi there, I'm a bit late here with the news that as of diffusers==0.28.0, the Marigold pipelines have been promoted to the diffusers core. This tutorial lists many pipeline use cases, focusing on speed and memory efficiency. For example, the fastest configuration of both the depth and normals checkpoints runs at 85ms per image (resolution=768) in LCM-fp16-TAESD mode on an RTX 3090.

There are also ways to use the predictive uncertainty, pair it properly with ControlNet instead of MiDaS, and use it for consistent video depth (and, of course, normals) estimation.
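
For concreteness, a minimal sketch of that fast configuration with the diffusers-core pipeline (LCM depth checkpoint, fp16 weights, TAESD as the decoder), following the pattern the tutorial describes; the checkpoint names below are the published ones but worth verifying:

```python
import diffusers
import torch

# LCM depth checkpoint in fp16.
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

# Swap in the tiny TAESD autoencoder for much faster latent decoding.
pipe.vae = diffusers.AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image(
    "https://marigoldmonodepth.github.io/images/einstein.jpg"  # any test image works
)
depth = pipe(image)  # LCM defaults: 1 step, no ensembling

# Colorize and save the predicted depth map.
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")
```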

kijai commented 1 month ago

> Hi there, I'm a bit late here with the news that as of diffusers==0.28.0, the Marigold pipelines have been promoted to the diffusers core. This tutorial lists many pipeline use cases, focusing on speed and memory efficiency. For example, the fastest configuration of both the depth and normals checkpoints runs at 85ms per image (resolution=768) in LCM-fp16-TAESD mode on an RTX 3090.
>
> There are also ways to use the predictive uncertainty, pair it properly with ControlNet instead of MiDaS, and use it for consistent video depth (and, of course, normals) estimation.

Hey!

Yeah, I noticed that update and I have already implemented nodes to use the diffusers pipeline, including a video node that uses TAESD and adjustable latent blending, as well as the ability to use the normal map model.
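
For anyone curious what latent blending across frames can look like with the diffusers pipeline, here is a rough sketch following the pattern from the diffusers Marigold tutorial. It is not the node's actual implementation: the shared-noise shape arithmetic, the fixed 0.9/0.1 blend weight, and the assumption that the scaled frame dimensions divide cleanly by 8 are all simplifications.

```python
import torch

# `frames` is a list of equally sized PIL images; `pipe` is the
# MarigoldDepthPipeline (LCM checkpoint) set up as in the snippet above.
w, h = frames[0].size
scale = 768 / max(w, h)  # the pipeline's default processing resolution
lw, lh = round(w * scale) // 8, round(h * scale) // 8  # latent-space size

# One shared starting noise keeps per-frame predictions comparable.
latent_common = torch.randn((1, 4, lh, lw), device=pipe.device, dtype=pipe.dtype)

last_frame_latent = None
depth_maps = []
for frame in frames:
    latents = latent_common
    if last_frame_latent is not None:
        # Blend in the previous frame's latent to suppress flicker;
        # the 0.9/0.1 split is the adjustable knob.
        latents = 0.9 * latents + 0.1 * last_frame_latent
    out = pipe(frame, latents=latents, output_latent=True)
    last_frame_latent = out.latent
    depth_maps.append(out.prediction)
```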